Company: Slack
Title: Building Secure and Private Enterprise Search with LLMs
Industry: Tech
Year: 2025
Summary (short):
Slack built an enterprise search feature that extends their AI-powered search capabilities to external sources like Google Drive and GitHub while maintaining strict security and privacy standards. The problem was enabling users to search across multiple knowledge sources without compromising data security or violating privacy principles. Their solution uses a federated, real-time approach with OAuth-based authentication, Retrieval Augmented Generation (RAG), and LLMs hosted in an AWS escrow VPC to ensure customer data never leaves Slack's trust boundary, isn't used for model training, and respects user permissions. The result is a production system that surfaces relevant, up-to-date, permissioned content from both internal and external sources while maintaining enterprise-grade security standards, with explicit user and admin control over data access.
## Overview

Slack's enterprise search case study describes their production implementation of LLM-powered search capabilities that extend beyond Slack's internal communications to include external knowledge sources. The use case addresses a common enterprise challenge: enabling employees to find relevant information across multiple systems while maintaining strict security, privacy, and compliance requirements.

Slack approaches this as an extension of their existing Slack AI product, which already provides AI-powered summaries and search within Slack itself. The new capability integrates with Google Drive and GitHub initially, with plans to expand to additional applications. The company frames this within their security-first philosophy, claiming adherence to principles including never storing customer data outside their trust boundary, never training LLMs on customer data, ensuring AI only accesses data users can already see, and integrating with existing compliance infrastructure. While these are marketing claims, the technical implementation details they provide suggest a thoughtful architectural approach to these concerns.

## Technical Architecture and LLMOps Implementation

The core LLMOps architecture centers on several key technical decisions that address the production deployment challenges of running LLMs at scale while maintaining security guarantees. Slack uses closed-source LLMs hosted on AWS infrastructure within what they term an "escrow VPC." This architectural pattern is designed to create a trust boundary where the model provider (presumably a third-party LLM vendor) never gains access to Slack customer data. The VPC acts as an isolation layer, ensuring that data processed by the LLM remains within Slack's controlled infrastructure even though the models themselves may come from external vendors.

The choice of Retrieval Augmented Generation (RAG) as the primary approach for incorporating enterprise knowledge is central to their LLMOps strategy. Rather than fine-tuning or training models on customer data (which would create data retention, privacy, and compliance challenges), they supply the LLM with contextually relevant information only at inference time. This runtime approach means that no customer data persists in the model weights, and the LLM processes information transiently to generate responses. From an LLMOps perspective, this is a more operationally manageable approach, as it avoids the complexity of managing custom model versions, retraining pipelines, and the associated data governance challenges.

## Federated Search and Real-Time Data Access

A particularly noteworthy architectural decision is Slack's choice to implement federated search rather than indexing external data sources within their own databases. When a user performs a search that includes external sources, Slack makes real-time API calls to those external systems (Google Drive, GitHub, etc.) to retrieve results.

This has significant implications for their LLMOps implementation. By not storing external data, they eliminate entire classes of operational concerns, including data staleness, synchronization complexity, storage costs, and compliance risks associated with data duplication. However, the real-time approach introduces its own operational challenges: the system must handle latency from external API calls, deal with rate limiting and availability issues from third-party services, and manage the complexity of orchestrating multiple concurrent requests. While the text doesn't explicitly detail how they handle these challenges, implementing such a system in production would require sophisticated error handling, caching strategies (they mention the Slack client may cache data between reloads), circuit breakers, and fallback mechanisms to ensure an acceptable user experience even when external services are slow or unavailable.
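The post doesn't describe this orchestration layer, so the following is a minimal sketch of what permission-aware, fault-tolerant federated retrieval could look like. The connector function, timeout values, and data shapes are illustrative assumptions, not Slack's actual implementation.

```python
# Hypothetical sketch of fault-tolerant federated retrieval across external sources.
import asyncio
from dataclasses import dataclass


@dataclass
class SearchResult:
    source: str
    title: str
    snippet: str


async def search_connector(source: str, query: str, oauth_token: str) -> list[SearchResult]:
    # Stand-in for a real-time, permission-aware call to an external search API
    # (e.g. Google Drive or GitHub), authenticated with the user's OAuth token.
    await asyncio.sleep(0.05)  # simulate network latency
    return [SearchResult(source, f"{source} document", f"snippet matching {query!r}")]


async def federated_search(query: str, tokens: dict[str, str],
                           timeout_s: float = 2.0) -> list[SearchResult]:
    """Fan out to each enabled external source and degrade gracefully on failure."""

    async def guarded(source: str, token: str) -> list[SearchResult]:
        try:
            return await asyncio.wait_for(
                search_connector(source, query, token), timeout_s)
        except asyncio.TimeoutError:
            return []  # slow connector: drop its results rather than block the whole search
        except Exception:
            return []  # connector error: a real system would log this and feed a circuit breaker

    per_source = await asyncio.gather(*(guarded(s, t) for s, t in tokens.items()))
    return [result for results in per_source for result in results]


# Example usage:
# results = asyncio.run(federated_search("quarterly roadmap",
#                                        {"google_drive": "<token>", "github": "<token>"}))
```

In this sketch a failing or slow connector simply contributes no results; Slack's actual fallback and caching behavior is not documented in the post.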
## Permissions and Access Control

The permissions model represents a critical component of their LLMOps implementation, as incorrect handling could lead to serious security breaches. Slack leverages OAuth as the authorization mechanism, requiring both organization-level admin approval and individual user authorization before integrating external sources. This two-level consent model provides defense in depth while giving users explicit control over what data Slack can access on their behalf.

From an LLMOps perspective, this permissions architecture means that every search query must navigate a complex authorization flow. The system must validate OAuth tokens, ensure they are still valid and haven't been revoked, make permission-aware API calls to external systems, and then filter the results so that the RAG pipeline only provides the LLM with content the user is authorized to see. Access Control List (ACL) enforcement happens at multiple layers: at the external system level (via OAuth scopes), at the API call level (the external system returns only permissioned results), and potentially at the RAG retrieval level within Slack's infrastructure.

The principle of least privilege is applied to OAuth scope requests: Slack claims to request only the read scopes necessary for search functionality. This minimizes the potential damage if credentials were compromised, though it also means the system cannot perform more sophisticated integrations that might require write access or broader permissions.

## RAG Pipeline and LLM Integration

The RAG implementation described here follows a fairly standard architecture, but with enterprise-grade security requirements layered on top. When a user submits a search query or requests an AI-generated answer, the system must:

- Authenticate and authorize the user
- Determine which data sources are enabled for that user and organization
- Execute searches against both internal Slack content and enabled external sources
- Retrieve relevant documents and snippets from these sources
- Construct a prompt that includes the user's query and the retrieved context
- Send this prompt to the LLM running in the escrow VPC
- Receive the generated response
- Return it to the user without persisting it in Slack's databases

The decision not to store AI-generated search summaries is particularly interesting from an LLMOps perspective. While it maximizes privacy and minimizes data retention risks, it also means Slack cannot easily implement features like showing users their search history, learning from past queries to improve results, or debugging issues by examining what the LLM actually returned. This represents a clear tradeoff between privacy principles and operational observability.
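Tying these steps together, a simplified and entirely hypothetical request handler might look like the sketch below. The helper functions, document shape, and escrow-VPC client are illustrative stand-ins, since the post does not expose Slack's internal APIs.

```python
# Hypothetical end-to-end flow for one AI search request; all names are assumptions.
from dataclasses import dataclass


@dataclass
class Doc:
    source: str
    snippet: str


# --- Stubs for components described in the post; real implementations are assumed. ---
def authorize(user_id: str, org_id: str) -> dict[str, str]:
    """Validate the user and return OAuth tokens only for sources that both the
    org admin and the individual user have enabled (the two-level consent model)."""
    return {"google_drive": "<token>", "github": "<token>"}


def search_slack(user_id: str, query: str) -> list[Doc]:
    """Permission-aware search over internal Slack content."""
    return [Doc("slack", f"internal result for {query!r}")]


def search_external_sources(query: str, tokens: dict[str, str]) -> list[Doc]:
    """Real-time, permissioned calls to external APIs (see the earlier federated sketch)."""
    return [Doc(source, f"{source} result for {query!r}") for source in tokens]


def escrow_llm_generate(prompt: str) -> str:
    """Inference against the closed-source model hosted inside the escrow VPC."""
    return "generated answer"


# --- End-to-end handling of one request -----------------------------------------
def answer_search_query(user_id: str, org_id: str, query: str) -> str:
    tokens = authorize(user_id, org_id)                  # authn/authz + two-level consent
    docs = search_slack(user_id, query) + search_external_sources(query, tokens)
    context = "\n\n".join(f"[{d.source}] {d.snippet}" for d in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    answer = escrow_llm_generate(prompt)                 # data stays inside Slack's trust boundary
    return answer                                        # the generated summary is not persisted
```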
## Deployment and Infrastructure Considerations

While the blog post doesn't provide extensive infrastructure details, several aspects of the deployment can be inferred. The AWS escrow VPC architecture suggests they're running LLM inference on AWS infrastructure, likely using managed services or containerized deployments to host the models. The real-time nature of the system implies they need sufficient compute capacity to handle concurrent inference requests with acceptable latency, which could mean running multiple model replicas behind load balancers.

The integration with Slack's existing compliance infrastructure, mentioned but not detailed, suggests they've built or adopted systems for handling encryption key management, data residency requirements for different jurisdictions, audit logging, and other enterprise compliance needs. Extending these to cover LLM operations would involve ensuring that prompts and responses are logged appropriately (while respecting privacy requirements), that data routing respects residency rules, and that encryption is maintained throughout the RAG pipeline.

## Operational Monitoring and Observability

The text doesn't explicitly address monitoring and observability, which is a notable gap in understanding their full LLMOps implementation. In production, such a system would require extensive monitoring, including:

- LLM inference latency and throughput metrics
- External API response times and error rates
- OAuth token validation success rates
- Cache hit rates for the client-side caching mentioned
- User experience metrics like time-to-first-result
- Cost metrics for LLM API calls and external search API usage
- Quality metrics to detect when the LLM is producing poor or irrelevant summaries

The lack of discussion around evaluation, testing, and quality assurance is also noteworthy. How do they test that the RAG pipeline correctly respects permissions? How do they evaluate the quality of search results and AI summaries? How do they handle prompt engineering and optimization? These are critical LLMOps concerns that aren't addressed in this particular blog post.
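As an illustration only (the post says nothing about Slack's observability stack), metrics like these could be captured with lightweight instrumentation around each pipeline stage. The sketch below assumes a Prometheus-style setup via `prometheus_client`; metric names, labels, and the choice of tooling are hypothetical.

```python
# Illustrative instrumentation sketch; metric names and the Prometheus stack are assumptions.
import time
from prometheus_client import Counter, Histogram

LLM_LATENCY = Histogram(
    "llm_inference_seconds", "Latency of LLM calls into the escrow VPC")
EXTERNAL_LATENCY = Histogram(
    "external_search_seconds", "Latency of federated search calls", ["source"])
EXTERNAL_ERRORS = Counter(
    "external_search_errors_total", "Failed external search calls", ["source", "kind"])
TOKEN_CHECKS = Counter(
    "oauth_token_validations_total", "OAuth token validation outcomes", ["outcome"])


def timed_external_search(source: str, call):
    """Wrap one external connector call with latency and error metrics."""
    start = time.monotonic()
    try:
        return call()
    except TimeoutError:
        EXTERNAL_ERRORS.labels(source=source, kind="timeout").inc()
        raise
    except Exception:
        EXTERNAL_ERRORS.labels(source=source, kind="error").inc()
        raise
    finally:
        EXTERNAL_LATENCY.labels(source=source).observe(time.monotonic() - start)


# Usage around the LLM call from the previous sketch:
#   with LLM_LATENCY.time():
#       answer = escrow_llm_generate(prompt)
```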
## Security and Privacy as LLMOps Constraints

What makes this case study particularly relevant to LLMOps is how security and privacy requirements fundamentally shape the operational architecture. Rather than treating security as a layer added on top of the core functionality, Slack's implementation makes security constraints primary architectural drivers. The choice of RAG over fine-tuning, the escrow VPC pattern, the decision not to store external data or AI-generated summaries, and the OAuth-based permissions model all stem from their security principles.

This demonstrates an important LLMOps consideration: production LLM systems often cannot simply optimize for model performance, latency, or cost. Enterprise deployments must balance these factors against security, compliance, privacy, and governance requirements that can significantly constrain architectural choices. Slack's approach shows one way to navigate these constraints, though it necessarily involves tradeoffs. The real-time federated approach likely increases latency compared to searching pre-indexed data. The strict permissions enforcement adds complexity to every query. The decision not to store data limits some potential product features and analytical capabilities.

## Critical Assessment and Limitations

While Slack provides a detailed technical narrative, several aspects warrant critical examination. First, the blog post is explicitly promotional material designed to showcase their security-conscious approach, so claims should be evaluated carefully. The assertion that "customer data never leaves Slack's trust boundary" depends heavily on how one defines that boundary: data is still being processed by third-party LLMs, even if they are hosted in Slack's VPC.

The real-time federated search approach, while elegant in some ways, likely introduces latency and reliability challenges that aren't discussed. External APIs may be slow, rate-limited, or temporarily unavailable, potentially degrading the user experience. The text doesn't address how they handle these scenarios or what fallback mechanisms exist.

The scalability implications of the OAuth-based permission model are also unclear. Every search request potentially requires validating multiple OAuth tokens and making multiple external API calls. As the number of integrated external sources grows, this could become a significant bottleneck or cost driver.

Finally, the lack of discussion around LLM output quality, evaluation methodologies, and continuous improvement processes suggests this blog post focuses primarily on the security architecture rather than providing a complete picture of their LLMOps practices. Questions about how they handle hallucinations, ensure factual accuracy, manage prompt templates, or iterate on model performance remain unanswered.

## Conclusion

Slack's enterprise search implementation represents a real-world example of deploying LLMs in production within a security-critical enterprise environment. The case study illustrates how RAG architectures, careful permissions management, and thoughtful infrastructure design can address some of the key challenges in bringing LLM capabilities to enterprise users. The federated approach to external data, the escrow VPC pattern for model hosting, and the layered consent model for data access all represent practical solutions to common LLMOps challenges. However, the promotional nature of the content and the gaps in discussing operational concerns like monitoring, evaluation, and error handling mean this should be read as a partial view of Slack's LLMOps practices rather than a complete technical deep dive.
