Building a Universal Search Product with RAG and AI Agents

Dropbox 2025
Dropbox developed Dash, a universal search and knowledge management product that addresses the challenges of fragmented business data across multiple applications and formats. The solution combines retrieval-augmented generation (RAG) and AI agents to provide powerful search capabilities, content summarization, and question-answering features. They implemented a custom Python interpreter for AI agents and developed a sophisticated RAG system that balances latency, quality, and data freshness requirements for enterprise use.

Industry

Tech

Overview

Dropbox Dash represents a significant production deployment of LLM-powered capabilities for enterprise knowledge management. The product aims to solve a common business problem: knowledge workers spend excessive time searching for information scattered across multiple applications, formats, and data modalities. Dash is positioned as a “universal search” product that combines AI-powered search with granular access controls, summarization, question-answering, and draft generation capabilities.

The case study provides valuable insights into the engineering decisions and trade-offs involved in building an enterprise-grade LLM application, particularly around retrieval-augmented generation (RAG) and AI agent architectures. While the article comes from Dropbox’s engineering blog and naturally presents their work favorably, it does offer substantive technical details about their approach.

The Core Problem

The challenges Dropbox identified for enterprise AI are threefold:

These challenges are genuine concerns for any enterprise search or knowledge management system, and they directly impact the design of both the retrieval and generation components of a RAG system.

RAG Implementation Details

Retrieval System Architecture

Dropbox made a deliberate architectural choice for their retrieval system that diverges from the common approach of using purely vector-based semantic search. Their system combines:

The article candidly discusses the trade-offs they considered:

Their choice of traditional IR with on-the-fly chunking and reranking is interesting: it suggests that pure vector search could not meet their latency requirements while maintaining answer quality. This is a practical constraint that many production systems face.
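A minimal sketch of that two-stage retrieval shape, assuming nothing about Dropbox's actual stack: the lexical scorer below is a toy stand-in for a real IR engine (e.g. BM25), and the chunking and reranking steps happen at query time, only over documents the first stage retrieved.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    score: float

def lexical_score(query: str, text: str) -> float:
    # Toy stand-in for a traditional IR scorer such as BM25.
    terms = text.lower().split()
    return sum(terms.count(q) for q in set(query.lower().split())) / (len(terms) or 1)

def chunk_text(text: str, size: int = 40) -> list[str]:
    # On-the-fly chunking: documents are split only after retrieval, at query time.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, docs: dict[str, str],
             top_docs: int = 3, top_chunks: int = 2) -> list[Chunk]:
    # Stage 1: rank whole documents with the traditional IR scorer.
    ranked = sorted(docs, key=lambda d: lexical_score(query, docs[d]),
                    reverse=True)[:top_docs]
    # Stage 2: chunk only the retrieved documents, then rerank the chunks.
    chunks = [Chunk(d, c, lexical_score(query, c))
              for d in ranked for c in chunk_text(docs[d])]
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:top_chunks]
```

Deferring chunking until after document-level retrieval is one way to keep the index small and the freshest document text in play, at the cost of some per-query compute.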

Model Selection and Evaluation

Dropbox conducted rigorous evaluation of their RAG system using several public benchmark datasets:

Their evaluation metrics included:

The use of LLM-based evaluation judges is now common practice in production LLM systems, though the article does not discuss potential issues with judge reliability or calibration. The system is described as model-agnostic, allowing flexibility in LLM selection and adaptation to rapid developments in the field.
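The general shape of an LLM-judge harness can be sketched as follows; the prompt and scoring scale here are hypothetical, not Dropbox's, and `call_llm` is any text-in/text-out completion function, which is what keeps the harness model-agnostic.

```python
import json

# Hypothetical judge prompt; the article does not publish the actual prompts.
JUDGE_PROMPT = (
    "Rate how faithful the answer is to the context on a 1-5 scale. "
    'Reply as JSON: {{"score": <int>, "reason": "<why>"}}\n'
    "Context: {context}\nQuestion: {question}\nAnswer: {answer}"
)

def judge(question: str, context: str, answer: str, call_llm) -> int:
    # call_llm is a plain completion function, swappable per model.
    verdict = json.loads(call_llm(JUDGE_PROMPT.format(
        context=context, question=question, answer=answer)))
    return verdict["score"]

def pass_rate(dataset, answer_fn, call_llm, threshold: int = 4) -> float:
    # End-to-end metric: share of examples the judge scores at or above threshold.
    scores = [judge(q, c, answer_fn(q, c), call_llm) for q, c in dataset]
    return sum(s >= threshold for s in scores) / len(scores)
```

Scoring the final answer rather than individual components is one concrete way to implement the end-to-end quality measurement the article emphasizes.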

AI Agent Architecture

For complex, multi-step tasks that RAG alone cannot handle, Dropbox developed an AI agent system. Their definition of AI agents focuses on “multi-step orchestration systems that can dynamically break down user queries into individual steps.”

Two-Stage Approach

Stage 1 - Planning: The LLM breaks down a user query into a sequence of high-level steps, expressed as code statements in a custom domain-specific language (DSL) that resembles Python. This approach forces the LLM to express its reasoning as structured, executable code rather than free-form text.

Stage 2 - Execution: The generated code is validated through static analysis and then executed. If the LLM references functionality that doesn’t exist, a second LLM call is used to implement the missing code. This two-stage approach allows the agent to maintain clarity in its overall plan while being adaptable to new query types.
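The plan-then-execute loop described above can be sketched roughly as below. Everything here is an assumption-laden simplification: `plan_llm` and `implement_llm` stand in for the two LLM calls, the static-analysis pass only looks for undefined function calls, and Python's `exec` stands in for Dropbox's custom interpreter.

```python
import ast

def missing_calls(plan_src: str, available: set[str]) -> list[str]:
    # Static-analysis pass: collect calls to functions the runtime does not provide.
    return [node.func.id for node in ast.walk(ast.parse(plan_src))
            if isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id not in available]

def run_agent(query: str, plan_llm, implement_llm, tools: dict):
    # Stage 1: planning -- the LLM emits its plan as executable statements.
    plan = plan_llm(query)
    # Stage 2: execution -- a second LLM call implements anything the plan
    # references that does not yet exist, then the plan runs.
    for name in missing_calls(plan, set(tools)):
        tools[name] = implement_llm(name)
    env = dict(tools)
    exec(plan, {"__builtins__": {}}, env)  # stand-in for the custom interpreter
    return env.get("result")
```

Keeping the plan as code means the fallback path is precise: the validator names exactly which functionality is missing, so the second LLM call has a narrow, well-defined job.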

Custom Interpreter and Security

A notable aspect of their implementation is the development of a custom Python interpreter built from scratch specifically for executing LLM-generated code. This interpreter includes:

The decision to build a minimal interpreter rather than using the full Python runtime is explicitly security-motivated. By implementing only the required functionality, they avoid inheriting security vulnerabilities present in full-featured interpreters.
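One way to picture "implementing only the required functionality" is an AST allowlist that rejects any construct the plan language does not need, before anything runs. This is a sketch of the idea, not Dropbox's interpreter; the allowed node set here is illustrative.

```python
import ast

# Allowlist of AST node types needed to express simple agent plans; anything
# outside it (imports, attribute access, loops, lambdas, ...) is rejected
# before execution, shrinking the attack surface by construction.
ALLOWED = (ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
           ast.Call, ast.Constant, ast.keyword, ast.List, ast.Tuple)

def check_program(src: str) -> None:
    for node in ast.walk(ast.parse(src)):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"disallowed construct: {type(node).__name__}")
```

The security property is inverted relative to sandboxing a full runtime: instead of enumerating what LLM-generated code must not do, the runtime enumerates the little it can do.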

Testing and Debugging Benefits

The code-based approach to agent planning offers several operational advantages:

This approach represents a thoughtful solution to the challenge of testing LLM-based systems, where output variability across model versions typically makes traditional testing difficult.

LLMOps Considerations

Several LLMOps lessons emerge from this case study:

Trade-off Management

The article is refreshingly honest about the trade-offs involved in production LLM systems. They explicitly discuss that larger models provide more precise results but introduce latency that may not meet user expectations. The 2-second latency target for 95% of queries represents a concrete SLA that drove many of their architectural decisions.
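A "2 seconds for 95% of queries" target is just a p95 check over observed latencies; a minimal nearest-rank version, with the 2000 ms target as a parameter:

```python
import math

def p95_ms(latencies_ms: list[float]) -> float:
    # Nearest-rank p95: the latency at or below which 95% of queries complete.
    ordered = sorted(latencies_ms)
    return ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]

def meets_sla(latencies_ms: list[float], target_ms: float = 2000) -> bool:
    return p95_ms(latencies_ms) <= target_ms
```

A tail-percentile SLA like this is stricter than an average-latency target: a few very slow queries can sink it even when the mean looks healthy, which is exactly why it pushes architectural decisions toward the fast path.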

Model Agnosticism

Their decision to build a model-agnostic system is a practical LLMOps consideration. It allows them to swap models as the field evolves rapidly, and potentially offer customers choice in which models are used. However, they also note that “the same prompts can’t be used for different LLMs,” meaning model agnosticism comes with the cost of maintaining multiple prompt variants.
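The maintenance cost of per-model prompts usually ends up as some form of prompt registry keyed by model. The registry below is entirely hypothetical (the model names and templates are made up) but shows the shape of the trade-off: one task, several variants to keep in sync.

```python
# Hypothetical registry: one prompt variant per (task, model) pair, since the
# article notes that the same prompts can't be reused across different LLMs.
PROMPTS = {
    ("summarize", "model-a"): "Summarize the following document.\n\n{doc}",
    ("summarize", "model-b"): "<task>summarize</task>\n<doc>{doc}</doc>",
}

def render(task: str, model: str, **fields) -> str:
    if (task, model) not in PROMPTS:
        raise KeyError(f"no prompt variant for {task!r} on {model!r}")
    return PROMPTS[(task, model)].format(**fields)
```

Swapping a model then means adding a row and re-running evaluations, rather than rewriting application code.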

Evaluation Strategy

The combination of public benchmarks and custom metrics using LLM judges represents a pragmatic approach to evaluation. Their emphasis on end-to-end quality measurement acknowledges that component-level metrics may not capture the true user experience.

Security Architecture

The custom interpreter approach demonstrates how security considerations can drive architectural decisions in LLMOps. Rather than retrofitting security onto an existing execution environment, they built a minimal runtime that limits attack surface by design.

Future Directions

Dropbox outlines several future directions including multi-turn conversations, self-reflective agents that evaluate their own performance, continuous fine-tuning for specific business needs, and multi-language support. These represent common evolution paths for production LLM systems moving from initial deployment toward more sophisticated capabilities.

Balanced Assessment

While the case study provides valuable technical insights, it’s worth noting some limitations:

Overall, the Dropbox Dash case study offers a solid example of production LLMOps practices, particularly around the integration of RAG with agent-based architectures, the thoughtful approach to interpreter security, and the pragmatic handling of latency-quality trade-offs. The code-generation approach to agent planning is particularly interesting as it provides more structured, debuggable output compared to free-form reasoning approaches.
