Wix developed an AI-powered data discovery system called Anna to address the challenges of finding relevant data across its data mesh architecture. The system combines multiple specialized AI agents with Retrieval-Augmented Generation (RAG) to translate natural language queries into structured data queries. Using semantic search with Vespa for vector storage and an innovative approach of matching business questions to business questions, the team achieved an 83% success rate in surfacing relevant dimensions within the top-k results, significantly improving data accessibility across the organization.
This case study details Wix's implementation of a sophisticated LLM-powered data discovery system in a complex enterprise environment. The company faced significant challenges with data discovery across their data mesh architecture, where data was distributed across multiple domains representing different products and features. The traditional approach of using a wizard-based tool (Data Playground) with a semantic layer proved insufficient for non-technical users trying to navigate thousands of dimensions and metrics.
The solution they developed, called Anna, represents a comprehensive example of LLMs deployed in production, showcasing several key aspects of modern LLMOps:
### System Architecture and Components
The system employs a multi-agent architecture where different specialized AI agents handle specific aspects of the data discovery process:
* **Root Agent**: Serves as the primary interface, handling intent recognition and ambiguity resolution
* **Question Validation Agent**: Validates queries against supported entities and question types
* **Question Divider Agent**: Breaks complex queries into manageable sub-questions
* **Question Generator Agent**: Creates training questions for the embedding system
* **Data Playground Agent**: Translates validated queries into structured API payloads
* **Data Playground Retry Agent**: Handles error recovery and query refinement
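The routing between these agents can be pictured with a minimal sketch. This is a hypothetical illustration, not Wix's implementation: the real Anna agents are LLM-backed, so the keyword-based handlers and the `AgentResult` type below are stand-ins invented for this example.

```python
from dataclasses import dataclass

# Hypothetical sketch of the agent routing described above; each handler is a
# plain-function stand-in for an LLM-backed agent.

@dataclass
class AgentResult:
    handled_by: str
    payload: dict

def validate_question(q: str) -> AgentResult:
    # Question Validation Agent: check the query against supported entities.
    supported = {"sessions", "revenue", "signups"}
    entities = [w for w in q.lower().split() if w in supported]
    return AgentResult("validator", {"valid": bool(entities), "entities": entities})

def divide_question(q: str) -> AgentResult:
    # Question Divider Agent: split a compound query into sub-questions.
    parts = [p.strip() for p in q.split(" and ")]
    return AgentResult("divider", {"sub_questions": parts})

def root_agent(q: str) -> AgentResult:
    # Root Agent: crude intent recognition, then dispatch to a specialist.
    if " and " in q:
        return divide_question(q)
    return validate_question(q)

result = root_agent("revenue by country and signups by week")
print(result.handled_by, result.payload["sub_questions"])
```

The point of the pattern is that each agent has a single responsibility, so failures can be attributed to (and retried at) a specific step rather than one monolithic prompt.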
### RAG Implementation Details
The RAG system uses Vespa as the vector database for semantic search, with several notable implementation details:
* Embeddings are updated hourly through an Airflow DAG to keep the semantic layer current
* Initial attempts at embedding table-level metadata proved ineffective
* A breakthrough came from embedding business questions rather than raw metadata
* The system achieved 83% success rate in retrieving relevant dimensions within top-k results
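The 83% figure is a top-k retrieval metric: a query counts as a success if the correct dimension appears among the first k results. A minimal sketch of how such a metric might be computed follows; the function name, the toy queries, and the dimension names are all hypothetical.

```python
def top_k_accuracy(retrieved: dict, expected: dict, k: int = 5) -> float:
    """Fraction of queries whose expected dimension appears in the top-k results."""
    hits = sum(1 for q, dim in expected.items() if dim in retrieved.get(q, [])[:k])
    return hits / len(expected)

# Toy evaluation set: query -> ranked dimensions returned by the retriever,
# and query -> the dimension a human labeled as correct.
retrieved = {
    "weekly signups by country": ["signup_country", "session_country", "signup_week"],
    "revenue per plan": ["plan_price", "plan_name", "revenue_plan"],
}
expected = {
    "weekly signups by country": "signup_country",
    "revenue per plan": "revenue_plan",
}
print(top_k_accuracy(retrieved, expected, k=3))  # → 1.0
```

Note how sensitive the metric is to k: with k=1 the same retriever scores only 0.5 here, which is why the choice of k belongs in any reported accuracy figure.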
### Production Challenges and Solutions
The case study honestly discusses several production challenges:
* **Incorrect Dimension Selection**: The system sometimes retrieves semantically similar but irrelevant dimensions. This highlights the importance of validation and the challenge of preventing false positives in production LLM systems.
* **Lack of Memory**: The stateless nature of the system leads to potential inconsistencies across interactions, a common challenge in deployed LLM systems.
* **Response Management**: Controlling premature or partial answers from the AI models requires careful prompt engineering and system design.
### Technical Infrastructure
The solution integrates several technical components:
* Vespa for vector storage and similarity search
* Airflow for orchestrating regular embedding updates
* Cube as the semantic engine for SQL generation
* Trino-based query engine for executing the generated queries
* Custom API layer for interfacing with the Data Playground system
### Innovation in RAG Implementation
A particularly noteworthy aspect is their innovative approach to RAG. Instead of traditional document embedding, they:
* Generate synthetic business questions for each dimension and metric
* Use these questions as the knowledge base for semantic search
* Compare user questions against these pre-generated questions
This approach significantly improved retrieval accuracy compared to embedding raw metadata.
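The question-to-question matching idea can be sketched in a few lines. This toy uses bag-of-words cosine similarity as a stand-in for learned embeddings, and the pre-generated questions and dimension names are invented for illustration; in production the comparison runs against real embeddings stored in Vespa.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding: a bag-of-words vector.
    return Counter(text.lower().replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Synthetic business questions pre-generated per dimension (hypothetical examples).
question_index = {
    "How many signups did we get per country?": "signup_country",
    "What is the revenue for each subscription plan?": "revenue_plan",
    "How long is the average session per device?": "session_duration_device",
}

def retrieve_dimension(user_question: str) -> str:
    # Match the user's question against pre-generated questions, not raw metadata.
    scored = {q: cosine(embed(user_question), embed(q)) for q in question_index}
    best = max(scored, key=scored.get)
    return question_index[best]

print(retrieve_dimension("signups per country last month"))  # → signup_country
```

The design insight is that a user's question and a synthetic business question live in the same distribution of text, whereas a question and a terse column name do not, so question-to-question similarity is a much easier matching problem.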
### Monitoring and Quality Control
The system includes several quality control mechanisms:
* Validation agents to ensure query correctness
* Error handling and retry mechanisms for failed queries
* Success metrics based on top-k retrieval accuracy
* User feedback integration for continuous improvement
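The retry mechanism can be illustrated with a minimal loop in which a failed query's error message is fed back into a refinement step, mirroring the role described for the Data Playground Retry Agent. Everything here is a hypothetical stand-in: the real refinement step is LLM-backed, and the API, dimension names, and return values are invented for this sketch.

```python
def execute_query(payload: dict) -> dict:
    # Stand-in for the Data Playground API: rejects unknown dimensions.
    known = {"signup_country", "revenue_plan"}
    bad = [d for d in payload["dimensions"] if d not in known]
    if bad:
        raise ValueError(f"unknown dimensions: {bad}")
    return {"rows": 42}

def refine(payload: dict, error: str) -> dict:
    # Stand-in for the LLM-backed retry agent: drop the offending dimensions.
    keep = [d for d in payload["dimensions"] if d not in error]
    return {**payload, "dimensions": keep}

def run_with_retries(payload: dict, max_retries: int = 2) -> dict:
    # Bounded retry loop: each failure's error text informs the next attempt.
    for _ in range(max_retries + 1):
        try:
            return execute_query(payload)
        except ValueError as e:
            payload = refine(payload, str(e))
    raise RuntimeError("query failed after retries")

print(run_with_retries({"dimensions": ["signup_country", "sessoin_country"]}))
```

Bounding the retries matters in production: without a cap, a query the refinement step cannot fix would loop indefinitely instead of degrading gracefully.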
### Production Deployment Considerations
The case study reveals several important production deployment considerations:
* Regular updates to the knowledge base through scheduled DAGs
* Error handling and graceful degradation
* Integration with existing data infrastructure
* Balance between automation and human oversight
### Future Improvements
The team acknowledges several areas for future enhancement:
* Adding system memory to maintain consistency across interactions
* Expanding support for additional question types
* Further refining the embedding strategy
* Enhanced personalization based on user behavior
This case study provides valuable insights into implementing LLMs in production for enterprise data discovery. It demonstrates the importance of careful system design, the benefits of a multi-agent approach, and the challenges of maintaining accuracy and reliability in production AI systems. The honest discussion of challenges and limitations adds credibility to the implementation details and provides valuable lessons for similar projects.