Company: Statista
Title: Optimizing RAG-based Search Results for Production: A Journey from POC to Production
Industry: Research & Academia
Year: 2023

Summary (short)
Statista, a global data platform, developed and optimized a RAG-based AI search system to enhance their platform's search capabilities. Working with Urial Labs and Talent Formation, they transformed a basic prototype into a production-ready system that improved search quality by 140%, reduced costs by 65%, and decreased latency by 10%. The resulting Research AI product has seen growing adoption among paying customers and demonstrates superior performance compared to general-purpose LLMs for domain-specific queries.
This case study presents a comprehensive journey of implementing and optimizing a production LLM system at Statista, a global data platform serving over 30,000 paying customers with millions of statistics across various industries.

# Context and Business Challenge

Statista faced a significant challenge in early 2023 with the emergence of ChatGPT and other LLMs. As a platform hosting millions of statistics and serving 23 million views per month, they needed to enhance their search and discovery capabilities while maintaining their position as a trusted data source. The challenge was particularly important given that 66-80% of their traffic comes from organic search.

# Initial Approach and Development

The journey began with a methodical approach:

* Dedicated one engineer for two months to explore potential use cases
* Created an initial prototype to prove the concept
* Partnered with external expertise (Urial Labs) for production optimization

# Technical Implementation Details

The system was implemented as a RAG (Retrieval Augmented Generation) application with several key components:

* Vector store for semantic search across millions of statistics
* Multi-stage retrieval and ranking system
* Answer generation using LLMs
* Quality rating system for answers

The initial implementation had significant challenges:

* 42 LLM calls per request (40 for reranking, 1 for answering, 1 for rating)
* High latency (~30 seconds)
* High costs (~8 cents per query)
* Quality issues (30% on internal metrics)

# Optimization Process and Methodology

The team implemented a systematic optimization approach:

* Established comprehensive traceability to understand performance bottlenecks
* Defined clear metrics prioritizing quality, then cost, then latency
* Created a reference dataset with expert-validated answers
* Implemented automated testing infrastructure for rapid experimentation
* Conducted over 100 experiments to optimize performance

Key technical innovations included:

## Query Processing Improvements

* Implemented query rewriting for better semantic matching
* Developed a multi-query approach to capture different aspects of complex questions
* Utilized the Hypothetical Document Embeddings (HyDE) technique to improve retrieval quality (see the sketch below)
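
To make the query-processing techniques above concrete, here is a minimal sketch of multi-query retrieval combined with HyDE. It runs against a toy in-memory vector store, and the names `call_llm`, `embed`, `InMemoryVectorStore`, and `multi_query_hyde_retrieve` are illustrative placeholders, not Statista's actual components.

```python
"""Minimal sketch of multi-query + HyDE retrieval for a RAG pipeline.

`call_llm` and `embed` stand in for real model and embedding calls; the
vector store is a tiny in-memory cosine-similarity index, not the
production index over millions of statistics.
"""
import math
from typing import Callable, Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for a semantic index over statistics documents."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.docs: List[Dict] = []

    def add(self, doc_id: str, text: str) -> None:
        self.docs.append({"id": doc_id, "text": text, "vec": self.embed(text)})

    def search(self, query_vec: List[float], top_k: int = 5) -> List[Dict]:
        ranked = sorted(self.docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
        return ranked[:top_k]


def multi_query_hyde_retrieve(
    question: str,
    store: InMemoryVectorStore,
    call_llm: Callable[[str], str],
    embed: Callable[[str], List[float]],
    top_k: int = 5,
) -> List[Dict]:
    """Retrieve with the original question, LLM rewrites, and a HyDE pseudo-answer."""
    # Multi-query: ask the LLM for alternative phrasings of the question.
    rewrites = call_llm(f"Rewrite this question in 2 different ways, one per line: {question}")
    # HyDE: generate a hypothetical answer document and embed that for retrieval.
    hyde_doc = call_llm(f"Write a short hypothetical statistic that answers: {question}")
    probes = [question, *[r for r in rewrites.splitlines() if r.strip()], hyde_doc]

    seen, hits = set(), []
    for probe in probes:
        for hit in store.search(embed(probe), top_k=top_k):
            if hit["id"] not in seen:  # deduplicate across probes
                seen.add(hit["id"])
                hits.append(hit)
    return hits  # candidates for downstream reranking and answer generation
```

The deduplicated candidates would then feed the reranking and answer-generation stages described above; in a production setting the probes could be issued concurrently rather than sequentially.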
## Model Selection and Optimization

* Conducted comprehensive model comparisons across different providers
* Evaluated trade-offs between quality, cost, and latency
* Implemented dynamic model selection based on query complexity (a simple routing sketch appears at the end of this write-up)

# Results and Production Implementation

The optimization efforts yielded impressive results:

* 140% improvement in answer quality
* 65% reduction in costs
* 10% improvement in latency (after reinvesting some gains into quality improvements)

The production system includes several sophisticated features:

* Parallel retrieval pipelines
* Dynamic model selection
* Automated quality assessment
* Key fact extraction and visualization

# Business Impact and Adoption

The system, launched as "Research AI", has shown strong business results:

* Increasing usage among paying customers
* Low bounce rates indicating good user engagement
* Higher content interaction rates compared to traditional search
* Competitive performance against leading generative AI models

# Production Monitoring and Continuous Improvement

The team implemented:

* Continuous quality benchmarking against leading AI models
* Regular quality metric updates and calibration
* A/B testing for new features and integrations
* Usage monitoring and cost tracking

# Innovation and Future Directions

The project has spawned additional innovations:

* Development of an AI Router product for optimizing model selection
* Exploration of new business models, including data licensing for LLM training
* Integration possibilities with enterprise customers' internal AI systems

# Key Learnings

* Importance of a systematic optimization methodology
* Value of comprehensive metrics and testing infrastructure
* Need for a balanced approach to quality, cost, and latency
* Significance of production-ready monitoring and evaluation systems

The case study demonstrates how careful engineering, systematic optimization, and a focus on production metrics can transform a proof-of-concept AI system into a valuable production service. The team's approach to balancing quality, cost, and performance while maintaining a focus on user value provides valuable insights for similar LLMOps initiatives.
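
As a closing illustration of the dynamic model selection mentioned above, below is a minimal routing sketch. The tier names, per-token costs, and complexity heuristic are assumptions made for the example and are not the configuration behind Research AI or the AI Router product.

```python
"""Minimal sketch of dynamic model selection based on query complexity.

Tiers, model names, costs, and the scoring heuristic are illustrative only.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class ModelTier:
    name: str               # e.g. a small, cheap model vs. a large, high-quality one
    cost_per_1k_tokens: float
    max_complexity: float   # route queries at or below this complexity score


# Cheapest tier first; the router picks the first tier that can handle the query.
TIERS: List[ModelTier] = [
    ModelTier("small-fast-model", cost_per_1k_tokens=0.0005, max_complexity=0.3),
    ModelTier("mid-tier-model", cost_per_1k_tokens=0.003, max_complexity=0.7),
    ModelTier("large-flagship-model", cost_per_1k_tokens=0.03, max_complexity=1.0),
]


def complexity_score(query: str) -> float:
    """Naive heuristic: longer, multi-part, comparative questions score higher."""
    score = min(len(query.split()) / 40.0, 0.6)  # length component, capped
    if any(w in query.lower() for w in ("compare", "versus", "trend")):
        score += 0.2
    if query.count("?") > 1 or " and " in query.lower():
        score += 0.2
    return min(score, 1.0)


def route(query: str) -> ModelTier:
    """Pick the cheapest tier whose complexity ceiling covers the query."""
    score = complexity_score(query)
    for tier in TIERS:
        if score <= tier.max_complexity:
            return tier
    return TIERS[-1]


if __name__ == "__main__":
    for q in (
        "Smartphone users in Germany 2023",
        "Compare e-commerce revenue trends in Europe and Asia since 2019 and explain the drivers",
    ):
        tier = route(q)
        print(f"{q!r} -> {tier.name} (complexity={complexity_score(q):.2f})")
```

In practice, the routing signal might come from an upstream classifier or from retrieval confidence rather than a hand-written heuristic; the point of the sketch is simply that cheaper models handle simple lookups while harder, comparative questions escalate to stronger models.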
