GEICO explored using LLMs for customer service chatbots through a hackathon initiative in 2023. After discovering issues with hallucinations and "overpromising" in their initial implementation, they developed a comprehensive RAG (Retrieval Augmented Generation) solution enhanced with their novel "RagRails" approach. This method successfully reduced incorrect responses from 12 out of 20 to zero in test cases by providing structured guidance within retrieved context, demonstrating how to safely deploy LLMs in a regulated insurance environment.
This case study details GEICO's journey in implementing Large Language Models (LLMs) for customer service applications, providing valuable insights into the challenges and solutions for deploying LLMs in a regulated insurance environment.
The initiative began with a 2023 hackathon where teams explored potential LLM applications. One winning proposal focused on creating a conversational interface to replace traditional web forms, allowing dynamic question-asking based on conversation state and data model requirements. However, early testing revealed significant challenges with both commercial and open-source LLMs, particularly regarding response reliability and accuracy.
### Technical Implementation and Challenges
The team identified hallucinations as a major concern, particularly a specific type they termed "overpromising" where the LLM would incorrectly assume capabilities it didn't have. Rather than pursuing expensive fine-tuning approaches, they opted for Retrieval Augmented Generation (RAG) as their primary solution due to its cost-effectiveness, flexibility, transparency, and efficiency.
Their RAG implementation involved several key technical components:
* A pipeline for converting business knowledge into vector representations
* An asynchronous offline conversion process to maintain high QPS (Queries Per Second)
* Implementation of Hierarchical Navigable Small World (HNSW) graphs for efficient vector search
* A two-step retrieval process using LLM-translated user inputs
* Strategic positioning of retrieved knowledge within the context window
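The two-step retrieval flow described above can be sketched as follows. This is a hypothetical illustration, not GEICO's code: the LLM translation step is stubbed out, the toy 3-dimensional vectors stand in for real embeddings, and a brute-force cosine scan replaces the HNSW graph the team actually uses for efficient search.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def translate_query(user_input: str) -> str:
    """Step 1: an LLM normalizes the raw user input into a canonical
    search query (stubbed here as a lowercase passthrough)."""
    return user_input.lower().strip()

def retrieve(query_vector, index, k=3):
    """Step 2: nearest-neighbour search over the embedded knowledge base.
    The case study uses an HNSW graph; a cosine scan is shown for brevity."""
    ranked = sorted(index, key=lambda rec: cosine(query_vector, rec["vector"]),
                    reverse=True)
    return ranked[:k]

# Toy 3-dimensional "embeddings" standing in for real model output.
index = [
    {"text": "How to file a claim", "vector": [0.9, 0.1, 0.0]},
    {"text": "Update payment method", "vector": [0.0, 0.8, 0.2]},
    {"text": "Roadside assistance coverage", "vector": [0.1, 0.2, 0.9]},
]

canonical = translate_query("  HOW do I file a claim?  ")  # step 1
# A real system would now embed `canonical`; we hardcode its vector.
hits = retrieve([0.85, 0.15, 0.05], index, k=2)            # step 2
print([h["text"] for h in hits])
```

Separating translation from retrieval lets the system normalize heterogeneous user phrasings before they ever reach the vector index.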
The team developed a sophisticated data processing pipeline that included:
* Document splitting
* Embedding generation through API calls
* Metadata extraction using LLMs
* Vector database indexing with metadata support
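The four pipeline stages above might be wired together roughly as below. All function bodies are stand-ins: the splitter is naive, and the embedding and metadata-extraction calls (API-backed in the real pipeline) are stubbed with deterministic placeholders.

```python
def split_document(text: str, chunk_size: int = 200) -> list[str]:
    """Naive fixed-size splitter; production splitters typically
    respect sentence and section boundaries."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(chunk: str) -> list[float]:
    """Stub for the embedding API call made per chunk."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 97)]

def extract_metadata(chunk: str) -> dict:
    """Stub for LLM-based metadata extraction (topic, product line, ...)."""
    return {"length": len(chunk), "topic": "unknown"}

def build_index(documents: list[str]) -> list[dict]:
    """Runs offline and asynchronously in production so the serving
    path keeps its QPS; shown synchronously here for simplicity."""
    index = []
    for doc in documents:
        for chunk in split_document(doc):
            index.append({
                "text": chunk,
                "vector": embed(chunk),
                "metadata": extract_metadata(chunk),
            })
    return index

index = build_index(["Policy documents describing claims handling."])
print(len(index), index[0]["metadata"])
```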
### Innovation: The RagRails Approach
One of the most significant innovations was the development of "RagRails," a novel approach to guiding LLM responses. It came after observing that, even with identical inputs, the LLM would produce varying responses, a consistent fraction of which were incorrect. Initial testing showed that 12 out of 20 responses contained errors in certain scenarios.
The RagRails strategy involves:
* Adding specific guiding instructions within retrieved records
* Maintaining context relevance while reinforcing desired behaviors
* Implementing systematic testing for consistency
* Focusing on repeatability in positive outcomes
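The core mechanic, per the case study, is attaching explicit behavioral instructions to the retrieved records themselves. The sketch below is a hypothetical rendering of that idea; the template wording and function names are illustrative, not GEICO's actual rails.

```python
# Hypothetical RagRails illustration: every retrieved record is wrapped
# in "rails" -- guiding instructions injected alongside the content.
RAILS_TEMPLATE = (
    "{text}\n"
    "[Instruction: answer ONLY from the record above. If it does not "
    "cover the question, say you don't know. Never promise actions or "
    "capabilities the system does not have.]"
)

def apply_rails(records: list[dict]) -> str:
    """Wrap each retrieved record in its guiding instructions before
    it is placed into the LLM prompt."""
    return "\n\n".join(RAILS_TEMPLATE.format(text=r["text"]) for r in records)

context = apply_rails([{"text": "Claims can be filed online or by phone."}])
prompt = f"Context:\n{context}\n\nUser: How do I file a claim?"
print(prompt)
```

Because the rails travel with the records, they stay adjacent to the content they govern inside the context window, which is what reinforces the desired behavior on every retrieval.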
This approach proved highly effective, reducing error rates from 60% to 0% in their test cases.
### Production Considerations and Optimizations
The team implemented several optimization strategies for production deployment:
* Relevance checking mechanisms to filter retrieved context
* Ranking mechanisms to prioritize most relevant information
* Strategic positioning of context within the LLM's attention window
* Balancing between response quality and computational cost
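The filtering, ranking, and positioning steps above can be sketched together. The threshold value and the edges-first ordering heuristic (motivated by the commonly observed "lost in the middle" weakness of LLM attention) are assumptions for illustration, not details from the case study.

```python
def filter_and_rank(candidates: list[dict], threshold: float = 0.5,
                    k: int = 3) -> list[dict]:
    """Drop retrieved records below a relevance threshold, then keep
    only the top-k by score to contain prompt size and cost."""
    kept = [c for c in candidates if c["score"] >= threshold]
    return sorted(kept, key=lambda c: c["score"], reverse=True)[:k]

def position_context(records: list[dict], question: str) -> str:
    """Place the strongest records at the edges of the context window,
    where LLM attention tends to be strongest (records arrive best-first)."""
    middle_out = records[::2] + records[1::2][::-1]
    body = "\n".join(r["text"] for r in middle_out)
    return f"{body}\n\nQuestion: {question}"

candidates = [
    {"text": "Filing deadlines", "score": 0.9},
    {"text": "Payment options", "score": 0.7},
    {"text": "Unrelated FAQ", "score": 0.3},
]
ranked = filter_and_rank(candidates)
print(position_context(ranked, "When must I file?"))
```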
They also recognized the need for cost management in production, suggesting the use of smaller, specialized models for specific tasks like:
* Optimization
* Entity extraction
* Relevance detection
* Validation
### Technical Architecture Details
The system architecture incorporates several sophisticated components:
* Vector database implementation using HNSW for efficient high-dimensional search
* Asynchronous processing pipeline for knowledge base updates
* Multiple LLM integrations for different processing steps
* Custom relevance checking mechanisms
* Context injection system for the RagRails implementation
### Quality Assurance and Testing
The team implemented robust testing procedures including:
* Systematic response evaluation
* Repeatability testing for reliability assessment
* Performance metrics tracking
* Cost-benefit analysis of different approaches
### Production Challenges and Solutions
Several production challenges were addressed:
* Managing context window limitations
* Balancing response quality with computational cost
* Handling heterogeneous input representations
* Maintaining consistency in responses
* Managing model updates and knowledge base synchronization
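For the first of those challenges, context-window management, one common pattern is a greedy token budget: keep the highest-ranked records until the window is full. This sketch is an assumption about how such a guard might look; the whitespace token count is a crude stand-in for a real tokenizer.

```python
def fit_to_budget(records: list[str], question: str,
                  budget: int = 120) -> list[str]:
    """Greedily keep the highest-ranked records (list arrives best-first)
    until a rough token budget is exhausted."""
    used = len(question.split())  # reserve room for the question itself
    kept = []
    for rec in records:
        cost = len(rec.split())
        if used + cost > budget:
            break
        kept.append(rec)
        used += cost
    return kept

records = [
    "claims can be filed online via the portal",
    "payments are accepted monthly or in full",
    "roadside assistance details and coverage limits",
]
kept = fit_to_budget(records, "How do I file a claim?", budget=16)
print(kept)
```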
### Results and Impact
The implementation demonstrated significant improvements in:
* Response accuracy
* Consistency in handling user queries
* Reduction in hallucination instances
* Cost-effective scaling of LLM capabilities
### Future Directions
GEICO continues to explore:
* Further optimization of the RagRails approach
* Integration of smaller, specialized models for specific tasks
* Improved relevance checking mechanisms
* Cost optimization strategies
This case study represents a significant contribution to the field of LLMOps, demonstrating how careful engineering and innovative approaches can make LLMs reliable enough for production use in regulated industries. The RagRails approach, in particular, offers a novel solution to the common challenge of LLM hallucinations and could be applicable across various domains beyond insurance.