HDI: Building and Optimizing a RAG-based Customer Service Chatbot

LLMOps Database

Insurance

HDI

Company

HDI

Title

Building and Optimizing a RAG-based Customer Service Chatbot

Industry

Insurance

Link

https://www.youtube.com/watch?v=0wphTp29ZK0

Year

2022

Summary (short)

HDI, a German insurance company, implemented a RAG-based chatbot system to help customer service agents quickly find and access information across multiple knowledge bases. The system processes complex insurance documents, including tables and multi-column layouts, using various chunking strategies and vector search optimizations. After 120 experiments to optimize performance, the production system now serves 800+ users across multiple business lines, handling 26 queries per second with 88% recall rate and 6ms query latency.

This case study details the journey of HDI, a German insurance company, in developing and deploying a production-grade RAG (Retrieval Augmented Generation) system to enhance their customer service operations. The project, implemented with AWS Professional Services, showcases a comprehensive approach to building and optimizing a large-scale LLM-based system. ### Business Context and Problem Statement HDI faced a common challenge in insurance customer service: agents needed to quickly access and parse information from multiple sources (SharePoint, databases) to answer customer queries about insurance coverage. Many documents were over 100 pages long, making manual searching time-consuming and inefficient. The goal was to consolidate all knowledge into a single base that agents could query using natural language. ### Technical Architecture and Implementation The system uses a modular architecture built on AWS services, featuring: * A document ingestion pipeline * Vector storage using OpenSearch * Feedback loops for continuous improvement * A web interface behind a load balancer * Integration with external data sources The team made a deliberate choice of OpenSearch as their vector store after comprehensive analysis of various options. OpenSearch proved superior in most criteria, including scalability, manageability, and cost-effectiveness, though query latency required optimization. ### Development and Optimization Process The team conducted approximately 120 experiments before moving to MVP, focusing on several key areas: **Document Processing and Chunking:** * Implemented specialized chunking approaches for complex documents * Handled challenging elements like tables, multi-column layouts, and compound German words * Used markdown formatting to preserve document structure * Maintained context by including headers and table information in chunks * Balanced chunk sizes to optimize between context preservation and token costs **Vector Indexing and Search:** * Implemented HNSW (Hierarchical Navigable Small World) algorithm for efficient vector search * Optimized index parameters including 'M' (number of connections) and 'EF' (search depth) * Created custom hybrid search implementation * Added reranking capabilities for improved accuracy **Evaluation and Metrics:** * Implemented comprehensive evaluation metrics including: * Document recall * Query latency * Context precision * Answer faithfulness * Built feedback loops for continuous improvement * Created ground truth datasets for testing ### Production Performance and Results The system currently achieves: * 26 queries per second processing capacity * 6ms query latency * 88% recall rate * Serves 800+ users across multiple business lines * Successfully scaled from 350 to 800 users * Planning expansion to 1,200 users by year-end ### Key Learnings and Best Practices 1. **Start with Baselines**: The team recommends starting with simple API solutions like Amazon Bedrock knowledge bases to establish baselines before custom optimization. 2. **Document Processing**: Complex document layouts and tables require sophisticated parsing strategies. Modern tools like Amazon Bedrock Data Automation can help automate this process. 3. **Experimentation Framework**: While there are theoretically thousands of possible combinations of parameters, the team found success by: * Prioritizing experiments based on business impact * Setting clear thresholds for moving to MVP (85% recall rate) * Continuing optimization post-deployment for incremental improvements 4. **KPI Definition**: Early establishment of KPIs and ground truth datasets is crucial, as is alignment across departments (security, business, etc.). ### Feedback and Continuous Improvement The system includes a feedback mechanism where agents can provide positive/negative feedback and additional comments. This feedback loop helps identify areas for improvement and optimization. ### Future Outlook The project has become a blueprint for other business lines within HDI, demonstrating the scalability and adaptability of the architecture. The team continues to make incremental improvements, focusing on: * Cluster optimization * Feedback incorporation * Expanding to new business lines * Updating components with newer AWS services and capabilities ### Technical Challenges and Solutions The team faced several technical challenges: * German language complexity, including compound words and abbreviations * Complex document layouts and tables * Balancing recall vs. latency in vector search * Managing chunking strategies for optimal context preservation These were addressed through careful experimentation, custom solutions, and continuous optimization of the system components.

Start deploying reproducible AI workflows today

Enterprise-grade MLOps platform trusted by thousands of companies in production.

Book a Demo

Use Open Source