ZenML

Transforming Insurance Agent Support with RAG-Powered Chat Assistant

InsuranceDekho 2024

InsuranceDekho addressed the challenge of slow response times in insurance agent queries by implementing a RAG-based chat assistant using Amazon Bedrock and Anthropic's Claude Haiku. The solution eliminated the need for constant SME consultation, cached frequent responses using Redis, and leveraged OpenSearch for vector storage, resulting in an 80% reduction in response times for customer queries about insurance plans.

Industry

Insurance

Overview

InsuranceDekho is a leading InsurTech platform in India that offers insurance products from over 49 insurance companies through a network of 150,000 point of sale person (POSP) agents and direct-to-customer channels. The company faced a significant operational challenge: insurance advisors, particularly newer ones, frequently needed to consult subject matter experts (SMEs) for policy-specific questions, creating bottlenecks and delays in customer service. This case study describes how they implemented a generative AI-powered chat assistant using Retrieval Augmented Generation (RAG) to enable agents to autonomously resolve customer queries about policy coverages, exclusions, and other insurance details.

The Problem

The core challenge centered on response time and efficiency in the customer service workflow. When customers asked questions about insurance products, advisors often lacked the detailed knowledge to respond immediately and needed to reach out to SMEs, which delayed customer responses and made SME availability a bottleneck.

The inefficiency was particularly problematic in a competitive market where customers might explore competing services if they received better clarity elsewhere.

Technical Architecture and Implementation

InsuranceDekho selected Amazon Bedrock as their primary foundation model provider after evaluating several generative AI model providers. The decision was influenced by several factors that are worth examining from an LLMOps perspective:

Managed Service Benefits: Amazon Bedrock’s fully serverless offering eliminated the need for provisioning infrastructure, procuring GPUs, or configuring ML frameworks. This significantly reduced development and deployment overheads, which is a key consideration for production LLM deployments where operational complexity can quickly escalate.

Model Flexibility: The ability to access multiple foundation models through a single API provided operational flexibility. InsuranceDekho specifically mentions that they seamlessly transitioned from Anthropic’s Claude Instant to Anthropic’s Claude Haiku when Claude 3 became available, without requiring code changes. This kind of model portability is a significant advantage in production environments where model upgrades and optimizations are ongoing concerns.

Performance Options: Amazon Bedrock’s on-demand and provisioned throughput options allowed them to balance latency requirements with cost considerations.

Security: AWS PrivateLink provided secure, private model access without traversing the public internet, addressing data privacy and compliance requirements that are particularly important in the insurance industry.

RAG Implementation

The solution follows a classic RAG pattern with some notable architectural decisions. The workflow is divided into two main components: an ingestion workflow and a response generation workflow.

Ingestion Workflow: Insurance policy documents are processed through an embedding model (the case study mentions using a third-party embedding model rather than one from Amazon Bedrock) to generate vector representations. These embeddings are stored in Amazon OpenSearch Service, which was chosen for its scalability, high-performance search capabilities, and cost-effectiveness. This workflow keeps the knowledge base current with the latest insurance policy information.
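The embed-and-index pattern can be sketched as follows. This is a minimal illustration, not InsuranceDekho's implementation: the `embed` function is a stand-in for the third-party embedding model, and a plain in-memory list replaces the OpenSearch vector index so the example is self-contained.

```python
import numpy as np

# Stand-in for the third-party embedding model mentioned in the case study.
# A real system would call the model's API; here we derive a deterministic
# pseudo-random unit vector from the text so the sketch runs anywhere.
def embed(text: str, dim: int = 8) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# In-memory stand-in for the Amazon OpenSearch Service vector index.
index: list[tuple[str, np.ndarray]] = []

def ingest(documents: list[str]) -> None:
    """Ingestion workflow: embed each policy document and store the vector."""
    for doc in documents:
        index.append((doc, embed(doc)))

def search(query: str, k: int = 1) -> list[str]:
    """Retrieve the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    scored = sorted(index, key=lambda pair: -float(pair[1] @ q))
    return [doc for doc, _ in scored[:k]]

ingest(["Policy A covers flood damage.", "Policy B excludes flood damage."])
```

Re-running ingestion as policy documents change is what keeps the knowledge base current; in production the index lives in OpenSearch rather than process memory.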

Response Generation Workflow: This is where the production LLM deployment becomes more interesting from an LLMOps perspective. Incoming queries pass through a semantic cache and an intent classifier before, when needed, the full retrieve-then-generate pipeline runs against the OpenSearch knowledge base. The design decisions behind each stage are discussed in the next section.

LLMOps Considerations

Several LLMOps-relevant design decisions are evident in this implementation:

Caching Strategy: The use of semantic caching with Redis is a practical approach to reducing both latency and cost in production LLM deployments. By caching frequently accessed responses and using semantic similarity to match queries, they can bypass the more expensive generation workflow for redundant queries.

Intent Classification as a Gateway: Using Claude Haiku as an intent classifier before full RAG processing is an interesting optimization. This allows simple queries to be handled efficiently without the overhead of retrieval, while routing complex queries through the full pipeline. This tiered approach helps manage costs and latency at scale.
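The tiered routing can be sketched as below. The routing logic is the point being illustrated; `classify_intent` is a trivial keyword stand-in for the Claude Haiku classifier call, and the intent labels and responses are hypothetical.

```python
# Stand-in intent labels; the real classifier is an LLM call to Claude Haiku.
GREETING_WORDS = {"hi", "hello", "thanks", "bye"}

def classify_intent(query: str) -> str:
    """Cheap gateway check run before any retrieval happens."""
    q = query.strip().lower()
    if q in GREETING_WORDS or any(q.startswith(w) for w in GREETING_WORDS):
        return "chitchat"
    return "policy_question"

def answer_directly(query: str) -> str:
    # Simple queries are answered without touching the vector store.
    return "Hello! How can I help you with your insurance questions?"

def answer_with_rag(query: str) -> str:
    # Placeholder for retrieve-then-generate: in production this would query
    # OpenSearch and call the LLM with the retrieved policy context.
    return f"[RAG answer for: {query}]"

def handle(query: str) -> str:
    if classify_intent(query) == "chitchat":
        return answer_directly(query)  # skip retrieval and generation entirely
    return answer_with_rag(query)
```

The cost saving comes from the asymmetry: the classifier call is cheap and fast, so spending it on every query is worthwhile if it lets a meaningful fraction of traffic bypass retrieval and long-context generation.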

Dynamic Prompting: The mention of dynamic prompting based on query type suggests they’ve implemented some form of prompt routing or prompt optimization based on intent classification results.
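One plausible reading of "dynamic prompting" is template selection keyed on the classified intent. The sketch below is an assumption; the template names and wording are invented for illustration and are not drawn from the case study.

```python
# Hypothetical prompt templates, one per intent class.
PROMPT_TEMPLATES = {
    "coverage": (
        "You are an insurance assistant. Using only the context below, "
        "explain what the plan covers.\n\nContext:\n{context}\n\nQuestion: {question}"
    ),
    "exclusion": (
        "You are an insurance assistant. Using only the context below, "
        "explain what the plan excludes.\n\nContext:\n{context}\n\nQuestion: {question}"
    ),
    "default": (
        "Answer the question from the context.\n\n"
        "Context:\n{context}\n\nQuestion: {question}"
    ),
}

def build_prompt(intent: str, context: str, question: str) -> str:
    """Route to an intent-specific template, falling back to a generic one."""
    template = PROMPT_TEMPLATES.get(intent, PROMPT_TEMPLATES["default"])
    return template.format(context=context, question=question)

prompt = build_prompt(
    intent="exclusion",
    context="Plan A excludes flood damage.",
    question="What is excluded from Plan A?",
)
```

Keeping templates in a routing table like this makes it cheap to tune the instructions for one query type without risking regressions in the others.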

Model Selection: They specifically chose Anthropic’s Claude Haiku after benchmarking, citing its exceptional performance in terms of speed and affordability. For a high-volume customer service application with 150,000 agents, the cost-per-query is a significant operational consideration.

Results and Impact

The implementation reportedly achieved an 80% decrease in response time for customer queries about plan features, inclusions, and exclusions. Insurance advisors can now autonomously address customer queries without constant SME involvement. The company claims improved sales, cross-selling, and overall customer service experience.

It’s worth noting that this case study is published on the AWS blog and co-authored by AWS personnel, so the presentation is naturally favorable to AWS services. While the 80% reduction in response time is a significant improvement, the case study doesn’t provide detailed metrics on accuracy, hallucination rates, or how they handle edge cases where the RAG system might provide incorrect information. In a regulated industry like insurance, accuracy and the handling of incorrect responses would be critical operational concerns that aren’t fully addressed in this case study.

Architecture Components Summary

The production system relies on several key components: Amazon Bedrock serving Anthropic's Claude Haiku for generation and intent classification, a third-party embedding model for document ingestion, Amazon OpenSearch Service for vector storage, and Redis for semantic caching.

The case study demonstrates a practical approach to deploying RAG-based LLM applications in a production customer service context, with attention to caching, intent classification, and model flexibility. The serverless approach via Amazon Bedrock simplified operational overhead, though the reliance on managed services naturally creates vendor lock-in with AWS.

More Like This

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.


AI-Powered Conversational Assistant for Streamlined Home Buying Experience

Rocket 2025

Rocket Companies, a Detroit-based FinTech company, developed Rocket AI Agent to address the overwhelming complexity of the home buying process by providing 24/7 personalized guidance and support. Built on Amazon Bedrock Agents, the AI assistant combines domain knowledge, personalized guidance, and actionable capabilities to transform client engagement across Rocket's digital properties. The implementation resulted in a threefold increase in conversion rates from web traffic to closed loans, 85% reduction in transfers to customer care, and 68% customer satisfaction scores, while enabling seamless transitions between AI assistance and human support when needed.


Building a Microservices-Based Multi-Agent Platform for Financial Advisors

Prudential 2025

Prudential Financial, in partnership with AWS GenAI Innovation Center, built a scalable multi-agent platform to support 100,000+ financial advisors across insurance and financial services. The system addresses fragmented workflows where advisors previously had to navigate dozens of disconnected IT systems for client engagement, underwriting, product information, and servicing. The solution features an orchestration agent that routes requests to specialized sub-agents (quick quote, forms, product, illustration, book of business) while maintaining context and enforcing governance. The platform-based microservices architecture reduced time-to-value from 6-8 weeks to 3-4 weeks for new agent deployments, enabled cross-business reusability, and provided standardized frameworks for authentication, LLM gateway access, knowledge management, and observability while handling the complexity of scaling multi-agent systems in a regulated financial services environment.
