ZenML

Transforming Insurance Agent Support with RAG-Powered Chat Assistant

InsuranceDekho 2024

InsuranceDekho addressed the challenge of slow response times in insurance agent queries by implementing a RAG-based chat assistant using Amazon Bedrock and Anthropic's Claude Haiku. The solution eliminated the need for constant SME consultation, cached frequent responses using Redis, and leveraged OpenSearch for vector storage, resulting in an 80% reduction in response times for customer queries about insurance plans.

Industry

Insurance

Overview

InsuranceDekho is a leading InsurTech platform in India that offers insurance products from over 49 insurance companies through a network of 150,000 point of sale person (POSP) agents and direct-to-customer channels. The company faced a significant operational challenge: insurance advisors, particularly newer ones, frequently needed to consult subject matter experts (SMEs) for policy-specific questions, creating bottlenecks and delays in customer service. This case study describes how they implemented a generative AI-powered chat assistant using Retrieval Augmented Generation (RAG) to enable agents to autonomously resolve customer queries about policy coverages, exclusions, and other insurance details.

The Problem

The core challenge centered on response time and efficiency in the customer service workflow. When customers asked questions about insurance products, advisors often lacked the detailed knowledge to respond immediately and needed to reach out to SMEs, which delayed customer responses and made SME availability a bottleneck.

The inefficiency was particularly problematic in a competitive market where customers might explore competing services if they received better clarity elsewhere.

Technical Architecture and Implementation

InsuranceDekho selected Amazon Bedrock as their primary foundation model provider after evaluating several generative AI model providers. The decision was influenced by several factors that are worth examining from an LLMOps perspective:

Managed Service Benefits: Amazon Bedrock’s fully serverless offering eliminated the need for provisioning infrastructure, procuring GPUs, or configuring ML frameworks. This significantly reduced development and deployment overheads, which is a key consideration for production LLM deployments where operational complexity can quickly escalate.

Model Flexibility: The ability to access multiple foundation models through a single API provided operational flexibility. InsuranceDekho specifically mentions that they seamlessly transitioned from Anthropic’s Claude Instant to Anthropic’s Claude Haiku when Claude 3 became available, without requiring code changes. This kind of model portability is a significant advantage in production environments where model upgrades and optimizations are ongoing concerns.

Performance Options: Amazon Bedrock’s on-demand and provisioned throughput options allowed them to balance latency requirements with cost considerations.

Security: AWS PrivateLink provided secure, private model access without traversing the public internet, addressing data privacy and compliance requirements that are particularly important in the insurance industry.

RAG Implementation

The solution follows a classic RAG pattern with some notable architectural decisions. The workflow is divided into two main components: an ingestion workflow and a response generation workflow.

Ingestion Workflow: Insurance policy documents are processed through an embedding model (the case study mentions using a third-party embedding model rather than one from Amazon Bedrock) to generate vector representations. These embeddings are stored in Amazon OpenSearch Service, which was chosen for its scalability, high-performance search capabilities, and cost-effectiveness. This workflow keeps the knowledge base current with the latest insurance policy information.
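The embed-and-index pattern can be sketched as follows. This is a minimal illustration, not InsuranceDekho's implementation: the `embed` function is a stand-in for the third-party embedding model, and a plain in-memory list replaces the OpenSearch vector index so the example is self-contained.

```python
import numpy as np

# Stand-in for the third-party embedding model mentioned in the case study.
# A real system would call the model's API; here we derive a deterministic
# pseudo-random unit vector from the text so the sketch runs anywhere.
def embed(text: str, dim: int = 8) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# In-memory stand-in for the Amazon OpenSearch Service vector index.
index: list[tuple[str, np.ndarray]] = []

def ingest(documents: list[str]) -> None:
    """Ingestion workflow: embed each policy document and store the vector."""
    for doc in documents:
        index.append((doc, embed(doc)))

def search(query: str, k: int = 1) -> list[str]:
    """Retrieve the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    scored = sorted(index, key=lambda pair: -float(pair[1] @ q))
    return [doc for doc, _ in scored[:k]]

ingest(["Policy A covers flood damage.", "Policy B excludes flood damage."])
```

Re-running ingestion as policy documents change is what keeps the knowledge base current; in production the index lives in OpenSearch rather than process memory.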

Response Generation Workflow: This is where the production LLM deployment becomes more interesting from an LLMOps perspective. Incoming queries pass through a semantic cache and an intent classifier before, when needed, the full retrieve-then-generate pipeline runs against the OpenSearch knowledge base. The design decisions behind each stage are discussed in the next section.

LLMOps Considerations

Several LLMOps-relevant design decisions are evident in this implementation:

Caching Strategy: The use of semantic caching with Redis is a practical approach to reducing both latency and cost in production LLM deployments. By caching frequently accessed responses and using semantic similarity to match queries, they can bypass the more expensive generation workflow for redundant queries.

Intent Classification as a Gateway: Using Claude Haiku as an intent classifier before full RAG processing is an interesting optimization. This allows simple queries to be handled efficiently without the overhead of retrieval, while routing complex queries through the full pipeline. This tiered approach helps manage costs and latency at scale.
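The tiered routing can be sketched as below. The routing logic is the point being illustrated; `classify_intent` is a trivial keyword stand-in for the Claude Haiku classifier call, and the intent labels and responses are hypothetical.

```python
# Stand-in intent labels; the real classifier is an LLM call to Claude Haiku.
GREETING_WORDS = {"hi", "hello", "thanks", "bye"}

def classify_intent(query: str) -> str:
    """Cheap gateway check run before any retrieval happens."""
    q = query.strip().lower()
    if q in GREETING_WORDS or any(q.startswith(w) for w in GREETING_WORDS):
        return "chitchat"
    return "policy_question"

def answer_directly(query: str) -> str:
    # Simple queries are answered without touching the vector store.
    return "Hello! How can I help you with your insurance questions?"

def answer_with_rag(query: str) -> str:
    # Placeholder for retrieve-then-generate: in production this would query
    # OpenSearch and call the LLM with the retrieved policy context.
    return f"[RAG answer for: {query}]"

def handle(query: str) -> str:
    if classify_intent(query) == "chitchat":
        return answer_directly(query)  # skip retrieval and generation entirely
    return answer_with_rag(query)
```

The cost saving comes from the asymmetry: the classifier call is cheap and fast, so spending it on every query is worthwhile if it lets a meaningful fraction of traffic bypass retrieval and long-context generation.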

Dynamic Prompting: The mention of dynamic prompting based on query type suggests they’ve implemented some form of prompt routing or prompt optimization based on intent classification results.
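One plausible reading of "dynamic prompting" is template selection keyed on the classified intent. The sketch below is an assumption; the template names and wording are invented for illustration and are not drawn from the case study.

```python
# Hypothetical prompt templates, one per intent class.
PROMPT_TEMPLATES = {
    "coverage": (
        "You are an insurance assistant. Using only the context below, "
        "explain what the plan covers.\n\nContext:\n{context}\n\nQuestion: {question}"
    ),
    "exclusion": (
        "You are an insurance assistant. Using only the context below, "
        "explain what the plan excludes.\n\nContext:\n{context}\n\nQuestion: {question}"
    ),
    "default": (
        "Answer the question from the context.\n\n"
        "Context:\n{context}\n\nQuestion: {question}"
    ),
}

def build_prompt(intent: str, context: str, question: str) -> str:
    """Route to an intent-specific template, falling back to a generic one."""
    template = PROMPT_TEMPLATES.get(intent, PROMPT_TEMPLATES["default"])
    return template.format(context=context, question=question)

prompt = build_prompt(
    intent="exclusion",
    context="Plan A excludes flood damage.",
    question="What is excluded from Plan A?",
)
```

Keeping templates in a routing table like this makes it cheap to tune the instructions for one query type without risking regressions in the others.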

Model Selection: They specifically chose Anthropic’s Claude Haiku after benchmarking, citing its exceptional performance in terms of speed and affordability. For a high-volume customer service application with 150,000 agents, the cost-per-query is a significant operational consideration.

Results and Impact

The implementation reportedly achieved an 80% decrease in response time for customer queries about plan features, inclusions, and exclusions. Insurance advisors can now autonomously address customer queries without constant SME involvement. The company claims improved sales, cross-selling, and overall customer service experience.

It’s worth noting that this case study is published on the AWS blog and co-authored by AWS personnel, so the presentation is naturally favorable to AWS services. While the 80% reduction in response time is a significant improvement, the case study doesn’t provide detailed metrics on accuracy, hallucination rates, or how they handle edge cases where the RAG system might provide incorrect information. In a regulated industry like insurance, accuracy and the handling of incorrect responses would be critical operational concerns that aren’t fully addressed in this case study.

Architecture Components Summary

The production system relies on several key components: Amazon Bedrock serving Anthropic's Claude Haiku for generation and intent classification, a third-party embedding model for document ingestion, Amazon OpenSearch Service for vector storage, and Redis for semantic caching.

The case study demonstrates a practical approach to deploying RAG-based LLM applications in a production customer service context, with attention to caching, intent classification, and model flexibility. The serverless approach via Amazon Bedrock simplified operational overhead, though the reliance on managed services naturally creates vendor lock-in with AWS.

More Like This

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.


AI-Powered Conversational Assistant for Streamlined Home Buying Experience

Rocket 2025

Rocket Companies, a Detroit-based FinTech company, developed Rocket AI Agent to address the overwhelming complexity of the home buying process by providing 24/7 personalized guidance and support. Built on Amazon Bedrock Agents, the AI assistant combines domain knowledge, personalized guidance, and actionable capabilities to transform client engagement across Rocket's digital properties. The implementation resulted in a threefold increase in conversion rates from web traffic to closed loans, 85% reduction in transfers to customer care, and 68% customer satisfaction scores, while enabling seamless transitions between AI assistance and human support when needed.


Building a Microservices-Based Multi-Agent Platform for Financial Advisors

Prudential 2025

Prudential Financial, in partnership with AWS GenAI Innovation Center, built a scalable multi-agent platform to support 100,000+ financial advisors across insurance and financial services. The system addresses fragmented workflows where advisors previously had to navigate dozens of disconnected IT systems for client engagement, underwriting, product information, and servicing. The solution features an orchestration agent that routes requests to specialized sub-agents (quick quote, forms, product, illustration, book of business) while maintaining context and enforcing governance. The platform-based microservices architecture reduced time-to-value from 6-8 weeks to 3-4 weeks for new agent deployments, enabled cross-business reusability, and provided standardized frameworks for authentication, LLM gateway access, knowledge management, and observability while handling the complexity of scaling multi-agent systems in a regulated financial services environment.
