Prudential developed "Just Ask," an AI-driven advisor assistant platform to address the complex, friction-heavy life insurance sales process that typically spans 8-10 weeks and involves navigating hundreds of products, regulatory requirements, and forms across different states. The company built a multi-agent system on AWS that includes specialized agents for product recommendations, medical underwriting, quoting, forms selection, and book of business management—all orchestrated through a conversational interface. Within 12 weeks of deployment, the platform processed 1,800 messages across 900+ financial planners from 550+ organizations, delivered 100+ successful quotes, and saved approximately 4,500 human hours, with user adoption growing organically at 175% for some agents and demonstrating 90%+ accuracy across most specialized agents.
Prudential’s life insurance distribution business faced a significant operational challenge: the life insurance sales process is exceptionally complex and time-consuming, typically requiring 8-10 weeks from initial client conversation to policy issuance. Unlike commoditized products such as auto or home insurance, life insurance is deeply personalized, with pricing based on individual age, lifestyle, health conditions, and state-specific regulatory requirements. The company identified a $12 trillion coverage gap in the US market representing millions of underprotected families, and recognized that reducing friction in the sales process could help close this gap more quickly.
Financial professionals navigating this process face numerous pain points across five key stages: prospecting (researching among hundreds of product variants with different eligibility rules and riders), quoting (running 5-10 scenarios accounting for health class, tobacco status, and state residence), application (selecting correct forms that vary by state and product type), underwriting (waiting weeks in the dark while cases are processed), and ongoing policy management (handling 20-30 year policy lifecycles with annual reviews and transactions). Each stage requires searching through PDFs and spreadsheets, making phone calls, and waiting for responses, with individual queries consuming minutes to hours and medical underwriting decisions taking 24-48 hours.
Rather than building a monolithic AI solution, Prudential’s data science and engineering teams created an “advisor intelligence fabric”—an agentic platform ecosystem that supports multiple specialized agents working in concert. This architectural decision reflects sophisticated thinking about production LLM systems: different problem domains (products, quotes, forms, book of business) have different data sources, logic patterns, and complexity profiles that are better served by purpose-built agents than a one-size-fits-all approach.
The architecture comprises several key layers. At the top sits a data layer where raw datasets, including forms, products, and underwriting manuals, are ingested through Amazon EventBridge and processed into embeddings stored in a vector database. The agentic layer contains an orchestrator that performs non-deterministic intent-based routing to specialized agents. The platform layer includes an LLM model registry, embedding and vector space components, and monitoring/observability infrastructure. Advisors interact with the system through various interfaces using single sign-on authentication tied to their IIDC IDs.
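The presentation doesn't detail the ingestion implementation, but a minimal sketch of this data layer, assuming an AWS Lambda handler triggered by EventBridge, a Titan embedding model on Bedrock, and a placeholder vector store (all illustrative choices, not confirmed by the talk), might look like this:

```python
# Illustrative sketch only: the talk names EventBridge and a vector database,
# but the chunking strategy, embedding model, and store are assumptions.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

class VectorStore:
    """Placeholder for the unnamed vector database; keeps vectors in memory."""
    def __init__(self):
        self.items = {}
    def upsert(self, doc_id, vector, metadata):
        self.items[doc_id] = (vector, metadata)

vector_store = VectorStore()

def embed(text: str) -> list[float]:
    # Assumes a Titan embedding model on Bedrock; the actual model is undisclosed.
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def handler(event, context):
    # EventBridge delivers a document event; this payload shape is hypothetical.
    doc = event["detail"]
    chunks = [doc["text"][i:i + 1000] for i in range(0, len(doc["text"]), 1000)]
    for idx, chunk in enumerate(chunks):
        vector_store.upsert(
            doc_id=f'{doc["doc_id"]}-{idx}',
            vector=embed(chunk),
            metadata={"source": doc["source"], "type": doc["type"]},
        )
```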
The platform currently includes five production agents with more in development. The medical underwriting agent addresses one of the most time-consuming aspects of the sales process. Traditionally, advisors would email underwriters and wait 24-48 hours to learn how a client’s medical conditions would affect pricing. This agent was trained on more than 700 medical underwriting documents and past underwriting decisions, using small language models to implement 20+ reasoning steps that can instantly determine underwriting codes. For example, the pricing for a diabetic client depends on whether they take oral medication or injectables, whether the diabetes is controlled, and what other conditions are present; the agent navigates this complexity in real time rather than over days.
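The implementation of those reasoning steps isn't disclosed, but a heavily simplified sketch of how a few steps of the diabetes example could be encoded (the table codes and decision chain here are hypothetical, not Prudential's actual underwriting rules) is shown below:

```python
from dataclasses import dataclass

@dataclass
class DiabetesFacts:
    """Facts a small language model might extract from the advisor's description."""
    treatment: str          # "oral", "injectable", or "diet"
    controlled: bool        # is the condition within target range?
    comorbidities: list[str]

def underwriting_code(facts: DiabetesFacts) -> str:
    # Hypothetical decision chain, standing in for a few of the "20+ reasoning
    # steps"; real step logic and codes are Prudential-internal and undisclosed.
    if not facts.controlled:
        return "TABLE_4"                 # uncontrolled diabetes: substandard table
    if facts.treatment == "injectable":
        base = "TABLE_2"                 # injectables rate worse than oral meds
    elif facts.treatment == "oral":
        base = "TABLE_1"
    else:
        base = "STANDARD"
    if facts.comorbidities:              # e.g. hypertension compounds the rating
        return f"{base}_PLUS"
    return base

print(underwriting_code(DiabetesFacts("oral", True, [])))  # -> TABLE_1
```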
The quick quote agent automates scenario generation by understanding user queries, collecting necessary information conversationally, and producing quotes instantly rather than requiring advisors to manually navigate complex pricing logic across multiple systems. The forms agent manages access to more than 600 forms, allowing advisors to request state-specific and product-specific forms through natural language rather than calling support lines (which typically took 10-15 minutes per inquiry).
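A minimal sketch of the quick quote agent's conversational slot-filling, assuming hypothetical field names and a stand-in pricing function (the real pricing logic and required inputs are not published), could look like:

```python
# Illustrative slot-filling loop for the quick quote agent; field names,
# questions, and the pricing call are assumptions, not Prudential's logic.
REQUIRED_SLOTS = ["age", "state", "coverage_amount", "tobacco_status", "health_class"]

def next_question(collected: dict) -> str | None:
    """Return the next question to ask, or None when all slots are filled."""
    for slot in REQUIRED_SLOTS:
        if slot not in collected:
            return f"What is the client's {slot.replace('_', ' ')}?"
    return None

def run_quote_turn(collected: dict, price_fn) -> str:
    question = next_question(collected)
    if question:
        return question                      # keep collecting conversationally
    return f"Estimated premium: ${price_fn(collected):,.2f}/month"

quote = run_quote_turn(
    {"age": 45, "state": "NJ", "coverage_amount": 500_000,
     "tobacco_status": "none", "health_class": "preferred"},
    price_fn=lambda c: 62.50,                # stand-in for the real pricing engine
)
```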
The product agent provides access to 400+ product content items using a RAG pattern with semantic search and Cohere ranking algorithms. It retrieves relevant documents from the vector database, summarizes information, and provides pinpointed answers to advisor queries rather than forcing advisors to search through documentation manually. The book of business agent analyzes millions of historical policies to create insights and recommended actions, enabling advisors to proactively manage client relationships over 10-20 year policy lifecycles by grounding conversations in historical context and suggesting optimal next actions.
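For the product agent's retrieve-then-rerank flow described above, a sketch using the public Cohere SDK might look like the following; the vector search is a placeholder, and Prudential may well invoke Cohere's reranker through Bedrock rather than the Cohere API directly:

```python
# Sketch of the retrieve-then-rerank pattern the product agent reportedly uses.
import cohere

co = cohere.Client("YOUR_API_KEY")  # hypothetical credential

def answer_product_query(query: str, vector_search) -> list[str]:
    # Step 1: broad semantic recall from the vector database (placeholder fn).
    candidates = vector_search(query, top_k=25)          # list[str] of chunks
    # Step 2: rerank candidates so the LLM sees only the most relevant content,
    # going beyond raw similarity scores.
    reranked = co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=candidates,
        top_n=5,
    )
    return [candidates[r.index] for r in reranked.results]
```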
The orchestration agent serves as the intelligent router that understands financial advisor queries and directs them to appropriate specialized agents. This is described as non-deterministic intent-based routing, suggesting the orchestrator uses LLM capabilities to interpret user intent rather than relying on rigid rule-based logic. All chat history is stored in Amazon DynamoDB, with both short-term and long-term context and memory preserved per advisor (tied to their IIDC ID). The team has implemented sophisticated memory management strategies with configurable retention periods (24 hours, 72 hours, or longer depending on conversation flow), allowing agents to maintain continuity across interactions.
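A minimal sketch of per-advisor memory with configurable retention, assuming a hypothetical table schema and mapping the 24/72-hour windows onto DynamoDB's native TTL feature (the actual schema is not described):

```python
import time
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("just-ask-chat-history")  # hypothetical name

RETENTION_SECONDS = {"short": 24 * 3600, "long": 72 * 3600}

def save_turn(advisor_id: str, turn_id: str, message: str, retention: str = "short"):
    # "expires_at" would be configured as the table's TTL attribute, so
    # DynamoDB purges turns automatically after the retention window.
    table.put_item(Item={
        "advisor_id": advisor_id,   # partition key: ties memory to the advisor's ID
        "turn_id": turn_id,         # sort key: orders the conversation
        "message": message,
        "expires_at": int(time.time()) + RETENTION_SECONDS[retention],
    })

def load_history(advisor_id: str) -> list[dict]:
    # Retrieve the advisor's surviving turns to rebuild conversational context.
    resp = table.query(KeyConditionExpression=Key("advisor_id").eq(advisor_id))
    return resp["Items"]
```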
Each specialized agent maintains connectivity back to the centralized context and memory stores in DynamoDB while also accessing the RAG pipeline for information retrieval. The system uses Redis for local retrieval and caching, optimizing performance for frequently accessed information. This design enables agents to work both independently and collaboratively while maintaining coherent conversation state.
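A cache-aside sketch of the Redis layer, using the standard redis-py client with an illustrative key scheme and TTL:

```python
# Cache-aside retrieval: serve hot content from Redis, fall through to the
# RAG pipeline on a miss. Key scheme and TTL are illustrative assumptions.
import hashlib
import json
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cached_retrieve(query: str, retrieve_fn, ttl_seconds: int = 3600) -> list[str]:
    key = "rag:" + hashlib.sha256(query.encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)               # frequently asked: skip the vector search
    results = retrieve_fn(query)             # miss: run the full RAG retrieval
    cache.setex(key, ttl_seconds, json.dumps(results))
    return results
```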
The platform leverages a GenAI gateway that connects to both OpenAI and Amazon Bedrock endpoints, with all models registered in a centralized LLM model registry. This multi-provider approach provides flexibility and redundancy, though the presentation doesn’t detail specific model selection criteria or switching logic. The gateway handles generation tasks for content creation as well as reasoning and logic-based operations, suggesting different models may be used for different agent functions.
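A sketch of what a multi-provider gateway backed by a model registry could look like; the registry entries, model choices, and routing policy below are assumptions, since the presentation doesn't describe the selection logic:

```python
# Hedged sketch: dispatch to OpenAI or Amazon Bedrock based on a registry entry.
import boto3
from openai import OpenAI

MODEL_REGISTRY = {
    # Task-to-model mapping is hypothetical; actual registry contents unknown.
    "reasoning": {"provider": "openai", "model": "gpt-4o"},
    "generation": {"provider": "bedrock",
                   "model": "anthropic.claude-3-5-sonnet-20240620-v1:0"},
}

openai_client = OpenAI()
bedrock = boto3.client("bedrock-runtime")

def generate(task: str, prompt: str) -> str:
    entry = MODEL_REGISTRY[task]
    if entry["provider"] == "openai":
        resp = openai_client.chat.completions.create(
            model=entry["model"],
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    # Bedrock's Converse API offers a provider-agnostic message format.
    resp = bedrock.converse(
        modelId=entry["model"],
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]
```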
The embedding and vector space infrastructure processes documents into vector representations stored in a vector database (the specific vector database technology is not mentioned). The RAG pipeline retrieves relevant information based on semantic similarity, with agents applying ranking algorithms to identify the most appropriate content for user queries. The use of Cohere ranking algorithms specifically for the product agent indicates attention to retrieval quality beyond simple similarity scores.
Given the highly regulated nature of insurance, the platform implements multiple layers of responsible AI controls. Amazon Bedrock guardrails are deployed at both the orchestrator level and for each individual agent, using the foundational guardrails Bedrock provides out of the box. Beyond these standard controls, the team has implemented custom guardrails defined at the agent level to address specific functional requirements and regulatory constraints around what information can be provided and how it should be presented.
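A sketch of how these layers might compose, using Amazon Bedrock's ApplyGuardrail API for the managed layer and a hypothetical agent-specific rule on top (the guardrail ID, version, and custom rule are illustrative):

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def check_output(agent_name: str, text: str) -> str:
    # Layer 1: Bedrock's managed guardrail (content filters, denied topics, etc.).
    resp = bedrock.apply_guardrail(
        guardrailIdentifier="gr-example123",   # hypothetical guardrail ID
        guardrailVersion="1",
        source="OUTPUT",
        content=[{"text": {"text": text}}],
    )
    if resp["action"] == "GUARDRAIL_INTERVENED":
        return "I can't provide that information."
    # Layer 2: hypothetical agent-level rule, e.g. quotes must carry a disclaimer.
    if agent_name == "quick_quote" and "estimate only" not in text.lower():
        text += "\n\nThis quote is an estimate only and is not a binding offer."
    return text
```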
This multi-layered guardrail approach reflects production LLM system best practices: leveraging platform-provided safety controls while adding domain-specific rules that address the particular compliance and legal requirements of insurance sales. The fact that guardrails are configurable per agent suggests recognition that different functions (quoting versus product information versus underwriting) may have different regulatory constraints.
The platform includes an “agent core” that centralizes monitoring and observability across all agents. This component handles logging, alerting, and prompt management—critical operational capabilities for production LLM systems. The team has implemented a comprehensive evaluation framework with three distinct approaches: human-in-the-loop evaluation, rule-based evaluation, and LLM-as-a-judge evaluation. This multi-method approach reflects mature thinking about LLM evaluation, recognizing that different validation methods have different strengths and that no single approach is sufficient.
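As one example of the three methods, a minimal LLM-as-a-judge sketch; the judge model, rubric, and scoring scale are assumptions rather than the team's actual configuration:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical grading rubric; the team's evaluation criteria are not published.
JUDGE_PROMPT = """You are grading an insurance assistant's answer.
Question: {question}
Reference answer: {reference}
Model answer: {answer}
Return JSON: {{"score": 1-5, "reason": "..."}} where 5 = fully correct and compliant."""

def judge(question: str, reference: str, answer: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, answer=answer)}],
    )
    return json.loads(resp.choices[0].message.content)
```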
Overall, the team reports 90%+ accuracy across most agents, which is a strong but not perfect performance level. Notably, the medical underwriting agent—handling the most complex reasoning tasks with hundreds of potential medical conditions and pricing factors—still relies on human-in-the-loop for particularly complex cases. This represents a pragmatic approach to production deployment: automate what can be reliably automated while maintaining human oversight for edge cases that exceed current model capabilities. One financial professional specifically mentioned they initially called human underwriters to verify the AI’s recommendations and found them consistent, providing real-world validation of the accuracy claims.
The agent core also includes an MCP (likely Model Context Protocol) registry for business APIs and tools. The team is working toward an agent-to-agent (A2A) layer that would allow agents to interact with each other and external systems more flexibly, potentially enabling agents to operate independently outside the current chat-based interface and be embedded directly in other systems across Prudential.
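The registry's schema isn't described; a hypothetical shape for a central registry that maps business APIs and tools to the agents allowed to call them might be:

```python
from dataclasses import dataclass, field

@dataclass
class ToolSpec:
    name: str
    endpoint: str
    description: str
    allowed_agents: list[str] = field(default_factory=list)

REGISTRY: dict[str, ToolSpec] = {}

def register(tool: ToolSpec):
    REGISTRY[tool.name] = tool

def tools_for(agent: str) -> list[ToolSpec]:
    """Expose only the tools an agent is entitled to, for its tool-use prompt."""
    return [t for t in REGISTRY.values() if agent in t.allowed_agents]

# Illustrative entry; names and endpoint are invented for this sketch.
register(ToolSpec("get_quote", "https://internal/quotes", "Run a pricing scenario",
                  allowed_agents=["quick_quote", "orchestrator"]))
```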
The platform is deployed as “Just Ask,” providing financial professionals with a conversational interface accessible through advisor and wholesaler applications. Users authenticate through single sign-on and interact with the system in plain language rather than navigating complex menus or learning specialized query syntax. This chat-based approach lowers the barrier to adoption and aligns with the natural workflow of advisors who are accustomed to asking questions of colleagues and support staff.
The decision to use a unified conversational interface with intelligent routing underneath (rather than requiring users to select which agent to interact with) is significant for user experience. Advisors don’t need to understand the system architecture or make decisions about which specialized agent to invoke—they simply ask questions and the orchestrator handles routing complexity transparently.
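A minimal sketch of such intent-based routing, assuming an LLM classification step ahead of dispatch (the agent names and prompt are illustrative, not Prudential's actual orchestrator):

```python
# Hedged sketch: an LLM call classifies intent and picks an agent, rather
# than rigid rule-based matching, mirroring "non-deterministic" routing.
from openai import OpenAI

client = OpenAI()

AGENTS = ["product", "quick_quote", "forms", "medical_underwriting", "book_of_business"]

def route(query: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "Classify this financial advisor query into exactly one of "
                f"{AGENTS}. Reply with the agent name only.\n\nQuery: {query}"
            ),
        }],
    )
    choice = resp.choices[0].message.content.strip()
    return choice if choice in AGENTS else "product"   # safe fallback

# route("What forms do I need for a VUL policy in Texas?") -> "forms"
```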
The case study provides detailed metrics from the first 12 weeks of production deployment. The platform processed 1,800 messages and handled 1,200 distinct queries from 900+ unique financial planners across 550+ external organizations. This represents genuine adoption at scale rather than a limited pilot—Prudential’s total addressable market is approximately 150,000 financial professionals, so reaching 900+ in 12 weeks without formal marketing represents significant organic growth.
Adoption patterns show strong retention and viral growth characteristics. Forms agent adoption grew 175% over the period, and the system saw 4x improvement in user retention (returning users reaching double-digit percentages). Organizational reach expanded by over 50%, indicating that advisors who tried the system told colleagues who then also adopted it, creating organic network effects.
The business impact metrics are substantial. The quoting agent alone delivered 100+ successful quotes, translating directly to applications entering the pipeline. This agent saved approximately 4,500 human hours—time advisors previously spent manually running scenarios that are now handled in seconds. The forms agent resolved 250+ forms interactions, saving 2,500+ minutes that would have been spent on phone calls and emails to retrieve PDFs. These time savings compound across the advisor base, potentially representing millions in productivity gains at full scale.
User feedback indicates genuine satisfaction and behavior change. Advisors specifically mentioned appreciating how the system guides them through information collection in an orderly format. The fact that advisors “do not want to go back to the old ways of working” after just 12 weeks suggests the platform is delivering meaningful value rather than being used out of novelty. Onboarding of 850+ users happened entirely through organic demand without formal marketing, indicating strong product-market fit.
While the results are impressive, several considerations merit attention. The 90%+ accuracy figure, while strong, means approximately 10% of interactions may produce incorrect or suboptimal results. In the insurance domain, errors in underwriting recommendations or form selection could lead to application delays or rejections, potentially impacting customer experience and advisor credibility. The reliance on human-in-the-loop for complex medical underwriting cases acknowledges this limitation but also means the system hasn’t fully automated the most time-consuming aspects of the workflow.
The presentation comes from the product team and emphasizes positive results without discussing failure modes, edge cases, or situations where the AI provided poor recommendations. Production LLM systems invariably encounter challenges with hallucination, context misunderstanding, or inappropriate responses despite guardrails—the absence of discussion about these issues doesn’t mean they don’t occur. The evaluation framework mentioned (human-in-the-loop, rule-based, LLM-as-judge) is sophisticated, but details about evaluation criteria, benchmark datasets, or how the 90%+ accuracy was measured are not provided.
The architecture described is complex with multiple layers (orchestrator, specialized agents, various databases, gateways, monitoring systems) which creates operational overhead and potential points of failure. The team’s roadmap toward agent-to-agent communication and reusability across Prudential systems suggests current capabilities are still evolving and the architecture may undergo significant changes. The use of multiple LLM providers (OpenAI and Bedrock) provides redundancy but also creates complexity in managing prompts, evaluating consistent performance, and controlling costs across providers.
From a cost perspective, processing 1,800 messages across multiple agents with RAG retrieval, embeddings, and sophisticated reasoning likely involves significant API costs, particularly if using advanced models from OpenAI. The presentation doesn’t discuss cost-effectiveness or ROI calculations beyond time savings, which is notable given that LLM inference costs can be substantial at scale. The 4,500 hours saved translates to clear business value, but whether this exceeds the platform development and operational costs isn’t addressed.
The team is working on additional agents including an illustrations agent that will provide smart illustrations to advisors. The vision for agent-to-agent (A2A) communication suggests movement toward more autonomous agent systems that can collaborate to solve complex tasks without requiring orchestrator mediation for every step. The goal of embedding agents directly into other systems rather than only through the chat interface indicates thinking about AI capabilities as modular services that can be composed into various workflows.
The platform represents a significant investment in AI infrastructure that extends beyond the immediate use case. The agent core with its monitoring, prompt management, and evaluation capabilities creates reusable infrastructure for deploying additional agents addressing other advisor needs or even extending to other parts of Prudential’s business. This is strategic platform thinking rather than point solution development—building capabilities that can scale across multiple use cases rather than solving a single problem.
Overall, this case study demonstrates a sophisticated approach to deploying LLMs in production for a complex, regulated industry. The multi-agent architecture, comprehensive evaluation framework, attention to responsible AI and guardrails, and operational infrastructure for monitoring and management reflect maturity in LLMOps practices. The strong adoption metrics and user satisfaction indicate the platform is delivering genuine value, though the relatively short 12-week timeframe means longer-term reliability, cost-effectiveness, and scalability remain to be proven. The architectural complexity and ongoing evolution suggest this is an ambitious initiative that will require sustained investment and refinement as it scales across Prudential’s 150,000+ financial professional network.