## Overview
DoorDash, the local commerce platform with over 37 million monthly active consumers and 2 million monthly active Dashers (delivery drivers), needed to improve its contact center operations to better serve its massive user base. The company receives hundreds of thousands of support calls daily from Consumers, Merchants, and Dashers, with Dashers particularly relying on phone support while on the road. Despite having an existing IVR solution that had already reduced agent transfers by 49% and improved first contact resolution by 12%, most calls were still being redirected to live agents, creating an opportunity for further automation through generative AI.
The case study presents a compelling example of deploying LLMs in a high-volume, latency-sensitive production environment. Voice-based AI applications present unique challenges compared to text-based chatbots, particularly around response time requirements—drivers on the road cannot wait for lengthy processing delays. This made the project technically demanding and serves as a valuable reference for similar contact center implementations.
## Technical Architecture and Model Selection
The solution was built on Amazon Bedrock, AWS's fully managed service for accessing foundation models. DoorDash specifically chose Anthropic's Claude models, ultimately settling on Claude 3 Haiku for production deployment. The choice of Haiku is notable—it's the fastest and most cost-efficient model in the Claude 3 family, which was critical given the voice application's strict latency requirements. The team achieved response latency of 2.5 seconds or less, which is essential for maintaining natural conversation flow in phone support scenarios.
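DoorDash's exact serving path is not published, but one way to meet a tight voice-latency budget on Bedrock is to stream tokens as they are generated so downstream text-to-speech can start before the full answer is complete. A minimal sketch using the Bedrock Converse streaming API and the Claude 3 Haiku model ID (the prompt is illustrative only):

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Stream a response from Claude 3 Haiku; partial text arrives as it is generated,
# which keeps time-to-first-token low for a voice channel.
response = bedrock.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "How do I reset my Dasher app password?"}]}],
    inferenceConfig={"maxTokens": 300, "temperature": 0.2},
)

for event in response["stream"]:
    if "contentBlockDelta" in event:
        # Hand each text delta to the speech-synthesis step as soon as it arrives.
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)
```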
The architecture integrates with Amazon Connect (AWS's AI-powered contact center service) and Amazon Lex for natural language understanding. This represents a multi-layer AI approach where Lex handles initial speech recognition and intent classification, while the LLM-based solution provides more sophisticated conversational capabilities and knowledge retrieval.
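The integration code itself is not described in the case study. One plausible wiring, assuming a Lex V2 fallback intent whose fulfillment Lambda passes the transcribed utterance to the Bedrock model (handler structure and model choice are assumptions, not DoorDash's published design):

```python
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def lambda_handler(event, context):
    """Lex V2 code hook: answer the caller's utterance with the LLM when no scripted intent matches."""
    utterance = event["inputTranscript"]  # transcription produced by Lex/Connect
    reply = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": utterance}]}],
        inferenceConfig={"maxTokens": 300},
    )["output"]["message"]["content"][0]["text"]

    # Return the generated text to Lex, which plays it back to the caller.
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": event["sessionState"]["intent"]["name"], "state": "Fulfilled"},
        },
        "messages": [{"contentType": "PlainText", "content": reply}],
    }
```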
It's worth noting that the case study is published by AWS and naturally emphasizes AWS services. While the technical claims appear reasonable, independent verification of the specific performance metrics is not available. The 50% reduction in development time attributed to Amazon Bedrock is a relative claim that depends heavily on what the comparison baseline was.
## RAG Implementation
A critical component of the solution is the retrieval-augmented generation (RAG) architecture. DoorDash integrated content from its publicly available help center as the knowledge base, allowing the LLM to provide accurate, grounded responses to Dasher inquiries. The implementation uses Knowledge Bases for Amazon Bedrock, which handles the full RAG workflow (a minimal query sketch follows the list below), including:
- Content ingestion from DoorDash's help documentation
- Vector embedding and indexing
- Retrieval of relevant content at query time
- Prompt augmentation with retrieved context
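On the query side, a Knowledge Bases deployment can be exercised with a single managed call that retrieves relevant help-center passages and generates a grounded answer. A minimal sketch, assuming the `bedrock-agent-runtime` client and a placeholder knowledge base ID:

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve_and_generate(
    input={"text": "How do I update my vehicle information?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID_PLACEHOLDER",  # hypothetical knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

print(response["output"]["text"])
# response["citations"] carries the retrieved help-center passages used for grounding.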
Using a managed RAG service like Knowledge Bases for Amazon Bedrock simplifies the operational burden significantly. Rather than building custom integrations for data source connections, chunking strategies, embedding pipelines, and retrieval mechanisms, DoorDash could leverage AWS's managed infrastructure. This aligns with the reported 50% reduction in development time, though teams should carefully evaluate whether managed solutions provide sufficient flexibility for their specific requirements.
The choice to limit the knowledge base to publicly available content is significant from a data privacy perspective. The case study explicitly notes that DoorDash does not provide any personally identifiable information to be accessed via the generative AI solutions, which is an important consideration for production LLM deployments handling customer interactions.
## Testing and Evaluation Framework
One of the most operationally significant aspects of this case study is the testing infrastructure DoorDash built. Previously, the team had to pull contact center agents off help queues to manually complete test cases—a resource-intensive approach that limited testing capacity. Using Amazon SageMaker, DoorDash built an automated test and evaluation framework (sketched after the list below) that:
- Increased testing capacity by 50x (from manual testing to thousands of automated tests per hour)
- Enabled semantic evaluation of responses against ground-truth data
- Supported A/B testing for measuring key success metrics at scale
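The framework's internals are not described. As a hedged sketch of the batch-evaluation idea only, the core loop is to replay a ground-truth test set against the bot and flag any answer whose semantic score falls below a threshold; here `ask_bot`, `score`, and the test-case format are hypothetical placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def run_suite(test_cases, ask_bot, score, threshold=0.8):
    """Replay test cases in parallel and flag responses that regress below the similarity threshold.

    test_cases: list of {"query": str, "expected": str} drawn from a ground-truth set (hypothetical format).
    ask_bot:    callable that sends a query to the voice-bot backend and returns its answer (hypothetical).
    score:      callable returning a semantic similarity score between answer and expected text.
    """
    def run_one(case):
        answer = ask_bot(case["query"])
        return case["query"], score(answer, case["expected"])

    with ThreadPoolExecutor(max_workers=32) as pool:
        results = list(pool.map(run_one, test_cases))

    failures = [(query, s) for query, s in results if s < threshold]
    return results, failures
```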
This testing infrastructure is crucial for LLMOps at scale. Unlike the outputs of deterministic software systems, LLM outputs are probabilistic and can vary significantly with prompt changes, model updates, or modifications to the knowledge base. Having robust automated evaluation allows teams to:
- Validate changes before production deployment
- Detect regressions in response quality
- Compare performance across different model versions or configurations
- Measure semantic correctness rather than just syntactic matching
The mention of semantic evaluation against ground-truth data suggests DoorDash implemented some form of embedding-based similarity scoring or LLM-as-judge evaluation, though the specific methodology is not detailed in the case study.
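As one illustration of what semantic evaluation could mean in practice (not confirmed as DoorDash's approach), cosine similarity between embeddings of the generated answer and the ground-truth answer yields a graded score rather than an exact-match check. A minimal sketch using Amazon Titan embeddings on Bedrock:

```python
import json
import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    """Embed a piece of text with the Titan text embeddings model."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

def semantic_score(candidate: str, ground_truth: str) -> float:
    """Cosine similarity between the generated answer and the reference answer."""
    a, b = embed(candidate), embed(ground_truth)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```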
## Safety and Reliability Considerations
The case study highlights several safety features that were important for production deployment. Claude was noted for its capabilities in:
- Hallucination mitigation
- Prompt injection detection
- Abusive language detection
These are critical considerations for customer-facing applications. Voice support systems interact with users who may be frustrated or stressed, and the system must handle adversarial inputs gracefully. The mention of prompt injection as a specific concern indicates DoorDash took security seriously in their implementation.
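The case study attributes these protections to Claude itself and does not name additional tooling. A complementary pattern available on the same platform, shown here purely as an illustration rather than DoorDash's confirmed approach, is to screen each utterance with Amazon Bedrock Guardrails before it reaches the model (the guardrail ID and version below are placeholders):

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def screen_utterance(transcribed_utterance: str) -> bool:
    """Return True if the caller's utterance passes the guardrail checks."""
    result = bedrock.apply_guardrail(
        guardrailIdentifier="GUARDRAIL_ID_PLACEHOLDER",  # hypothetical guardrail configured separately
        guardrailVersion="1",
        source="INPUT",  # screen the caller's input before it reaches the model
        content=[{"text": {"text": transcribed_utterance}}],
    )
    # If the guardrail intervenes, fall back to a canned response or escalate to a live agent.
    return result["action"] != "GUARDRAIL_INTERVENED"
```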
Data security is addressed through Amazon Bedrock's encryption capabilities and the assurance that customer data is isolated to DoorDash's application. For companies in regulated industries or those handling sensitive customer data, understanding how data flows through LLM systems and what guardrails exist is essential.
## Development Timeline and Collaboration
The project was completed in approximately 8 weeks through collaboration between DoorDash's team and AWS's Generative AI Innovation Center (GenAIIC). This relatively short timeline for getting to production A/B testing suggests that:
- Using managed services and established patterns significantly accelerates development
- Having access to specialized AI expertise (through GenAIIC) helped navigate common pitfalls
- The existing Amazon Connect infrastructure provided a foundation to build upon
However, it's important to note that this 8-week timeline covered development through A/B testing readiness, not full production rollout. The case study mentions the solution was tested in early 2024 before completing rollout to all Dashers.
## Production Results and Scale
The solution now handles hundreds of thousands of Dasher support calls daily, representing significant production scale. Key outcomes reported include:
- Substantial reductions in call volumes for Dasher-related support inquiries
- Reduced escalations to live agents by thousands per day
- Reduced number of live agent tasks required to resolve support inquiries
- Freed up live agents to handle higher-complexity issues
While specific percentage improvements in resolution rates or customer satisfaction scores are not provided, the scale of deployment and the stated reduction in escalations suggest meaningful impact. The decision to expand the solution's capabilities—adding more knowledge bases and integrating with DoorDash's event-driven logistics workflow service—indicates confidence in the production system's stability and effectiveness.
## Future Directions
DoorDash is working on extending the solution beyond question-and-answer assistance to take actions on behalf of users. This evolution from a retrieval-based assistant to an agentic system represents a common progression in LLM applications. Integrating with their event-driven logistics workflow service suggests the AI will be able to perform operations like rescheduling deliveries, updating payment information, or resolving order issues—moving from purely informational support to transactional capabilities.
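The workflow integration is not detailed in the case study. As a hedged sketch of the agentic direction, the Bedrock Converse API's tool-use support lets the model request an action that application code would then execute against the workflow service; the `reschedule_delivery` tool and its schema below are hypothetical:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Hypothetical tool definition; real action names and schemas are not described in the case study.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "reschedule_delivery",
            "description": "Reschedule an active delivery to a later pickup window.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {
                    "delivery_id": {"type": "string"},
                    "new_window": {"type": "string"},
                },
                "required": ["delivery_id", "new_window"],
            }},
        }
    }]
}

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Can you push my pickup back 30 minutes?"}]}],
    toolConfig=tool_config,
)
# If the model decides to act, response["output"]["message"]["content"] contains a toolUse block
# whose input the application would validate and pass to the logistics workflow service.
```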
## Balanced Assessment
This case study provides a solid example of deploying LLMs in a high-volume, latency-sensitive production environment. The technical approach—using managed services, implementing RAG for grounding, building robust testing infrastructure, and carefully considering safety—represents sound LLMOps practices.
However, readers should note that this is an AWS-published case study with inherent promotional elements. The specific metrics (50x testing improvement, 50% development time reduction, 2.5-second latency) should be understood as claims within a marketing context rather than independently verified benchmarks. The actual deployment complexity, ongoing maintenance requirements, and total cost of ownership are not detailed.
For teams considering similar implementations, this case study validates the viability of voice-based LLM applications at scale but should be supplemented with technical deep-dives on the specific components (RAG tuning, voice application design, evaluation methodology) to fully understand the implementation challenges.