- **Company:** DoorDash
- **Title:** AI-Powered Content Moderation at Scale: SafeChat Platform
- **Industry:** E-commerce
- **Year:** 2025

**Summary:** DoorDash developed SafeChat, an AI-powered content moderation system to handle millions of daily messages, hundreds of thousands of images, and voice calls exchanged between delivery drivers (Dashers) and customers. The platform employs a multi-layered architecture that evolved from using three external LLMs to a more efficient two-layer approach combining an internally trained model with a precise external LLM, processing text, images, and voice communications in real time. Since launch, SafeChat has achieved a 50% reduction in low- to medium-severity safety incidents while maintaining low latency (under 300ms for most messages) and cost-effectiveness by intelligently routing only 0.2% of content to expensive, high-precision models.
## Overview

DoorDash's SafeChat represents a comprehensive real-world deployment of LLMs and AI models for content moderation at massive scale. The platform processes millions of chat messages, hundreds of thousands of images, and numerous voice calls daily to maintain safety and trust on the delivery platform. This case study demonstrates how DoorDash evolved its architecture over time to balance accuracy, cost, and latency while dealing with the practical challenges of deploying LLMs in production.

The SafeChat system was built to address a critical business need: screening communications between delivery drivers (Dashers) and customers for inappropriate content such as profanity, threats, and sexual content. The system needed to operate in real time for synchronous communications like chat and images while also handling asynchronous channels like voice calls. The solution DoorDash developed showcases pragmatic LLMOps practices including model cascading, continuous evaluation, cost optimization, and phased rollouts.

## Multi-Layer Text Moderation Architecture Evolution

DoorDash's approach to text moderation evolved through two distinct phases, demonstrating how organizations can iterate on their LLM architectures based on production data and operational learnings.

### Phase 1: Three-Layer External Model Cascade

The initial implementation employed a three-layer architecture that routed messages based on assessed risk. The first layer used a free moderation API that served as a high-recall, low-precision filter with extremely low latency. By setting the threshold very low, this layer could automatically clear approximately 90% of all chat messages without incurring any cost. This design choice reflects a fundamental LLMOps principle: use the cheapest appropriate model for each task and reserve expensive resources for the cases that truly need them.

Messages that weren't cleared by the first layer moved to a fast LLM in the second layer. This model provided higher precision than the initial filter but introduced both latency and cost. By the end of this second layer, around 99.8% of all messages had been identified as safe, leaving only 0.2% requiring the most intensive analysis. This is an efficient funnel, with each layer dramatically reducing the volume that needs more expensive processing.

The third and final layer employed the most precise LLM, which was also the slowest and most expensive. At this stage, DoorDash had high confidence that the message being evaluated had genuine potential to be unsafe. The model scored each message from 0 to 5 across three distinct categories: profanity, threats, and sexual content. This granular scoring enabled more nuanced decision-making than simple binary classification. For example, if a consumer sent a message scoring 4 or higher in at least two categories, the system would give the Dasher the option to cancel the order without penalty.
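The case study doesn't publish the routing code itself, so the following is a minimal sketch of how such a cascade could be wired. The scorer callables (`free_api_risk`, `fast_llm_risk`, `precise_llm_scores`) and the thresholds are hypothetical placeholders, not DoorDash's actual clients or cutoffs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Thresholds are placeholders chosen for illustration; the real cutoffs are not published.
LAYER1_CLEAR = 0.05   # set very low so the free API clears ~90% of messages
LAYER2_CLEAR = 0.30   # fast LLM clears messages until only ~0.2% remain

@dataclass
class Verdict:
    safe: bool
    category_scores: Optional[Dict[str, int]] = None  # 0-5 per category, precise layer only

def moderate_text(
    message: str,
    free_api_risk: Callable[[str], float],                 # layer 1: free moderation API
    fast_llm_risk: Callable[[str], float],                 # layer 2: fast, cheaper LLM
    precise_llm_scores: Callable[[str], Dict[str, int]],   # layer 3: slow, precise LLM
) -> Verdict:
    """Route one chat message through the three-layer cascade."""
    # Layer 1: high recall, low precision, near-zero cost and latency.
    if free_api_risk(message) < LAYER1_CLEAR:
        return Verdict(safe=True)

    # Layer 2: higher precision, but adds latency and per-call cost.
    if fast_llm_risk(message) < LAYER2_CLEAR:
        return Verdict(safe=True)

    # Layer 3: score 0-5 for profanity, threats, and sexual content.
    scores = precise_llm_scores(message)
    # Downstream rule from the case study: a consumer message scoring >= 4 in at
    # least two categories lets the Dasher cancel the order without penalty.
    high_risk = sum(1 for s in scores.values() if s >= 4) >= 2
    return Verdict(safe=not high_risk, category_scores=scores)
```

The key property of the funnel is that each layer only sees the traffic its predecessor could not confidently clear, so the expensive third layer is priced against roughly 0.2% of volume rather than the full stream.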
### Phase 2: Two-Layer Architecture with Internal Model

After running the three-layer approach for several months, DoorDash accumulated approximately 10 million labeled data points from the production system. This represents a significant LLMOps success story: using an initially expensive external system to generate training data that could then be used to train a more efficient internal model. This is a common pattern in production LLM systems where external APIs are used to bootstrap operations, but organizations eventually develop proprietary models optimized for their specific use cases.

The newly trained internal model replaced the first two layers, becoming the primary filter for incoming messages. For each message, it produces two complementary values: a safeScore and an unsafeScore that always sum to 1. High safeScore messages are automatically cleared, while high unsafeScore messages are flagged as suspicious and routed to the second layer. This binary output with confidence scores provides clear decision boundaries while maintaining interpretability. The second layer remained the precise external LLM, reserved for ambiguous or high-risk cases.

With this two-layer architecture, 99.8% of messages are handled by the internal model alone, making the system much more sustainable and cost-effective at scale. The internal model responds in under 300ms for most messages, with only flagged messages experiencing up to three seconds of latency. This latency distribution is acceptable because the delay only affects potentially unsafe content that requires extra scrutiny.

This architectural evolution demonstrates several key LLMOps principles: the importance of data collection in production, the value of investing in custom models for high-volume use cases, and the need to continuously optimize the cost-latency-accuracy tradeoff.
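As a rough illustration of the Phase 2 routing, here is a minimal sketch assuming a hypothetical `internal_model` callable that returns the complementary (safeScore, unsafeScore) pair and a hypothetical `external_llm_scores` client for the precise second layer; the threshold is a placeholder, not a published operating point.

```python
from typing import Callable, Dict, Tuple

# Placeholder cutoff for illustration; DoorDash's actual operating point is not published.
SAFE_THRESHOLD = 0.98

def moderate_text_v2(
    message: str,
    internal_model: Callable[[str], Tuple[float, float]],   # -> (safeScore, unsafeScore), summing to 1
    external_llm_scores: Callable[[str], Dict[str, int]],   # precise 0-5 scores per category
) -> dict:
    """Phase 2 routing: internal classifier first, external LLM only for flagged traffic."""
    safe_score, unsafe_score = internal_model(message)  # complementary by construction

    if safe_score >= SAFE_THRESHOLD:
        # ~99.8% of messages end here, within the <300 ms budget and at no marginal API cost.
        return {"decision": "clear"}

    # High unsafeScore (equivalently, low safeScore): escalate to the precise external
    # LLM and accept up to ~3 s of latency for this small slice of traffic.
    return {"decision": "flagged", "category_scores": external_llm_scores(message)}
```

Because the two scores sum to 1, a single threshold on safeScore is enough to separate auto-cleared traffic from the suspicious slice that pays for the external call.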
## Image Moderation Architecture

DoorDash's approach to image moderation followed a different path, highlighting how different content types require tailored solutions. Most images exchanged on the platform occur during delivery, when Dashers provide drop-off proof or communicate with customers.

The team began with a shadow testing phase, sending every image through an external vision API for two weeks while collecting results. This cautious approach allowed them to evaluate the API's performance without impacting users. Manual review of both flagged and unflagged samples revealed a significant false positive rate that made the initial API unsuitable for production deployment. This validation phase exemplifies responsible LLMOps practice: thorough offline evaluation before exposing users to automated decisions.

DoorDash then explored computer vision models with greater granularity. Unlike the initial API, which only returned broad unsafe/safe categories, the selected model produced a richer set of subcategories with individual confidence scores. This granularity was critical because safety issues in images are highly diverse, ranging from explicit content to violent imagery to other sensitive material; treating all unsafe content as a monolithic category wasn't sufficient for nuanced decision-making.

The team conducted multiple rounds of threshold tuning against a curated dataset that included both inappropriate content and previously misclassified images. This iterative refinement significantly reduced false positives while maintaining high recall for genuinely harmful content. The selected model not only produced more accurate results but also responded faster at lower cost than the alternatives, adding approximately 500ms of latency per moderation check.

The image moderation pipeline demonstrates the importance of model selection criteria beyond just accuracy. DoorDash evaluated models on accuracy, granularity of outputs, latency, and cost, ultimately choosing a solution that optimized across all these dimensions for their specific use case.
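The per-subcategory decision rule might look something like the sketch below. The subcategory labels and thresholds are invented for illustration, since the vendor's taxonomy and DoorDash's tuned cutoffs are not published, but they convey the idea of tuning a separate threshold per label against a curated evaluation set rather than relying on a single unsafe/safe score.

```python
from typing import Dict

# Hypothetical subcategories and thresholds for illustration only. Values like these
# would be tuned over multiple rounds against a curated evaluation set that includes
# previously misclassified images.
BLOCK_THRESHOLDS: Dict[str, float] = {
    "explicit_nudity": 0.60,
    "graphic_violence": 0.70,
    "weapons": 0.85,
    "offensive_gesture": 0.90,
}

def should_block_image(subcategory_confidences: Dict[str, float]) -> bool:
    """Block when any subcategory exceeds its own tuned threshold.

    Per-subcategory thresholds let the system be strict about the most harmful
    content without over-blocking benign delivery photos, which is where a single
    coarse unsafe/safe score produced too many false positives.
    """
    return any(
        subcategory_confidences.get(label, 0.0) >= threshold
        for label, threshold in BLOCK_THRESHOLDS.items()
    )

# Example: a drop-off photo with low, noisy confidences is not blocked.
print(should_block_image({"graphic_violence": 0.12, "weapons": 0.05}))  # False
```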
## Voice Moderation

Voice moderation is the most recent addition to SafeChat and presents unique challenges compared to text or images. Unlike synchronous text chat, where messages can be blocked before delivery, harmful language in voice calls cannot always be prevented in real time. The system can, however, stop harm from spreading further and mitigate additional damage.

The voice moderation system processes calls in real time, analyzing voice tone, spoken words, and conversational context across multiple languages. Each call is streamed into the moderation pipeline, where it receives labels under various categories with associated confidence scores. This streaming architecture enables near-real-time detection while calls are in progress.

DoorDash employed a phased rollout strategy for voice moderation that exemplifies cautious LLMOps practice. Initially, the moderation engine was enabled for all calls but no automatic actions were allowed. Instead, flagged conversations were manually reviewed, allowing the team to refine categories and thresholds until results were reliable. Only after achieving acceptable accuracy standards did they begin taking automated actions such as terminating unsafe calls or restricting further contact between parties.

This human-in-the-loop approach during the initial deployment phase represents a critical LLMOps pattern for high-stakes applications. By starting with passive monitoring and manual review before enabling automated actions, DoorDash could validate system behavior, tune thresholds, and build confidence in the model's performance without risking user safety or experience.

## Moderation Actions and Decision Logic

When unsafe content is detected across any modality, SafeChat employs a graduated response system based on severity and risk level. The four types of actions reflect the differences between content types and communication channels.

For synchronous communications like chat and images, the system can block or censor harmful content before it reaches the recipient, preventing offensive or threatening material from being delivered. This preemptive blocking is only possible because moderation happens inline before delivery, which is why low latency (under 300ms for most cases) was critical to the architecture design.

When interactions themselves pose risk, the system can cancel or reassign deliveries. In many cases, the system gives the Dasher or consumer the option to cancel or reassign without cost or penalty, empowering users to protect themselves while ensuring fairness. This user-controlled action combined with automated detection represents a balanced approach that leverages both AI and human judgment.

For asynchronous channels like voice calls, where prevention isn't possible, the system can drop calls in progress and prevent further contact between the same parties. This reactive capability is crucial when escalation could put someone at risk, even if the initial harmful content couldn't be prevented.

Finally, severe or repeated offenses are escalated to human safety agents, who can impose consequences ranging from temporary suspension to permanent platform removal based on severity, recurrence, and context. This human oversight layer ensures that high-stakes enforcement decisions receive appropriate scrutiny.
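A policy layer like this can be expressed as a mapping from detection results to the four action types. The sketch below uses invented rules and signals (channel, per-category scores, prior offense count) purely to illustrate the graduated structure; it is not DoorDash's actual policy engine.

```python
from enum import Enum, auto
from typing import Dict, List

class Action(Enum):
    BLOCK_MESSAGE = auto()       # synchronous chat/images: censor before delivery
    OFFER_CANCEL = auto()        # let the Dasher or consumer cancel/reassign penalty-free
    DROP_CALL = auto()           # voice: end the call and prevent further contact
    ESCALATE_TO_AGENT = auto()   # severe or repeated offenses go to a human safety agent

def choose_actions(channel: str, scores: Dict[str, int], prior_offenses: int) -> List[Action]:
    """Map detection results to graduated responses (illustrative rules only)."""
    actions: List[Action] = []
    severe = any(score >= 4 for score in scores.values())

    if channel in {"chat", "image"} and severe:
        actions.append(Action.BLOCK_MESSAGE)        # inline blocking relies on <300 ms moderation
    if sum(1 for s in scores.values() if s >= 4) >= 2:
        actions.append(Action.OFFER_CANCEL)         # the interaction itself poses risk
    if channel == "voice" and severe:
        actions.append(Action.DROP_CALL)            # reactive: the words were already heard
    if severe and prior_offenses > 0:
        actions.append(Action.ESCALATE_TO_AGENT)    # humans decide suspension or removal
    return actions
```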
## Production Performance and Impact

The case study provides specific quantitative results that demonstrate SafeChat's production impact. Since launch, the platform has achieved a 50% reduction in low- to medium-severity safety incidents. This is a substantial improvement in platform safety that translates into multiple business benefits: higher Dasher and consumer trust, reduced customer support volume, and stronger platform reliability.

The system processes millions of messages daily, with 99.8% handled by the efficient first-layer model, demonstrating excellent scaling characteristics. The latency profile (under 300ms for most messages, up to 3 seconds for suspicious content) shows that the system meets real-time requirements without degrading user experience for the vast majority of interactions.

From a cost optimization perspective, the evolution from three external LLM layers to a two-layer architecture with an internal model represents significant operational savings. By handling 99.8% of traffic with the internal model at no marginal cost (beyond infrastructure), DoorDash reserves expensive external LLM calls for only the 0.2% of messages that truly require that level of analysis.

## LLMOps Lessons and Considerations

This case study illustrates several important LLMOps principles and practices that are worth examining critically.

The model cascade architecture demonstrates how to optimize the accuracy-cost-latency tradeoff by routing requests to appropriately sized models. However, this approach adds architectural complexity, with multiple models to maintain, version, and monitor. Organizations must weigh whether the cost savings justify this additional operational burden.

The evolution from external APIs to internal models shows the value of using production systems to generate training data for custom models. DoorDash's 10 million labeled messages provided sufficient data to train an effective internal model. However, the case study doesn't discuss the ongoing maintenance burden of this internal model, including the need for retraining as language evolves, handling dataset drift, or addressing emerging attack patterns. These are real concerns for production LLM systems that require continuous investment.

The phased rollout approach for voice moderation, starting with passive monitoring and manual review before enabling automated actions, represents excellent risk management. This pattern should be standard practice for high-stakes AI applications, though it requires patience and resources that some organizations might be tempted to skip.

The granular scoring approach (0-5 across three categories for text, subcategories with confidence scores for images) enables more nuanced decision-making than simple binary classification. However, this granularity also means more thresholds to tune and more decision logic to maintain. The case study mentions "multiple rounds" of threshold optimization but doesn't detail the ongoing monitoring and adjustment processes that are surely required in production.

One notable gap in the case study is any discussion of adversarial resistance. Content moderation systems face adversaries who actively try to bypass filters using techniques like character substitution, obfuscation, or context-dependent language. The document doesn't address how SafeChat handles these challenges or whether the internal model requires regular updates to counter new evasion techniques.

The case study also doesn't discuss false negative rates or the user impact when legitimate messages are incorrectly blocked. While the team clearly worked to reduce false positives in image moderation, the full picture of model performance across precision and recall isn't provided. In production content moderation systems, both types of errors have real costs: false positives frustrate users with legitimate needs, while false negatives allow harmful content through.

The human escalation layer for severe or repeated offenses is mentioned but not detailed. In practice, this human-in-the-loop component is critical for fairness and accuracy in high-stakes decisions, but it also represents a scaling challenge and requires clear guidelines, training, and oversight to function effectively.

Overall, SafeChat represents a sophisticated production LLM system that successfully balances multiple competing objectives: safety, user experience, cost, and scale. The architectural evolution demonstrates learning and optimization based on production data, and the multimodal approach (text, image, voice) shows comprehensive thinking about platform safety. However, as with any case study from a company blog, readers should recognize that this presents the success story without deep discussion of ongoing challenges, failures, or limitations. The 50% reduction in incidents is impressive, but we don't know the baseline rate, whether the reduction has plateaued, or what challenges remain. These are typical limitations of company-published case studies that should be kept in mind when evaluating the approach for adoption elsewhere.
