Company
DoorDash
Title
LLM-Generated Entity Profiles for Personalized Food Delivery Platform
Industry
Tech
Year
2025
Summary (short)
DoorDash evolved from traditional numerical embeddings to LLM-generated natural language profiles for representing consumers, merchants, and food items to improve personalization and explainability. The company built an automated system that generates detailed, human-readable profiles by feeding structured data (order history, reviews, menu metadata) through carefully engineered prompts to LLMs, enabling transparent recommendations, editable user preferences, and richer input for downstream ML models. While the approach offers scalability and interpretability advantages over traditional embeddings, the implementation requires careful evaluation frameworks, robust serving infrastructure, and continuous iteration cycles to maintain profile quality in production.
## Overview

DoorDash's LLM-powered profile generation represents a significant shift in how the food delivery platform approaches entity representation for personalization. The company traditionally relied on dense numerical embeddings generated by deep neural networks to represent consumers, merchants, and food items within its search and recommendation systems. While computationally efficient and compact, these embeddings suffered from opacity and interpretability challenges that limited their utility for user-facing features and debugging scenarios.

The introduction of LLM-generated natural language profiles addresses these limitations by creating rich, narrative-style descriptions that preserve semantic nuance while remaining fully interpretable by humans. This approach enables DoorDash to build more transparent recommendation systems, allow for editable user preferences in plain English, facilitate rapid feature prototyping by non-technical teams, and generate interpretable embeddings for traditional machine learning models.

## Technical Implementation Architecture

DoorDash's profile generation system encompasses three core entity types, each requiring specialized data inputs and processing approaches. Consumer profiles provide comprehensive views of food preferences, dietary restrictions, and ordering patterns broken down by temporal factors like day of week and time of day. Merchant profiles capture primary offerings, cuisine types, service quality metrics, signature items, and economic positioning. Item profiles deliver detailed descriptions including ingredients, taste characteristics, and dietary classifications.

The technical pipeline follows a structured five-step process beginning with schema definition to ensure consistency across entity types. The system generates JSON-formatted profiles that can evolve flexibly to meet future product requirements while maintaining essential profile facets. Input data preparation and aggregation is the foundational stage, requiring extensive data gathering, cleaning, and structuring. For consumer profiles, this includes detailed order histories with item and restaurant metadata, pricing information, food tags, and temporal ordering patterns. Merchant data encompasses sales histories, menu metadata including categories and descriptions, customer ratings, and review content. Item profiles leverage available images, descriptions, and customization options.

The prompt engineering phase is where the system's "magic" occurs: carefully crafted prompts enable LLMs to synthesize disparate data points into coherent, insightful entity descriptions. Model selection involves offline comparison of different LLM providers and architectures, with DoorDash finding that reasoning models with medium reasoning effort significantly improve nuance capture compared to non-reasoning alternatives. Post-processing appends additional fields and attributes to accommodate data that doesn't require LLM inference, optimizing the balance between automated generation and traditional data processing.
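The blog post stays at the architectural level, but a minimal sketch of how the five steps could fit together for a consumer profile is shown below. The schema fields, the `ConsumerFacts` structure, and the provider-agnostic `call_llm` callable are illustrative assumptions, not DoorDash's actual schema, prompts, or model choices.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable

# Step 1 (schema definition): a hypothetical JSON schema for a consumer profile;
# the real field set is not published.
CONSUMER_PROFILE_SCHEMA = {
    "taste_summary": "string",
    "dietary_restrictions": ["string"],
    "ordering_patterns": {"weekday_lunch": "string", "weekend_dinner": "string"},
    "price_sensitivity": "string",
}

@dataclass
class ConsumerFacts:
    """Step 2 (input preparation): structured inputs gathered and cleaned with
    ordinary code/SQL before any LLM is involved."""
    order_history: list[dict]          # item + restaurant metadata, prices, food tags
    avg_order_value: float
    top_cuisines: list[str]
    orders_by_daypart: dict[str, int]  # e.g. {"weekday_lunch": 14, "weekend_dinner": 6}

def build_prompt(facts: ConsumerFacts) -> str:
    """Step 3 (prompt engineering): fold the structured facts into one instruction."""
    return (
        "You write consumer food-preference profiles for a delivery platform.\n"
        f"Return JSON matching this schema: {json.dumps(CONSUMER_PROFILE_SCHEMA)}\n"
        f"Source data:\n{json.dumps(asdict(facts), default=str)}\n"
        "Summarize preferences and ordering patterns; do not invent facts."
    )

def generate_profile(facts: ConsumerFacts, call_llm: Callable[[str], str]) -> dict:
    """Steps 4-5: call the selected model, parse its JSON, then append fields
    that need no LLM inference during post-processing."""
    profile = json.loads(call_llm(build_prompt(facts)))
    profile["avg_order_value"] = facts.avg_order_value   # post-processed, not generated
    profile["top_cuisines"] = facts.top_cuisines
    return profile
```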
## LLM Usage Principles and Design Decisions

DoorDash established a clear principle for determining when to employ LLMs versus traditional code: "Use code for facts, use LLMs for narrative." This approach ensures speed, accuracy, and cost efficiency by leveraging each tool's strengths appropriately.

Traditional code handles factual extraction through SQL queries and scripts for objective, structured data like order frequency calculations, average spending metrics, top-selling item lists, price ranges, ingredient enumerations, and explicit dietary tags. LLMs handle narrative synthesis by interpreting multiple or multimodal signals to generate human-readable insights. Examples include taste profile summarizations like "Loves spicy food, prefers healthy lunches," merchant ambiance descriptions derived from reviews such as "A cozy date-night spot," and flavor characterizations like "A rich, savory broth." This division of labor optimizes both computational resources and output quality while maintaining system reliability.
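In code, this split amounts to computing every objective number deterministically and handing the model only the summarization task. The sketch below assumes hypothetical pandas column names (the post describes SQL queries and scripts) and reuses the provider-agnostic `call_llm` callable from the previous example.

```python
import json
from typing import Callable

import pandas as pd

def extract_facts(orders: pd.DataFrame) -> dict:
    """'Code for facts': objective metrics computed deterministically,
    never delegated to the LLM."""
    return {
        "order_count_90d": int(len(orders)),
        "avg_spend": round(float(orders["subtotal"].mean()), 2),
        "top_cuisines": orders["cuisine"].value_counts().head(3).index.tolist(),
        "dietary_tags": sorted({tag for tags in orders["food_tags"] for tag in tags}),
    }

def narrate_preferences(facts: dict, reviews: list[str],
                        call_llm: Callable[[str], str]) -> str:
    """'LLMs for narrative': the model only interprets the signals into a short,
    human-readable summary such as "Loves spicy food, prefers healthy lunches"."""
    prompt = (
        "In one or two sentences, describe this consumer's food preferences.\n"
        f"Computed facts (treat as ground truth): {json.dumps(facts)}\n"
        f"Recent review snippets: {json.dumps(reviews)}"
    )
    return call_llm(prompt)
```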
## Production Applications and Impact

The natural language profiles unlock multiple capabilities across DoorDash's platform, serving both as improvements to existing systems and enablers of entirely new product experiences. In personalization with storytelling, the profiles power explainable recommendation experiences that move beyond black-box algorithms to explicitly communicate why specific stores or items match user preferences. Dynamic content generation creates personalized carousels, store descriptions, and item summaries tailored to individual tastes rather than generic presentations. Conversational search experiences benefit from the deep contextual understanding these profiles provide, enabling complex request interpretation like "Find me a healthy, low-carb lunch similar to the salad I ordered last Tuesday."

The profiles also enhance next-generation models by providing rich input for traditional machine learning through semantic IDs and embeddings, while solving cold-start problems for new merchants and products by bootstrapping profiles from metadata rather than requiring engagement history. Internal teams leverage the profiles for actionable, human-readable insights that accelerate analytics processes. Marketing, strategy, and operations teams can understand consumer segments and merchant characteristics through accessible descriptions, reducing dependence on complex SQL queries for qualitative insights.

## Evaluation and Quality Assurance Framework

DoorDash implements a comprehensive evaluation strategy addressing the profile quality and accuracy concerns critical for user trust. The offline evaluation component assesses new profile versions against existing ones through model-based evaluation: LLM-as-a-judge systems automatically score profile quality from the source data, the generated profiles, and structured rubrics. Task-specific evaluation employs LLMs as judges for downstream application quality assessment, enabling rapid offline iterations.

Online evaluation is the ultimate test, measuring user experience in production. This includes explicit feedback mechanisms where profiles are surfaced to consumers for direct input and control, creating closed feedback loops for continuous improvement. Application-specific A/B experiments measure the impact of profile-driven features on key metrics like engagement and conversion rates.
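As a rough illustration of the offline, model-based step, the sketch below scores a generated profile against its source data with a rubric and compares a candidate profile version to the current one over a sample of entities. The rubric dimensions and scoring format are assumptions; the post confirms the use of structured rubrics but does not publish them.

```python
import json
from typing import Callable

# Hypothetical rubric dimensions; the post mentions structured rubrics without details.
RUBRIC = {
    "faithfulness": "Every claim in the profile is supported by the source data (1-5).",
    "coverage": "The profile captures the main preferences in the source data (1-5).",
    "clarity": "The profile is concise and readable for a non-technical audience (1-5).",
}

def judge_profile(source_data: dict, profile: dict,
                  call_llm: Callable[[str], str]) -> dict:
    """LLM-as-a-judge: score one generated profile against its source data."""
    prompt = (
        "You are grading an auto-generated entity profile.\n"
        f"Rubric: {json.dumps(RUBRIC)}\n"
        f"Source data: {json.dumps(source_data)}\n"
        f"Generated profile: {json.dumps(profile)}\n"
        'Return JSON: {"faithfulness": n, "coverage": n, "clarity": n} with n in 1-5.'
    )
    return json.loads(call_llm(prompt))

def compare_versions(examples: list[dict],
                     call_llm: Callable[[str], str]) -> dict:
    """Offline comparison of a candidate profile version against the current one:
    average rubric scores per version across a sample of entities."""
    averages = {"current": dict.fromkeys(RUBRIC, 0.0),
                "candidate": dict.fromkeys(RUBRIC, 0.0)}
    for ex in examples:  # each example: {"source": ..., "current": ..., "candidate": ...}
        for version in ("current", "candidate"):
            scores = judge_profile(ex["source"], ex[version], call_llm)
            for dim in RUBRIC:
                averages[version][dim] += scores[dim] / len(examples)
    return averages
```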
## Serving Infrastructure and Scalability

The production deployment relies on two core pipeline components designed for high-performance profile delivery. The low-latency, universal storage layer houses generated profiles in high-performance feature stores or databases, ensuring downstream services can fetch profile data within service-level agreement requirements. This infrastructure must handle millions of consumer, merchant, and item profiles while maintaining consistent access patterns for real-time personalization features.

The versioned serving API provides clear, structured interfaces for engineering teams to retrieve profiles, becoming a fundamental component of DoorDash's personalization infrastructure. API versioning ensures backward compatibility while enabling iterative improvements to profile schemas and content structure.

## Continuous Improvement and Iteration Cycles

Recognizing that static profile generation systems quickly become stale, DoorDash focuses on building continuous improvement loops, particularly for consumer profiles that evolve with changing preferences and behaviors. Feedback loop integration channels insights from offline and online evaluations directly into development cycles; for example, A/B test results showing that profiles mentioning specific cuisines drive higher engagement inform prompt updates to favor those styles. Systematic prompt engineering treats prompts as code, with version control, testing frameworks, and structured experimentation processes for continuous output refinement. Model fine-tuning leverages high-quality pairs of source data and profile text validated through human evaluation, potentially leading to higher accuracy, better format adherence, and reduced inference costs over time.

## Technical Challenges and Considerations

While the blog post presents largely positive outcomes, several technical challenges merit consideration in this production LLM deployment. Profile consistency across millions of entities requires robust quality control mechanisms, as LLM-generated content can vary significantly based on input data quality and prompt interpretation. The cost of generating and maintaining profiles at DoorDash's scale likely represents a substantial computational expense, particularly when using reasoning models for improved output quality. Latency considerations for real-time personalization features must balance profile freshness with response time requirements, potentially requiring sophisticated caching and update strategies. The system's dependence on prompt engineering introduces brittleness concerns, as model updates or changes in input data patterns could require extensive prompt reengineering to maintain output quality.

Data privacy and bias considerations become particularly important when generating detailed consumer profiles, as the system must ensure appropriate handling of sensitive information while avoiding reinforcement of existing biases in food preferences or merchant characterizations. The evaluation framework's reliance on LLM-as-a-judge systems introduces potential circular dependencies, where the same types of models evaluate their own outputs.

## Production Readiness and Operational Considerations

The transition from experimental profile generation to production-grade serving requires addressing multiple operational challenges. Version management becomes critical as profiles evolve, requiring careful coordination between profile updates and dependent downstream systems. Monitoring and alerting systems must track profile generation success rates, quality metrics, and serving performance to ensure system reliability. The integration with existing recommendation and search systems requires careful consideration of how narrative profiles complement rather than replace traditional embedding-based approaches. The hybrid approach of maintaining both representation types introduces complexity in model training and feature engineering that must be managed through careful system design and testing protocols.

Overall, DoorDash's approach represents a thoughtful evolution in entity representation that addresses real limitations of traditional embedding-based systems while introducing new complexities inherent in production LLM deployment. The success of this initiative likely depends heavily on the robustness of the evaluation frameworks, the quality of the serving infrastructure, and the team's ability to maintain profile quality at scale through continuous iteration and improvement processes.
