ZenML

LLM-Generated Entity Profiles for Personalized Food Delivery Platform

DoorDash 2025

DoorDash evolved from traditional numerical embeddings to LLM-generated natural language profiles for representing consumers, merchants, and food items to improve personalization and explainability. The company built an automated system that generates detailed, human-readable profiles by feeding structured data (order history, reviews, menu metadata) through carefully engineered prompts to LLMs, enabling transparent recommendations, editable user preferences, and richer input for downstream ML models. While the approach offers scalability and interpretability advantages over traditional embeddings, the implementation requires careful evaluation frameworks, robust serving infrastructure, and continuous iteration cycles to maintain profile quality in production.

Industry

Tech

Overview

DoorDash’s LLM-powered profile generation represents a significant shift in how the food delivery platform approaches entity representation for personalization. The company traditionally relied on dense numerical embeddings generated by deep neural networks to represent consumers, merchants, and food items within their search and recommendation systems. While computationally efficient and compact, these embeddings were opaque and difficult to interpret, which limited their utility for user-facing features and debugging.

The introduction of LLM-generated natural language profiles addresses these limitations by creating rich, narrative-style descriptions that preserve semantic nuance while remaining fully interpretable by humans. This approach enables DoorDash to build more transparent recommendation systems, allow for editable user preferences in plain English, facilitate rapid feature prototyping by non-technical teams, and generate interpretable embeddings for traditional machine learning models.

Technical Implementation Architecture

DoorDash’s profile generation system encompasses three core entity types, each requiring specialized data inputs and processing approaches. Consumer profiles provide comprehensive views of food preferences, dietary restrictions, and ordering patterns broken down by temporal factors like day of week and time of day. Merchant profiles capture primary offerings, cuisine types, service quality metrics, signature items, and economic positioning. Item profiles deliver detailed descriptions including ingredients, taste characteristics, and dietary classifications.

The technical pipeline follows a structured five-step process beginning with schema definition to ensure consistency across entity types. The system generates JSON-formatted profiles that can evolve flexibly to meet future product requirements while maintaining essential profile facets. Input data preparation and aggregation represents the foundational stage, requiring extensive data gathering, cleaning, and structuring. For consumer profiles, this includes detailed order histories with item and restaurant metadata, pricing information, food tags, and temporal ordering patterns. Merchant data encompasses sales histories, menu metadata including categories and descriptions, customer ratings, and review content. Item profiles leverage available images, descriptions, and customization options.
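The schema-definition step can be illustrated with a minimal sketch. The field names below are hypothetical (DoorDash has not published its actual schema); the point is that a typed schema serialized to JSON keeps profiles consistent across entity types while leaving room to add facets later.

```python
from dataclasses import dataclass, asdict, field
import json

@dataclass
class ConsumerProfile:
    # Narrative fields are LLM-generated; factual fields come from code.
    consumer_id: str
    taste_summary: str
    dietary_restrictions: list = field(default_factory=list)
    favorite_cuisines: list = field(default_factory=list)
    ordering_patterns: dict = field(default_factory=dict)

profile = ConsumerProfile(
    consumer_id="c_123",
    taste_summary="Loves spicy food, prefers healthy lunches.",
    dietary_restrictions=["vegetarian"],
    favorite_cuisines=["thai", "indian"],
    ordering_patterns={"weekday_lunch": "salads", "weekend_dinner": "thai curry"},
)

# Profiles are stored as JSON so the schema can gain fields without breaking readers.
print(json.dumps(asdict(profile), indent=2))
```

Because downstream services consume plain JSON, new facets can be appended without invalidating existing consumers of the profile.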

The prompt engineering phase is where the system’s “magic” occurs, as carefully crafted prompts enable LLMs to synthesize disparate data points into coherent, insightful entity descriptions. Model selection involves offline comparison of different LLM providers and architectures, with DoorDash finding that reasoning models with medium reasoning effort significantly improve nuance capture compared to non-reasoning alternatives. Post-processing appends additional fields and attributes to accommodate data that doesn’t require LLM inference, optimizing the balance between automated generation and traditional data processing.
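The prompt-assembly and post-processing steps might look like the following sketch (the prompt wording, field names, and `postprocess` helper are illustrative assumptions, not DoorDash's actual code): structured facts are serialized into the prompt, and fields that need no inference are merged in afterwards by plain code.

```python
def build_consumer_prompt(order_history, review_snippets):
    # Structured facts are serialized into the prompt; the LLM's job is
    # narrative synthesis, not arithmetic.
    lines = [
        "Write a concise food-preference profile for this consumer.",
        'Return JSON with keys "taste_summary" and "ordering_patterns".',
        "Recent orders:",
    ]
    lines += [f"- {o['item']} from {o['merchant']} ({o['daypart']})"
              for o in order_history]
    lines.append("Review snippets:")
    lines += [f"- {r}" for r in review_snippets]
    return "\n".join(lines)

def postprocess(llm_profile, computed_facts):
    # Counts, price ranges, and tags need no LLM inference: appending them
    # in code is cheaper and exact.
    return {**llm_profile, **computed_facts}
```

Keeping deterministic fields out of the LLM call both cuts token cost and guarantees those values are never hallucinated.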

LLM Usage Principles and Design Decisions

DoorDash established a clear principle for determining when to employ LLMs versus traditional code: “Use code for facts, use LLMs for narrative.” This approach ensures speed, accuracy, and cost efficiency by leveraging each tool’s strengths appropriately. Traditional code handles factual extraction through SQL queries and scripts for objective, structured data like order frequency calculations, average spending metrics, top-selling item lists, price ranges, ingredient enumerations, and explicit dietary tags.

LLMs handle narrative synthesis by interpreting multiple or multimodal signals to generate human-readable insights. Examples include taste profile summarizations like “Loves spicy food, prefers healthy lunches,” merchant ambiance descriptions derived from reviews such as “A cozy date-night spot,” and flavor characterizations like “A rich, savory broth.” This division of labor optimizes both computational resources and output quality while maintaining system reliability.
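The “code for facts” half of this principle is ordinary deterministic aggregation. A minimal sketch (sample data and field names are invented for illustration):

```python
from collections import Counter

def extract_facts(orders):
    """'Code for facts': deterministic aggregates that never go through an LLM."""
    totals = [o["total"] for o in orders]
    item_counts = Counter(o["item"] for o in orders)
    return {
        "order_count": len(orders),
        "avg_spend": round(sum(totals) / len(totals), 2),
        "top_items": [item for item, _ in item_counts.most_common(3)],
    }

orders = [
    {"item": "pad thai", "total": 14.50},
    {"item": "pad thai", "total": 15.00},
    {"item": "green curry", "total": 18.50},
]
facts = extract_facts(orders)  # {'order_count': 3, 'avg_spend': 16.0, 'top_items': [...]}
```

The resulting facts dictionary is exactly the kind of structured input that would then be handed to the LLM for narrative synthesis.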

Production Applications and Impact

The natural language profiles unlock multiple capabilities across DoorDash’s platform, serving both as improvements to existing systems and enablers of entirely new product experiences. In personalization with storytelling, the profiles power explainable recommendation experiences that move beyond black-box algorithms to explicitly communicate why specific stores or items match user preferences. Dynamic content generation creates personalized carousels, store descriptions, and item summaries tailored to individual tastes rather than generic presentations.

Conversational search experiences benefit from the deep contextual understanding these profiles provide, enabling complex request interpretation like “Find me a healthy, low-carb lunch similar to the salad I ordered last Tuesday.” The profiles enhance next-generation models by providing rich input for traditional machine learning through semantic IDs and embeddings, while solving cold-start problems for new merchants and products by bootstrapping profiles from metadata rather than requiring engagement history.

Internal teams leverage the profiles for actionable, human-readable insights that accelerate analytics processes. Marketing, strategy, and operations teams can understand consumer segments and merchant characteristics through accessible descriptions, reducing dependence on complex SQL queries for qualitative insights.

Evaluation and Quality Assurance Framework

DoorDash implements a comprehensive evaluation strategy addressing the profile quality and accuracy concerns critical for user trust. The offline evaluation component assesses new profile versions against existing ones through model-based evaluation: LLM-as-a-judge systems automatically score profile quality from the source data, the generated profile, and a structured rubric. Task-specific evaluation also employs LLMs as judges to assess quality in downstream applications, enabling rapid offline iteration.
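An LLM-as-a-judge setup of this kind can be sketched as follows. The rubric criteria and threshold logic here are assumptions for illustration; the source does not publish DoorDash's actual rubric.

```python
JUDGE_RUBRIC = """Score the candidate profile from 1-5 on each criterion:
- faithfulness: every claim is supported by the source data
- coverage: the major preferences in the data are reflected
- clarity: concise, readable natural language
Return JSON: {"faithfulness": n, "coverage": n, "clarity": n}"""

def build_judge_prompt(source_data, profile_text):
    # The judge sees both the raw inputs and the generated profile,
    # so it can check claims against evidence rather than style alone.
    return f"{JUDGE_RUBRIC}\n\nSource data:\n{source_data}\n\nProfile:\n{profile_text}"

def passes_bar(scores, incumbent_mean):
    # A candidate profile version ships only if its mean rubric score
    # beats the incumbent version's.
    mean = sum(scores.values()) / len(scores)
    return mean, mean > incumbent_mean
```

Scoring new profile versions against the incumbent's mean keeps regressions out of production without waiting on an online experiment.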

Online evaluation represents the ultimate test through production user experience measurement. This includes explicit feedback mechanisms where profiles are surfaced to consumers for direct input and control, creating closed feedback loops for continuous improvement. Application-specific A/B experiments measure the impact of profile-driven features on key metrics like engagement and conversion rates.

Serving Infrastructure and Scalability

The production deployment relies on two core pipeline components designed for high-performance profile delivery. The low-latency, universal storage layer houses generated profiles in high-performance feature stores or databases, ensuring downstream services can fetch profile data within service-level agreement (SLA) requirements. This infrastructure must handle millions of consumer, merchant, and item profiles while maintaining consistent access patterns for real-time personalization features.

The versioned serving API provides clear, structured interfaces for engineering teams to retrieve profiles, becoming a fundamental component of DoorDash’s personalization infrastructure. API versioning ensures backward compatibility while enabling iterative improvements to profile schemas and content structure.
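A versioned serving interface of this shape can be sketched with an in-memory stand-in (the class, version labels, and fallback policy are assumptions; DoorDash's actual API is not described in detail):

```python
class ProfileStore:
    """In-memory stand-in for the low-latency profile feature store."""

    def __init__(self):
        self._rows = {}  # (entity_id, version) -> profile dict

    def put(self, entity_id, version, profile):
        self._rows[(entity_id, version)] = profile

    def get(self, entity_id, version):
        # Callers pin a schema version; a missing version falls back to the
        # baseline so old clients keep working during schema rollouts.
        for v in (version, "v1"):
            profile = self._rows.get((entity_id, v))
            if profile is not None:
                return profile
        return None
```

The fallback-to-baseline read is one simple way to get the backward compatibility the article describes: callers can request a newer schema before every entity has been regenerated under it.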

Continuous Improvement and Iteration Cycles

Recognizing that static profile generation systems quickly become stale, DoorDash focuses on building continuous improvement loops, particularly for consumer profiles that evolve with changing preferences and behaviors. The feedback loop integration channels insights from offline and online evaluations directly into development cycles. For example, A/B test results showing that profiles mentioning specific cuisines drive higher engagement inform prompt updates to favor those styles.

Systematic prompt engineering treats prompts as code with version control, testing frameworks, and structured experimentation processes for continuous output refinement. Model fine-tuning development leverages high-quality source data and profile text pairs validated through human evaluation processes, potentially leading to higher accuracy, better format adherence, and reduced inference costs over time.
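Treating prompts as code can be as simple as a versioned registry checked into the repository, so every prompt change gets a diff, a review, and a pinned regression test. A minimal sketch (registry layout and prompt text are illustrative assumptions):

```python
PROMPT_REGISTRY = {
    "consumer_profile": {
        "v1": "Summarize this consumer's food preferences: {facts}",
        "v2": ("Summarize this consumer's food preferences as JSON with keys "
               '"taste_summary" and "ordering_patterns": {facts}'),
    }
}

def render_prompt(name, version, **kwargs):
    # Versions are immutable once shipped; improvements land as a new key,
    # so A/B tests can compare v1 and v2 outputs side by side.
    return PROMPT_REGISTRY[name][version].format(**kwargs)
```

With prompts addressable by name and version, an A/B result favoring one phrasing maps directly to a one-line registry change reviewed like any other code.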

Technical Challenges and Considerations

While the blog post presents largely positive outcomes, several technical challenges merit consideration in this production LLM deployment. Profile consistency across millions of entities requires robust quality control mechanisms, as LLM-generated content can vary significantly based on input data quality and prompt interpretation. The cost implications of generating and maintaining profiles at DoorDash’s scale likely represent substantial computational expenses, particularly when using reasoning models for improved output quality.

Latency considerations for real-time personalization features must balance profile freshness with response time requirements, potentially requiring sophisticated caching and update strategies. The system’s dependence on prompt engineering introduces brittleness concerns, as model updates or changes in input data patterns could require extensive prompt reengineering to maintain output quality.
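One common way to balance freshness against latency is a read-through cache with a TTL, sketched below. This is a generic pattern offered as an assumption, not a description of DoorDash's actual caching layer.

```python
import time

class CachedProfileReader:
    """Read-through cache trading profile freshness for serving latency."""

    def __init__(self, fetch_fn, ttl_seconds=3600, clock=time.time):
        self._fetch = fetch_fn          # backing store / feature-store lookup
        self._ttl = ttl_seconds         # staleness budget per profile
        self._clock = clock             # injectable for testing
        self._cache = {}                # entity_id -> (profile, fetched_at)

    def get(self, entity_id):
        hit = self._cache.get(entity_id)
        if hit is not None and self._clock() - hit[1] < self._ttl:
            return hit[0]  # fresh enough: skip the backing store entirely
        profile = self._fetch(entity_id)
        self._cache[entity_id] = (profile, self._clock())
        return profile
```

The TTL is the explicit knob the paragraph alludes to: shrinking it buys fresher profiles at the cost of more backing-store reads on the request path.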

Data privacy and bias considerations become particularly important when generating detailed consumer profiles, as the system must ensure appropriate handling of sensitive information while avoiding reinforcement of existing biases in food preferences or merchant characterizations. The evaluation framework’s reliance on LLM-as-a-judge systems introduces potential circular dependencies where the same types of models evaluate their own outputs.

Production Readiness and Operational Considerations

The transition from experimental profile generation to production-grade serving requires addressing multiple operational challenges. Version management becomes critical as profiles evolve, requiring careful coordination between profile updates and dependent downstream systems. Monitoring and alerting systems must track profile generation success rates, quality metrics, and serving performance to ensure system reliability.

The integration with existing recommendation and search systems requires careful consideration of how narrative profiles complement rather than replace traditional embedding-based approaches. The hybrid approach of maintaining both representation types introduces complexity in model training and feature engineering processes that must be managed through careful system design and testing protocols.

Overall, DoorDash’s approach represents a thoughtful evolution in entity representation that addresses real limitations of traditional embedding-based systems while introducing new complexities inherent in production LLM deployment. The success of this initiative likely depends heavily on the robustness of the evaluation frameworks, the quality of the serving infrastructure, and the team’s ability to maintain profile quality at scale through continuous iteration and improvement processes.

More Like This

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.


Large-Scale Personalization and Product Knowledge Graph Enhancement Through LLM Integration

DoorDash 2025

DoorDash faced challenges in scaling personalization and maintaining product catalogs as they expanded beyond restaurants into new verticals like grocery, retail, and convenience stores, dealing with millions of SKUs and cold-start scenarios for new customers and products. They implemented a layered approach combining traditional machine learning with fine-tuned LLMs, RAG systems, and LLM agents to automate product knowledge graph construction, enable contextual personalization, and provide recommendations even without historical user interaction data. The solution resulted in faster, more cost-effective catalog processing, improved personalization for cold-start scenarios, and the foundation for future agentic shopping experiences that can adapt to real-time contexts like emergency situations.


Building an Enterprise-Grade AI Agent for Recruiting at Scale

LinkedIn 2025

LinkedIn developed Hiring Assistant, an AI agent designed to transform the recruiting workflow by automating repetitive tasks like candidate sourcing, evaluation, and engagement across 1.2+ billion profiles. The system addresses the challenge of recruiters spending excessive time on pattern-recognition tasks rather than high-value decision-making and relationship building. Using a plan-and-execute agent architecture with specialized sub-agents for intake, sourcing, evaluation, outreach, screening, and learning, Hiring Assistant combines real-time conversational interfaces with large-scale asynchronous execution. The solution leverages LinkedIn's Economic Graph for talent insights, custom fine-tuned LLMs for candidate evaluation, and cognitive memory systems that learn from recruiter behavior over time. The result is a globally available agentic product that enables recruiters to work with greater speed, scale, and intelligence while maintaining human-in-the-loop control for critical decisions.
