ZenML

Large-Scale Personalization System Using LLMs for Buyer Profile Generation

Etsy 2025

Etsy tackled the challenge of personalizing shopping experiences for nearly 90 million buyers across 100+ million listings by implementing an LLM-based system to generate detailed buyer profiles from browsing and purchasing behaviors. The system analyzes user session data including searches, views, purchases, and favorites to create structured profiles capturing nuanced interests like style preferences and shopping missions. Through significant optimization efforts including data source improvements, token reduction, batch processing, and parallel execution, Etsy reduced profile generation time from 21 days to 3 days for 10 million users while cutting per-million-user costs by 94%, making large-scale personalization economically viable for search query rewriting and refinement pills.

Industry

E-commerce

Overview

Etsy’s implementation of LLM-powered buyer profile generation represents a compelling case study in scaling personalization systems for e-commerce platforms. The company faced the fundamental challenge of helping nearly 90 million buyers discover relevant items among over 100 million listings, where traditional search and recommendation systems struggled to capture the nuanced, aesthetic preferences that make each buyer unique. Their solution involved deploying large language models in production to analyze user behavioral data and generate structured buyer profiles that capture both categorical interests and specific shopping missions.

Technical Architecture and Implementation

The system’s technical foundation demonstrates thoughtful LLMOps practices, beginning with a comprehensive data pipeline that retrieves user activity data from multiple internal sources including Etsy’s feature store and BigQuery. The data includes recent searches, item views, purchases, and favorites, which are then processed through carefully engineered prompts to generate structured buyer profiles. The profiles follow a specific data structure that includes categorical interests, confidence scores, explanations, and observed behaviors, with built-in logic to handle cases where insufficient data exists to make confident inferences.
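The profile structure described above can be sketched as a small data model. The field names here are illustrative assumptions (the article does not publish Etsy's actual schema), but they capture the stated elements: categorical interests, confidence scores, explanations, observed behaviors, and an explicit fallback when activity is too sparse to infer anything.

```python
# Hypothetical sketch of the structured buyer profile; field names are
# assumptions, not Etsy's actual schema.
from dataclasses import dataclass, field

@dataclass
class InterestSignal:
    category: str             # e.g. "mid-century modern decor"
    confidence: float         # model confidence in the inference, 0.0-1.0
    explanation: str          # why the model inferred this interest
    observed_behaviors: list  # searches/views/purchases supporting it

@dataclass
class BuyerProfile:
    user_id: str
    interests: list = field(default_factory=list)
    insufficient_data: bool = False  # set when activity is too sparse to infer

def build_profile(user_id, signals):
    """Assemble a profile, falling back to an 'insufficient data' marker
    rather than forcing low-confidence inferences."""
    if not signals:
        return BuyerProfile(user_id=user_id, insufficient_data=True)
    return BuyerProfile(user_id=user_id, interests=signals)
```

The explicit `insufficient_data` flag matters downstream: consumers such as query rewriting can skip personalization cleanly instead of acting on hallucinated interests.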

What’s particularly noteworthy from an LLMOps perspective is how Etsy approached the challenge of scaling this system to handle their massive user base. The naive initial implementation would have required weeks to process all users and would have been prohibitively expensive. The engineering team implemented several critical optimizations that showcase mature LLMOps thinking. They shifted from API-based data retrieval to optimized BigQuery tables with proper clustering and partitioning, demonstrating the importance of data infrastructure in LLM systems. The reduction of input context from two years to nine months of session data shows pragmatic prompt engineering that balances context richness with computational efficiency while also addressing seasonal bias in user behavior.
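The context-window reduction amounts to filtering session events to a rolling nine-month cutoff before prompt construction. A minimal sketch, assuming a simple list-of-dicts event representation (in production this filter would live in the partitioned BigQuery tables rather than application code):

```python
# Toy sketch of the nine-month session-window trim described above.
from datetime import datetime, timedelta

def trim_sessions(sessions, now, months=9):
    """Keep only events within the last `months` months (approximated as
    30-day months) so prompts carry recent, seasonally-balanced context."""
    cutoff = now - timedelta(days=30 * months)
    return [s for s in sessions if s["timestamp"] >= cutoff]

# Usage: older events are dropped before the prompt is assembled.
now = datetime(2025, 6, 1)
events = [
    {"query": "cool posters", "timestamp": datetime(2025, 5, 20)},
    {"query": "halloween decor", "timestamp": datetime(2024, 1, 5)},
]
recent = trim_sessions(events, now)
```

Pushing this predicate into a date-partitioned table lets BigQuery prune partitions instead of scanning two years of history per user, which is where the real savings come from.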

Production Optimization and Cost Management

The cost optimization achievements are particularly impressive from an operational standpoint. Etsy reduced their estimated processing costs by 94% per million users through a combination of strategies including model selection (moving to smaller, more efficient models while maintaining quality), batch size optimization, and sophisticated prompt engineering. This cost reduction was crucial for making the system economically viable at scale, highlighting how cost considerations must be central to LLMOps planning for large-scale deployments.
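A back-of-envelope cost model shows how these levers compound. All token counts and prices below are illustrative assumptions, not Etsy's actual figures; the point is that halving input tokens and moving to a cheaper model multiply together.

```python
# Illustrative cost model; numbers are assumptions, not Etsy's figures.
def cost_per_million_users(input_tokens, output_tokens, price_in, price_out):
    """Estimated USD cost to profile one million users.
    Prices are USD per 1M tokens, so the per-user 1/1M divisor cancels
    against the 1M-user multiplier: total = tokens * price."""
    return input_tokens * price_in + output_tokens * price_out

# Larger model, two-year context vs. smaller model, nine-month context:
baseline = cost_per_million_users(4000, 500, price_in=2.50, price_out=10.00)
optimized = cost_per_million_users(2000, 500, price_in=0.15, price_out=0.60)
reduction = 1 - optimized / baseline  # a ~95%+ cut under these assumptions
```

Under these hypothetical numbers the reduction lands in the same ballpark as Etsy's reported 94%, which illustrates why model selection and token reduction together (rather than either alone) are what make the economics work.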

The system’s orchestration using Apache Airflow demonstrates proper workflow management for LLM systems in production. The implementation includes batching strategies, staggered processing by user ID, and careful management of API rate limits for both BigQuery and OpenAI APIs. The directed acyclic graph (DAG) approach allows for parallel processing while respecting system constraints, showing how traditional data engineering practices integrate with LLM operations.
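The staggering and batching described above can be sketched in pure Python. This is a toy illustration of the idea (hash users into deterministic processing waves, then chunk each wave into fixed-size request batches); the article says the real system wires this logic into Airflow tasks with rate-limit-aware scheduling.

```python
# Sketch of staggered, batched processing; a simplification of the
# Airflow DAG approach described above.
import hashlib

def stagger_bucket(user_id, num_buckets):
    """Deterministically assign a user to one of `num_buckets` waves by
    hashing the ID, spreading load evenly across DAG runs."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % num_buckets

def make_batches(user_ids, batch_size):
    """Split one wave into fixed-size request batches so calls stay
    within BigQuery and OpenAI rate limits."""
    return [user_ids[i:i + batch_size]
            for i in range(0, len(user_ids), batch_size)]
```

Hashing (rather than slicing sequential ID ranges) keeps wave sizes balanced even when user IDs are assigned non-uniformly, and determinism means a rerun of a failed wave touches exactly the same users.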

Privacy and Compliance Considerations

An important aspect of this implementation is Etsy’s attention to privacy compliance, building the system in alignment with GDPR and CCPA/CPRA requirements with user opt-out capabilities. This demonstrates how production LLM systems must integrate privacy considerations from the ground up rather than as an afterthought, which is increasingly critical for consumer-facing applications processing personal behavioral data.

Real-World Applications and Performance Measurement

The system’s practical applications in query rewriting and refinement pills showcase how LLM-generated insights can be operationalized to improve user experience. Query rewriting transforms simple searches like “cool posters” into enriched queries with predicted style preferences, while refinement pills provide interactive filtering options based on buyer profiles. These applications demonstrate the system’s ability to translate abstract user understanding into concrete interface improvements.
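The two applications can be illustrated with the "cool posters" example from the article. The string-joining rewrite below is a deliberate toy stand-in (in production an LLM performs the rewrite with the profile as context), and the pill structure is a hypothetical shape, but together they show how one profile feeds both surfaces.

```python
# Toy illustration of query rewriting and refinement pills; in production
# an LLM performs the rewrite using the buyer profile as context.
def rewrite_query(query, profile_styles):
    """Enrich a broad query with the buyer's inferred style preferences;
    pass the query through untouched when no profile exists."""
    if not profile_styles:
        return query
    return f"{query} {' '.join(profile_styles)}"

styles = ["minimalist", "mid-century modern"]
rewritten = rewrite_query("cool posters", styles)

# The same styles can surface as tappable refinement pills:
pills = [{"label": s, "filter": {"style": s}} for s in styles]
```

The pass-through branch matters: a profile flagged as having insufficient data should leave the query untouched rather than inject noise.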

The measurement framework Etsy describes shows mature thinking about LLM system evaluation in production. Rather than relying solely on offline metrics, they focus on business-relevant measurements including click-through rate lifts, conversion rate impacts, and user engagement with personalized features. This approach recognizes that LLM system success must ultimately be measured by downstream business outcomes rather than just model performance metrics.

Challenges and Limitations

While the case study presents impressive results, there are several areas where a balanced assessment reveals potential challenges. The system’s dependence on historical behavioral data creates inherent limitations for new users, which Etsy acknowledges through their exploration of “inheritance profiles” using collaborative filtering. This cold-start problem is common in personalization systems but becomes more complex when LLMs are involved due to the structured nature of the profiles and the need for sufficient context.
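One plausible shape for the "inheritance profiles" idea is nearest-neighbor lookup over behavior vectors: a cold-start user inherits the profile of the most similar established user. This is a minimal sketch under that assumption; the article does not describe Etsy's actual collaborative-filtering approach.

```python
# Hypothetical inheritance-profile sketch: cosine similarity over
# behavior vectors; not Etsy's documented implementation.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length behavior vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def inherit_profile(new_user_vec, existing):
    """Return the profile of the most behaviorally similar established
    user. `existing` maps user_id -> (behavior_vector, profile)."""
    best_id = max(existing,
                  key=lambda uid: cosine(new_user_vec, existing[uid][0]))
    return existing[best_id][1]
```

Even in this simplified form, the structured nature of the inherited profile is what complicates things relative to classic collaborative filtering: the borrowed interests come with confidence scores and explanations that were written for someone else.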

The profile refresh strategy, while thoughtfully designed with dynamic timing based on user activity levels, introduces operational complexity around when and how to update profiles. The system must balance freshness with computational costs while detecting interest drift and seasonal variations. This ongoing maintenance requirement represents a significant operational overhead that organizations should consider when implementing similar systems.
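The dynamic refresh timing could be as simple as a tiered policy keyed on recent activity. The thresholds below are illustrative assumptions, not Etsy's actual policy, but they show the freshness-versus-cost trade being made.

```python
# Hypothetical tiered refresh policy; thresholds are assumptions.
def refresh_interval_days(events_last_30d,
                          active_days=7, moderate_days=30, dormant_days=90):
    """Choose a profile refresh cadence from recent activity: heavy
    shoppers get fresher profiles, dormant users are refreshed rarely
    to save LLM cost."""
    if events_last_30d >= 20:
        return active_days
    if events_last_30d >= 3:
        return moderate_days
    return dormant_days
```

A real policy would also force refreshes on detected interest drift or seasonal boundaries (the two complications the article calls out), which is exactly where the operational overhead accumulates.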

Scalability and Infrastructure Requirements

The infrastructure requirements for this system are substantial, involving coordination between multiple data sources, LLM APIs, and downstream applications. The 94% cost reduction still implies significant ongoing expenses for a user base of nearly 90 million, and the three-day processing time for 10 million users suggests that maintaining fresh profiles across the entire user base requires considerable computational resources and careful scheduling.

The system’s reliance on external LLM APIs (OpenAI) introduces dependency risks and potential latency issues that could affect the user experience. While the batch processing approach mitigates some of these concerns, the system’s architecture creates tight coupling between Etsy’s personalization capabilities and external service availability.

Future Directions and Lessons Learned

Etsy’s exploration of collaborative filtering for new users indicates recognition of the system’s current limitations and suggests potential paths for improvement. However, this approach introduces additional complexity around profile matching and inheritance logic that could complicate the already sophisticated system.

The case study demonstrates several key lessons for LLMOps practitioners: the critical importance of cost optimization in large-scale deployments, the value of iterative prompt engineering and data pipeline optimization, and the need for business-relevant measurement frameworks. The success in reducing processing time and costs while maintaining quality shows that significant optimization is possible through systematic engineering effort.

However, organizations considering similar implementations should carefully evaluate the substantial infrastructure requirements, ongoing operational costs, and complexity of maintaining such systems at scale. The impressive results Etsy achieved required significant engineering investment and operational sophistication that may not be feasible for all organizations. The system represents a mature approach to production LLM deployment but also illustrates the substantial resources required to implement personalization at this scale effectively.
