Farfetch implemented a scalable recommender system using Vespa as a vector database to serve real-time personalized recommendations across multiple online retailers. The system processes user-product interactions and features through matrix operations to generate recommendations, meeting sub-100ms latency requirements while remaining scalable. The solution handles sparse matrices and shape-mismatch challenges through optimized data storage and computation strategies.
This case study entry represents an unavailable resource from Farfetch’s tech blog. The original article, titled “Scaling Recommenders Systems with Vespa,” has been deleted by its author and returns a 410 Gone HTTP error. Farfetch is a well-known luxury fashion e-commerce platform that connects consumers with boutiques and brands worldwide, making recommendation systems a critical component of their technology stack.
Based solely on the URL and the partial title that remains visible, this article appeared to discuss how Farfetch leveraged Vespa, an open-source big data serving engine originally developed at Yahoo (since spun out as the independent company Vespa.ai), to scale their recommendation systems. Vespa is commonly used for applications requiring real-time computation over large datasets, including search, recommendation, and personalization use cases.
In the e-commerce domain, recommendation systems are essential for improving customer engagement, increasing conversion rates, and enhancing the overall shopping experience. For a luxury fashion platform like Farfetch, which deals with a vast catalog of products from multiple boutiques and brands, having a scalable and efficient recommendation infrastructure would be particularly important.
It is crucial to note that this case study entry is severely limited due to the unavailability of the original content: the specific architecture, data pipelines, model choices, performance figures, and operational lessons described in the original article can no longer be determined or verified.
While we cannot speak to Farfetch’s specific implementation, Vespa is a popular choice for scaling recommendation systems because it combines real-time tensor computation during ranking, built-in approximate nearest-neighbor search over vector fields, and horizontal scalability with low-latency serving over large datasets.
Many e-commerce companies have adopted Vespa or similar technologies (like Elasticsearch with vector search capabilities, or purpose-built vector databases) to power their recommendation infrastructure, especially as the industry has moved toward embedding-based and semantic recommendation approaches.
Without the original content, it is difficult to assess whether this case study had direct relevance to LLMOps. However, modern recommendation systems increasingly incorporate embedding-based approaches that may leverage language models for generating product and query embeddings, semantic understanding of catalog content, and natural-language personalization signals.
If Farfetch’s implementation involved any of these capabilities, it would have been relevant to the broader LLMOps domain. Vespa’s support for vector similarity search makes it a viable platform for serving recommendations based on embeddings generated by language models.
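To make the last point concrete: serving embedding-based recommendations from Vespa typically means issuing a nearest-neighbor query over an indexed vector field via Vespa's HTTP search API. The sketch below builds such a request body; the document type ("product"), field names ("embedding", "q_embedding"), and rank profile name are hypothetical placeholders, not anything from Farfetch's (unavailable) article.

```python
import json

def build_vespa_knn_query(user_embedding, target_hits=10):
    """Build a request body for Vespa's /search/ HTTP endpoint that
    retrieves the products whose embeddings are nearest to the user's.
    Schema and field names here are illustrative placeholders."""
    return {
        # YQL using Vespa's nearestNeighbor operator over a vector field
        "yql": (
            "select * from product where "
            "{targetHits:%d}nearestNeighbor(embedding, q_embedding)" % target_hits
        ),
        # The query-time tensor the operator compares item embeddings against
        "input.query(q_embedding)": user_embedding,
        "ranking": "recommendation",  # a rank profile defined in the schema
        "hits": target_hits,
    }

body = build_vespa_knn_query([0.1, 0.3, 0.5], target_hits=5)
print(json.dumps(body, indent=2))
```

In a real deployment this body would be POSTed to the Vespa container's `/search/` endpoint, with the `embedding` field configured for HNSW indexing so the nearest-neighbor lookup is approximate and fast.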
This entry serves primarily as a placeholder and acknowledgment that valuable technical content from Farfetch’s engineering team on scaling recommendation systems existed but is no longer publicly available. For practitioners interested in this topic, alternative resources on Vespa for recommendations or Farfetch’s other published engineering content may provide relevant insights. The deletion of this content is unfortunate as case studies from major e-commerce platforms provide valuable learning opportunities for the broader engineering community.
Given the lack of substantive content, readers should seek out other published materials on recommendation system architecture, Vespa implementation guides, or Farfetch’s remaining technical blog posts for practical guidance on scaling recommendation systems in e-commerce environments.
DoorDash faced the challenge of personalizing experiences across a massive, diverse catalog spanning restaurants, grocery, retail, and other local commerce categories for millions of users with rapidly shifting intents. Traditional collaborative filtering and deep learning approaches could not adapt quickly enough to short-lived, high-context moments like Black Friday or individual life events. DoorDash developed a hybrid architecture that leverages LLMs for product understanding, consumer profile generation in natural language, and content blueprint creation, while maintaining traditional deep learning models for efficient last-mile ranking and retrieval. This approach enables the platform to serve dynamic, moment-aware personalization that adapts to real-time user intent while managing latency and cost constraints. The system uses GEPA optimization within DSPy for compound AI system tuning, combines offline LLM processing with online signal blending, and evaluates performance through quantitative metrics, LLM-as-judge, and human feedback.
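The summary above mentions combining offline LLM processing with online signal blending. A minimal sketch of what such blending can look like follows; the signal names, weights, and items are hypothetical illustrations, not DoorDash's actual feature set.

```python
def blend_scores(offline_llm_score, online_signals, weights=None):
    """Blend a precomputed (offline) LLM affinity score with real-time
    (online) behavioral signals into one ranking score. All names and
    weights are hypothetical, chosen only to illustrate the pattern."""
    weights = weights or {"offline": 0.6, "recency": 0.25, "session": 0.15}
    return (
        weights["offline"] * offline_llm_score
        + weights["recency"] * online_signals.get("recency", 0.0)
        + weights["session"] * online_signals.get("session_intent", 0.0)
    )

# Rank two candidate content modules for a user whose current session
# suggests grocery intent, overriding a weaker long-term preference.
candidates = {
    "late_night_snacks": blend_scores(0.4, {"recency": 0.9, "session_intent": 0.2}),
    "weekly_groceries": blend_scores(0.7, {"recency": 0.1, "session_intent": 0.9}),
}
ranked = sorted(candidates, key=candidates.get, reverse=True)
print(ranked)  # → ['weekly_groceries', 'late_night_snacks']
```

The design point is that the expensive LLM work (profiles, blueprints) happens offline, while the cheap arithmetic blend runs at request time, which is how latency and cost constraints stay manageable.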
DoorDash faced challenges in scaling personalization and maintaining product catalogs as they expanded beyond restaurants into new verticals like grocery, retail, and convenience stores, dealing with millions of SKUs and cold-start scenarios for new customers and products. They implemented a layered approach combining traditional machine learning with fine-tuned LLMs, RAG systems, and LLM agents to automate product knowledge graph construction, enable contextual personalization, and provide recommendations even without historical user interaction data. The solution resulted in faster, more cost-effective catalog processing, improved personalization for cold-start scenarios, and the foundation for future agentic shopping experiences that can adapt to real-time contexts like emergency situations.
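The cold-start idea above — recommending without historical interaction data by matching session context against catalog knowledge — can be sketched as follows. This is purely illustrative: a production RAG system would retrieve by learned embeddings, and the tags and items here are invented, not DoorDash's knowledge graph.

```python
def retrieve_cold_start(context_tags, catalog, top_k=3):
    """Score catalog items by overlap between the user's current context
    tags (e.g. derived by an LLM from the session) and item attributes
    from a product knowledge graph. Tag overlap stands in for the
    embedding similarity a real RAG retriever would use."""
    scored = []
    for item, attrs in catalog.items():
        overlap = len(set(context_tags) & set(attrs))
        if overlap:
            scored.append((overlap, item))
    scored.sort(reverse=True)
    return [item for _, item in scored[:top_k]]

# Hypothetical catalog entries with knowledge-graph attributes
catalog = {
    "bottled_water": {"emergency", "storm_prep", "beverage"},
    "flashlight": {"emergency", "storm_prep", "hardware"},
    "ice_cream": {"dessert", "frozen"},
}
# A session context an LLM might infer during a weather emergency
print(retrieve_cold_start(["storm_prep", "emergency"], catalog, top_k=2))
```

Because the match is against catalog attributes rather than the user's interaction history, the same mechanism serves brand-new users and brand-new products.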
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed wide performance variation across frontier models (from single-digit to roughly 80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models, which hallucinated non-existent insurance products 15-45% of the time.
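Per-conversation error-mode rates like the 36% tool-use-failure figure above are simple to compute once each evaluated conversation is labeled. The sketch below shows the aggregation; the labels and sample records are hypothetical, not Snorkel's actual benchmark data.

```python
from collections import Counter

def error_mode_rates(conversations):
    """Return the fraction of evaluated conversations exhibiting each
    error mode. A conversation may exhibit several modes, but each mode
    is counted at most once per conversation."""
    counts = Counter()
    for convo in conversations:
        for mode in set(convo["error_modes"]):  # dedupe within a conversation
            counts[mode] += 1
    total = len(conversations)
    return {mode: n / total for mode, n in counts.items()}

# Hypothetical labeled conversations from an agent evaluation run
convos = [
    {"error_modes": ["tool_use_failure"]},
    {"error_modes": ["hallucinated_product", "tool_use_failure"]},
    {"error_modes": []},
    {"error_modes": ["hallucinated_product"]},
]
rates = error_mode_rates(convos)
print(rates)  # → {'tool_use_failure': 0.5, 'hallucinated_product': 0.5}
```

In practice the per-conversation labels would come from the quantitative checks, LLM-as-judge scoring, or human review that the benchmark combines.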