Migrating from Elasticsearch to Vespa for Large-Scale Search Platform

Vinted 2024

Vinted, a major e-commerce platform, successfully migrated their search infrastructure from Elasticsearch to Vespa to handle their growing scale of 1 billion searchable items. The migration resulted in halving their server count, improving search latency by 2.5x, reducing indexing latency by 3x, and decreasing visibility time for changes from 300 to 5 seconds. The project, completed between May 2023 and April 2024, demonstrated significant improvements in search relevance and operational efficiency through careful architectural planning and phased implementation.

Industry

E-commerce

Overview

Vinted is a European second-hand marketplace that needed to scale its search infrastructure to handle approximately 1 billion active searchable items while maintaining low latency and high relevance. This case study documents their migration from Elasticsearch to Vespa, an open-source search engine and vector database originally built by Yahoo!. While this is primarily a search infrastructure case study rather than a pure LLMOps story, it has significant relevance to LLMOps practitioners due to Vespa’s integrated machine learning model inference capabilities, vector search support, and the patterns used for deploying ML-enhanced search at scale.

The migration began in May 2023, with item search traffic fully switched to Vespa by November 2023, and facet traffic migrated by April 2024. The team chose Vespa specifically because it supports vector search, lexical search, and structured data queries in a single system, with integrated machine-learned model inference for real-time AI applications.

The Problem: Elasticsearch Limitations at Scale

Vinted had been using Elasticsearch since May 2015 (after migrating from Sphinx). As the platform grew toward a billion searchable items, they encountered mounting limitations with their Elasticsearch setup: a growing server fleet, slow indexing, and delays of roughly 300 seconds before item changes became visible in search.

The Solution: Vespa Architecture and Implementation

The team formed a dedicated Search Platform team of four Search Engineers with diverse backgrounds and shared expertise in search technologies. They divided the project into five key areas: architecture, infrastructure, indexing, querying, and metrics/performance testing.

Architecture Decisions

Vespa’s architecture provided several advantages over Elasticsearch. The team applied Little’s and Amdahl’s laws to focus optimization effort on the parts of the system that most affect overall performance. Key architectural benefits included handling lexical search, vector search, and structured-data queries in a single engine, along with integrated machine-learned model inference at query time.
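Little’s law makes concrete why latency improvements translate directly into capacity savings. A back-of-the-envelope sketch, using hypothetical traffic numbers rather than Vinted’s actual figures:

```python
# Little's law: L = lambda * W — the average number of requests in flight
# equals the arrival rate times the time each request spends in the system.
# All numbers below are illustrative, not Vinted's actual traffic figures.

def concurrent_requests(rate_per_s: float, latency_ms: float) -> float:
    """Average number of in-flight requests for a given rate and latency."""
    return rate_per_s * latency_ms / 1000

# A 2.5x latency improvement cuts the in-flight load — and the capacity
# needed to carry the same query rate — by the same factor.
before = concurrent_requests(rate_per_s=10_000, latency_ms=150)
after = concurrent_requests(rate_per_s=10_000, latency_ms=60)
```

This is one lens on how a 2.5x latency improvement and a halved server count can go hand in hand.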

Infrastructure Transformation

The centerpiece of the new infrastructure is the Vespa Application Package (VAP) deployment model, which encapsulates the entire application in a single package: schema definitions, ranking configurations, and content node specifications.
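A schema file inside a VAP might look like the following sketch. The field names, tensor dimensions, and rank profile are illustrative, not Vinted’s actual configuration:

```
schema item {
    document item {
        field title type string {
            indexing: summary | index
            index: enable-bm25
        }
        field price type float {
            indexing: summary | attribute
        }
        field embedding type tensor<float>(x[384]) {
            indexing: attribute | index
            attribute {
                distance-metric: angular
            }
        }
    }

    rank-profile hybrid inherits default {
        first-phase {
            expression: bm25(title) + closeness(field, embedding)
        }
    }
}
```

Because the schema, ranking expressions, and content-node layout live in one deployable unit, a single deployment atomically updates how documents are indexed, stored, and ranked.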

Real-Time Indexing Pipeline

One of the most significant improvements was in the indexing architecture. The team built a Search Indexing Pipeline (SIP) on top of Apache Flink, integrated with Vespa through Vespa Kafka Connect—a connector they open-sourced as no Vespa sink was previously available.
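Under the hood, feeding Vespa comes down to writes against its /document/v1 API, which is what a sink connector ultimately produces. A minimal sketch of the payload shape, with an illustrative namespace, document type, and fields:

```python
import json

# Sketch of a /document/v1 PUT for a single item. In Vinted's pipeline
# this feeding is done by Flink via the open-sourced Vespa Kafka Connect
# sink; the namespace, doctype, and fields here are illustrative.

def build_put(endpoint: str, namespace: str, doctype: str,
              doc_id: str, fields: dict) -> tuple[str, str]:
    url = f"{endpoint}/document/v1/{namespace}/{doctype}/docid/{doc_id}"
    body = json.dumps({"fields": fields})
    return url, body

url, body = build_put(
    "http://vespa:8080", "marketplace", "item", "12345",
    {"title": "vintage denim jacket", "price": 25.0},
)
```

Each such write is visible to queries within seconds, which is what enables the 300-second-to-5-second visibility improvement described above.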

The new pipeline cut indexing latency by 3x and reduced the time for item changes to become visible in search from 300 seconds to 5 seconds.

The case study emphasizes that in modern search systems, indexing latency directly affects the lead time of feature development and the pace of search performance experimentation—a principle that applies equally to ML/LLM feature deployment.

Querying and ML Integration

Vespa’s querying capabilities enable what Vinted calls their “triangle of search”: combining traditional lexical search, modern vector search, and structured data queries in single requests. This hybrid approach is crucial for relevance in e-commerce search.
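A minimal sketch of such a hybrid request in Vespa’s YQL, combining the three sides of the triangle: userQuery() for lexical matching, nearestNeighbor() for vector retrieval, and a structured price filter. The schema, field, and rank-profile names are illustrative:

```python
# Builds a hybrid YQL query. Field names ("item", "embedding", "price")
# and the "hybrid" rank profile are illustrative, not Vinted's actual schema.

def hybrid_query(user_text: str, max_price: float, target_hits: int = 100) -> dict:
    yql = (
        "select * from item where "
        f"(userQuery() or ({{targetHits:{target_hits}}}nearestNeighbor(embedding, q))) "
        f"and price <= {max_price}"
    )
    return {
        "yql": yql,
        "query": user_text,   # consumed by the userQuery() clause
        "ranking": "hybrid",  # rank profile combining lexical and vector signals
        # "input.query(q)" would carry the query embedding tensor
    }

request = hybrid_query("denim jacket", max_price=50.0)
```

Because all three clauses execute in one request against one engine, relevance logic can blend lexical and vector scores in a single rank profile instead of merging results across systems.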

One notable implementation detail concerns text analysis.

The team contributed Lucene text analysis component integration to upstream Vespa, allowing them to retain language analyzers from Elasticsearch while benefiting from Vespa’s scalability. This is notable as it demonstrates contributing back to open source while solving production needs.
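In a Vespa application, such a linguistics implementation is wired in as a container component in services.xml. A sketch of what this looks like; the exact class and bundle names are assumptions and may vary by Vespa version:

```
<container id="default" version="1.0">
    <component id="linguistics"
               class="com.yahoo.language.lucene.LuceneLinguistics"
               bundle="lucene-linguistics" />
    <search />
    <document-api />
</container>
```

Swapping the linguistics component lets the same Lucene analyzers run at both index and query time, keeping tokenization consistent with the previous Elasticsearch behavior.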

Migration Strategy and Testing

The migration followed a careful, risk-mitigated, phased approach: item search traffic was fully switched to Vespa in November 2023 and facet traffic followed in April 2024, with metrics and performance testing validating each cutover.

Monitoring and Observability

Vespa’s built-in metrics system, exposed in Prometheus format, provides detailed insight into query latency, indexing throughput, and per-node resource usage.
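Vespa nodes serve these metrics in Prometheus text exposition format (at /prometheus/v1/values), so any Prometheus-compatible stack can scrape them. A minimal sketch of parsing such output; the metric names in the sample are illustrative, not actual Vespa metric names:

```python
# Parses simple 'name{labels} value' Prometheus exposition lines.
# The sample metrics below are illustrative placeholders.

def parse_prometheus(text: str) -> dict:
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP query_latency_average Average query latency in seconds
query_latency_average{chain="default"} 0.042
feed_operations_total 128934
"""
metrics = parse_prometheus(sample)
```

In practice a Prometheus server scrapes the endpoint directly; this parser only illustrates the data shape involved.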

The ability to test changes in inactive regions before they impact users is a pattern valuable for any ML/LLM system deployment.

Results and Business Impact

The migration delivered significant, quantifiable improvements: the server count was halved, search latency improved by 2.5x, indexing latency dropped by 3x, and the visibility time for item changes fell from 300 seconds to 5 seconds.

Relevance to LLMOps

While this case study focuses on search infrastructure migration rather than LLM deployment specifically, several aspects are highly relevant to LLMOps practitioners: integrated model inference at query time, hybrid lexical and vector retrieval in a single request, and low-latency indexing that shortens experimentation cycles.

Vinted now runs 21 unique Vespa deployments across diverse use cases including item search, image retrieval, and search suggestions, with plans to fully transition remaining Elasticsearch features by end of 2024. This consolidation under a single platform that supports both traditional search and vector/ML capabilities positions them well for future LLM-enhanced search features.

More Like This

Large-Scale Personalization and Product Knowledge Graph Enhancement Through LLM Integration

DoorDash 2025

DoorDash faced challenges in scaling personalization and maintaining product catalogs as they expanded beyond restaurants into new verticals like grocery, retail, and convenience stores, dealing with millions of SKUs and cold-start scenarios for new customers and products. They implemented a layered approach combining traditional machine learning with fine-tuned LLMs, RAG systems, and LLM agents to automate product knowledge graph construction, enable contextual personalization, and provide recommendations even without historical user interaction data. The solution resulted in faster, more cost-effective catalog processing, improved personalization for cold-start scenarios, and the foundation for future agentic shopping experiences that can adapt to real-time contexts like emergency situations.


Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.


Evolving ML Infrastructure for Production Systems: From Traditional ML to LLMs

DoorDash 2025

A comprehensive overview of ML infrastructure evolution and LLMOps practices at major tech companies, focusing on DoorDash's approach to integrating LLMs alongside traditional ML systems. The discussion covers how ML infrastructure needs to adapt for LLMs, the importance of maintaining guardrails, and strategies for managing errors and hallucinations in production systems, while balancing the trade-offs between traditional ML models and LLMs in production environments.
