ZenML

Large-Scale Legal RAG Implementation with Multimodal Data Infrastructure

Harvey / Lance 2025

Harvey, a legal AI assistant company, partnered with LanceDB to address complex retrieval-augmented generation (RAG) challenges across massive datasets of legal documents. The case study demonstrates how they built a scalable system to handle diverse legal queries, ranging from small on-demand uploads to large corpora containing millions of documents from multiple jurisdictions. Their solution combines advanced vector search capabilities with a multimodal lakehouse architecture, emphasizing evaluation-driven development and flexible infrastructure to support the complex, domain-specific nature of legal AI applications.

Industry

Legal

Case Study Overview

This case study presents a comprehensive look at how Harvey, a legal AI assistant company, partnered with LanceDB to tackle some of the most challenging aspects of deploying large language models in production for legal applications. Harvey sells AI products to law firms to help with various legal tasks including document drafting, analysis, and complex legal workflows. The partnership showcases a real-world implementation of advanced RAG systems operating at massive scale with stringent requirements for accuracy, security, and performance.

The collaboration between Harvey and LanceDB represents a sophisticated approach to LLMOps in a highly specialized domain where accuracy is paramount and the data presents unique challenges. Calvin from Harvey leads teams working on complex RAG problems across massive datasets of legal documents, while Chang from LanceDB brings 20 years of experience in data tools and machine learning infrastructure, having co-authored the pandas library.

Technical Architecture and Scale

Harvey operates at three distinct scales of data handling, each presenting unique LLMOps challenges. The smallest scale is the assistant product, which handles on-demand uploads of roughly 1-50 documents, similar to consumer AI assistants. The medium scale encompasses “vaults” for larger project contexts, such as major deals or data rooms containing all contracts, litigation documents, and emails for a specific case. The largest scale involves corpora that serve as knowledge bases containing legislation, case law, tax law, and regulations for entire countries, datasets that can contain tens of millions of documents.

The system architecture demonstrates sophisticated LLMOps practices through its multimodal lakehouse design. LanceDB provides what it terms an “AI native multimodal lakehouse” that goes beyond a traditional vector database. This architecture stores images, video, audio, embeddings, text, and tabular data in a single table, which serves as a unified source of truth for search, analytics, training, and preprocessing workloads. The system supports GPU indexing that scales to billions of vectors; their record is an index of three to four billion vectors built in roughly two to three hours.
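
To make the single-table idea concrete, here is a minimal sketch of what one unified row might look like. The schema, field names, and sample values are hypothetical, not Harvey's actual data model; the point is that scalar metadata, text, a dense embedding, and an optional large blob sit side by side in one record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CorpusRow:
    """One row of a hypothetical unified table: small scalar columns, text,
    an embedding vector, and an optional large binary blob (e.g. a scanned
    page image) live together, the access pattern a multimodal lakehouse
    is built to serve."""
    doc_id: str
    jurisdiction: str
    text: str
    embedding: list[float]              # dense vector for semantic search
    page_image: Optional[bytes] = None  # large blob, may be absent


row = CorpusRow(
    doc_id="eu-directive-example",
    jurisdiction="EU",
    text="Directive on covered bonds ...",
    embedding=[0.12, -0.03, 0.88],
)
```

A single source of truth like this avoids synchronizing separate stores for search, analytics, and training over the same documents.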

Query Complexity and Domain Challenges

The case study reveals the extraordinary complexity of legal queries that the system must handle. A typical query example demonstrates multiple layers of complexity: “What is the applicable regime to covered bonds issued before 9 July 2022 under the directive EU 2019 2062 and article 129 of the CRR?” This single query incorporates semantic search requirements, implicit temporal filtering, references to specialized legal datasets, keyword matching for specific regulation IDs, multi-part regulatory cross-references, and domain-specific jargon and abbreviations.
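
The structured signals buried in such a query (dates, regulation identifiers, article cross-references) can in principle be extracted with simple pattern matching before the residual text goes to semantic retrieval. A minimal sketch using the example query from the text; the patterns and function are hypothetical, not Harvey's parser:

```python
import re

QUERY = ("What is the applicable regime to covered bonds issued before "
         "9 July 2022 under the directive EU 2019 2062 and article 129 "
         "of the CRR?")

# Hypothetical patterns for the structured signals named in the text:
DATE_RE = re.compile(
    r"\b(\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4})\b")
DIRECTIVE_RE = re.compile(r"\bdirective EU (\d{4} \d{3,4})\b", re.IGNORECASE)
ARTICLE_RE = re.compile(r"\barticle (\d+) of the ([A-Z]{2,})\b", re.IGNORECASE)


def extract_filters(query: str) -> dict:
    """Pull keyword and temporal filters out of a query; whatever is not
    matched here would be handled by dense (semantic) retrieval."""
    return {
        "dates": DATE_RE.findall(query),
        "directives": DIRECTIVE_RE.findall(query),
        "articles": ARTICLE_RE.findall(query),
    }
```

Running `extract_filters(QUERY)` recovers the temporal filter (`9 July 2022`), the directive identifier, and the article/regulation pair, each of which maps to a different retrieval component.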

These complex queries require the system to break down requests and apply appropriate technologies for different components. The system must handle both sparse and dense retrieval patterns, manage domain-specific terminology, and maintain accuracy across multiple jurisdictions and legal frameworks. This represents a significant LLMOps challenge where traditional search approaches would be insufficient, requiring sophisticated orchestration of multiple AI components.
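
One standard way to orchestrate sparse and dense retrieval, shown here as a generic sketch rather than Harvey's actual method, is reciprocal rank fusion (RRF) over the ranked lists each retriever returns:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked doc-id lists (e.g. one from keyword/BM25 search,
    one from dense vector search) into a single ranking.
    Standard RRF: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical doc ids from a sparse and a dense retriever:
sparse_hits = ["crr_art129", "misc", "eu_2019_2062"]
dense_hits = ["eu_2019_2062", "crr_art129"]
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
# -> ['crr_art129', 'eu_2019_2062', 'misc']
```

RRF is attractive in this setting because it needs no score calibration between retrievers whose scores live on different scales.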

Evaluation-Driven Development

Harvey’s approach to LLMOps emphasizes evaluation-driven development as a core principle, spending significant engineering effort on validation rather than just algorithmic development. They implement a multi-tiered evaluation strategy that balances fidelity with iteration speed. At the highest fidelity level, they employ expert reviews where legal professionals directly assess outputs and provide detailed analytical reports. This approach is expensive but provides the highest quality feedback for system improvement.

The middle tier involves expert-labeled evaluation criteria that can be assessed synthetically or through automated methods. While still expensive to curate and somewhat costly to run, this approach provides more tractable evaluation at scale. The fastest iteration tier employs automated quantitative metrics including retrieval precision and recall, along with deterministic success criteria such as document folder accuracy, section correctness, and keyword matching validation.
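
The retrieval precision and recall metrics in this fastest tier are cheap and deterministic to compute. A minimal sketch of precision@k and recall@k over labeled relevant documents:

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str],
                          k: int) -> tuple[float, float]:
    """precision@k: fraction of the top-k results that are relevant.
    recall@k: fraction of all relevant docs found in the top-k."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Metrics like these run on every change, while the expert-review tiers are reserved for less frequent, higher-fidelity checkpoints.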

This tiered evaluation framework reflects mature LLMOps practice: validation is treated as a first-class engineering concern in a high-stakes legal setting. The investment in evaluation infrastructure enables rapid iteration while maintaining the quality standards essential for legal AI applications.

Data Processing and Infrastructure Challenges

The system handles massive, complex datasets across multiple jurisdictions, each requiring specialized processing approaches. Data integration involves working with domain experts to understand the structure and requirements of legal documents from different countries and legal systems. The team applies both manual expert guidance and automated processing techniques, including LLM-based categorization and heuristic approaches for organizing and filtering complex legal data.

Performance requirements span both online and offline contexts. Online performance demands low-latency querying across massive document collections, while offline performance requirements include efficient ingestion, reingestion, and ML experimentation capabilities. Each document corpus can contain tens of millions of large, complex documents, creating significant infrastructure challenges that require careful optimization and scaling strategies.

Security and Privacy Considerations

Legal AI applications present unique LLMOps challenges around data security and privacy. Harvey handles highly sensitive data including confidential deals, IPO documents, and financial filings that require strict segregation and retention policies. The system must support flexible data privacy controls, including customer-specific storage segregation and legally mandated retention periods for different document types.

The infrastructure requirements extend beyond basic security to include comprehensive telemetry and usage monitoring, essential for both performance optimization and compliance requirements. This represents a sophisticated approach to LLMOps where security and privacy considerations are integrated into the core architecture rather than added as afterthoughts.

Technical Innovation and Infrastructure

The partnership leverages LanceDB’s innovative Lance format, designed specifically for AI workloads. This format addresses limitations in traditional data storage approaches like Parquet and Iceberg when handling multimodal AI data. Lance format provides fast random access for search and shuffle operations, maintains fast scan capabilities for analytics and training, and uniquely handles mixed large blob data with small scalar data efficiently.

The architecture implements compute and storage separation, enabling massive scale serving from cloud object storage while maintaining cost efficiency. The system provides sophisticated retrieval capabilities through simple APIs that combine multiple vector columns, vector and full-text search, and re-ranking operations. This technical approach represents advanced LLMOps practices that recognize the unique requirements of AI workloads compared to traditional database applications.
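
The combine-then-rank pattern behind such an API can be sketched in plain Python. This is a toy blend of vector similarity and keyword overlap over an in-memory table; LanceDB's real interface and scoring differ, and the data is invented for illustration:

```python
import math

# Hypothetical in-memory "table": each row has a vector column and a text column.
DOCS = {
    "doc1": {"vec": [0.9, 0.1], "text": "covered bonds directive"},
    "doc2": {"vec": [0.1, 0.9], "text": "capital requirements regulation article 129"},
    "doc3": {"vec": [0.7, 0.3], "text": "covered bonds issued before 2022"},
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


def hybrid_search(query_vec, query_text, docs, alpha=0.5, k=2):
    """Score each doc by a weighted blend of dense vector similarity and
    keyword overlap with the query text, then return the top-k ids."""
    q_tokens = set(query_text.lower().split())
    scored = []
    for doc_id, d in docs.items():
        dense = cosine(query_vec, d["vec"])
        tokens = set(d["text"].lower().split())
        sparse = len(q_tokens & tokens) / max(len(q_tokens), 1)
        scored.append((alpha * dense + (1 - alpha) * sparse, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]
```

In a production system the blend step would typically be replaced by a learned re-ranker, but the shape of the pipeline, candidate generation from multiple retrieval modes followed by a final ranking pass, is the same.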

Lessons and Best Practices

The case study emphasizes several key principles for successful LLMOps implementation in specialized domains. First, domain-specific challenges require creative solutions that go beyond generic approaches, necessitating deep collaboration with domain experts and immersion in the specific use case requirements. Understanding data structure, use cases, and both explicit and implicit query patterns becomes crucial for system design.

Second, building for iteration speed and flexibility emerges as a critical success factor. The rapidly evolving nature of AI technology requires systems that can adapt to new tools, paradigms, and model capabilities. Grounding this flexibility in comprehensive evaluation frameworks enables teams to iterate quickly while maintaining quality standards.

Finally, the case study demonstrates that modern AI applications require infrastructure that recognizes the unique characteristics of multimodal data, the prevalence of vector and embedding workloads, and the need for systems that can scale to handle ever-increasing data volumes. The successful implementation of these principles in a high-stakes legal environment provides valuable insights for LLMOps practitioners across industries.

This partnership between Harvey and LanceDB represents a sophisticated example of LLMOps implementation that addresses real-world challenges in deploying AI systems at scale in a domain where accuracy, security, and performance are all critical requirements. The technical solutions and operational practices demonstrated provide a valuable reference for teams tackling similar challenges in specialized AI applications.
