ZenML

Large-Scale Legal RAG Implementation with Multimodal Data Infrastructure

Harvey / Lance 2025

Harvey, a legal AI assistant company, partnered with LanceDB to address complex retrieval-augmented generation (RAG) challenges across massive datasets of legal documents. The case study demonstrates how they built a scalable system to handle diverse legal queries, ranging from small on-demand uploads to large corpora containing millions of documents from multiple jurisdictions. Their solution combines advanced vector search capabilities with a multimodal lakehouse architecture, emphasizing evaluation-driven development and flexible infrastructure to support the complex, domain-specific nature of legal AI applications.

Industry

Legal

Case Study Overview

This case study presents a comprehensive look at how Harvey, a legal AI assistant company, partnered with LanceDB to tackle some of the most challenging aspects of deploying large language models in production for legal applications. Harvey sells AI products to law firms to help with various legal tasks including document drafting, analysis, and complex legal workflows. The partnership showcases a real-world implementation of advanced RAG systems operating at massive scale with stringent requirements for accuracy, security, and performance.

The collaboration between Harvey and LanceDB represents a sophisticated approach to LLMOps in a highly specialized domain where accuracy is paramount and the data presents unique challenges. Calvin from Harvey leads teams working on complex RAG problems across massive datasets of legal documents, while Chang from LanceDB brings 20 years of experience in data tools and machine learning infrastructure, having co-authored the pandas library.

Technical Architecture and Scale

Harvey operates at three distinct scales of data handling, each presenting unique LLMOps challenges. The smallest scale is the assistant product, which handles on-demand uploads of roughly 1-50 documents, similar to consumer AI assistants. The medium scale encompasses “vaults” for larger project contexts, such as major deals or data rooms containing all contracts, litigation documents, and emails for a specific case. The largest scale involves corpora that serve as knowledge bases containing legislation, case law, tax law, and regulations for entire countries, datasets that can contain tens of millions of documents.

The system architecture demonstrates sophisticated LLMOps practices through its multimodal lakehouse design. LanceDB provides what it terms an “AI native multimodal lakehouse” that goes beyond a traditional vector database. This architecture stores images, video, audio, embeddings, text, and tabular data in a single table, which serves as a unified source of truth for search, analytics, training, and preprocessing workloads. The system supports GPU indexing that scales to billions of vectors; their record is an index of three to four billion vectors built in roughly two to three hours.
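
To make the single-table idea concrete, here is a minimal sketch of what one unified row might look like. The schema, field names, and sample values are hypothetical, not Harvey's actual data model; the point is that scalar metadata, text, a dense embedding, and an optional large blob sit side by side in one record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CorpusRow:
    """One row of a hypothetical unified table: small scalar columns, text,
    an embedding vector, and an optional large binary blob (e.g. a scanned
    page image) live together, the access pattern a multimodal lakehouse
    is built to serve."""
    doc_id: str
    jurisdiction: str
    text: str
    embedding: list[float]              # dense vector for semantic search
    page_image: Optional[bytes] = None  # large blob, may be absent


row = CorpusRow(
    doc_id="eu-directive-example",
    jurisdiction="EU",
    text="Directive on covered bonds ...",
    embedding=[0.12, -0.03, 0.88],
)
```

A single source of truth like this avoids synchronizing separate stores for search, analytics, and training over the same documents.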

Query Complexity and Domain Challenges

The case study reveals the extraordinary complexity of legal queries that the system must handle. A typical query example demonstrates multiple layers of complexity: “What is the applicable regime to covered bonds issued before 9 July 2022 under the directive EU 2019 2062 and article 129 of the CRR?” This single query incorporates semantic search requirements, implicit temporal filtering, references to specialized legal datasets, keyword matching for specific regulation IDs, multi-part regulatory cross-references, and domain-specific jargon and abbreviations.
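
The structured signals buried in such a query (dates, regulation identifiers, article cross-references) can in principle be extracted with simple pattern matching before the residual text goes to semantic retrieval. A minimal sketch using the example query from the text; the patterns and function are hypothetical, not Harvey's parser:

```python
import re

QUERY = ("What is the applicable regime to covered bonds issued before "
         "9 July 2022 under the directive EU 2019 2062 and article 129 "
         "of the CRR?")

# Hypothetical patterns for the structured signals named in the text:
DATE_RE = re.compile(
    r"\b(\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4})\b")
DIRECTIVE_RE = re.compile(r"\bdirective EU (\d{4} \d{3,4})\b", re.IGNORECASE)
ARTICLE_RE = re.compile(r"\barticle (\d+) of the ([A-Z]{2,})\b", re.IGNORECASE)


def extract_filters(query: str) -> dict:
    """Pull keyword and temporal filters out of a query; whatever is not
    matched here would be handled by dense (semantic) retrieval."""
    return {
        "dates": DATE_RE.findall(query),
        "directives": DIRECTIVE_RE.findall(query),
        "articles": ARTICLE_RE.findall(query),
    }
```

Running `extract_filters(QUERY)` recovers the temporal filter (`9 July 2022`), the directive identifier, and the article/regulation pair, each of which maps to a different retrieval component.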

These complex queries require the system to break down requests and apply appropriate technologies for different components. The system must handle both sparse and dense retrieval patterns, manage domain-specific terminology, and maintain accuracy across multiple jurisdictions and legal frameworks. This represents a significant LLMOps challenge where traditional search approaches would be insufficient, requiring sophisticated orchestration of multiple AI components.
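
One standard way to orchestrate sparse and dense retrieval, shown here as a generic sketch rather than Harvey's actual method, is reciprocal rank fusion (RRF) over the ranked lists each retriever returns:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked doc-id lists (e.g. one from keyword/BM25 search,
    one from dense vector search) into a single ranking.
    Standard RRF: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical doc ids from a sparse and a dense retriever:
sparse_hits = ["crr_art129", "misc", "eu_2019_2062"]
dense_hits = ["eu_2019_2062", "crr_art129"]
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
# -> ['crr_art129', 'eu_2019_2062', 'misc']
```

RRF is attractive in this setting because it needs no score calibration between retrievers whose scores live on different scales.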

Evaluation-Driven Development

Harvey’s approach to LLMOps emphasizes evaluation-driven development as a core principle, spending significant engineering effort on validation rather than just algorithmic development. They implement a multi-tiered evaluation strategy that balances fidelity with iteration speed. At the highest fidelity level, they employ expert reviews where legal professionals directly assess outputs and provide detailed analytical reports. This approach is expensive but provides the highest quality feedback for system improvement.

The middle tier involves expert-labeled evaluation criteria that can be assessed synthetically or through automated methods. While still expensive to curate and somewhat costly to run, this approach provides more tractable evaluation at scale. The fastest iteration tier employs automated quantitative metrics including retrieval precision and recall, along with deterministic success criteria such as document folder accuracy, section correctness, and keyword matching validation.
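
The retrieval precision and recall metrics in this fastest tier are cheap and deterministic to compute. A minimal sketch of precision@k and recall@k over labeled relevant documents:

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str],
                          k: int) -> tuple[float, float]:
    """precision@k: fraction of the top-k results that are relevant.
    recall@k: fraction of all relevant docs found in the top-k."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Metrics like these run on every change, while the expert-review tiers are reserved for less frequent, higher-fidelity checkpoints.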

This tiered evaluation framework reflects mature LLMOps practice: validation is treated as a first-class engineering concern in a high-stakes legal setting. The investment in evaluation infrastructure enables rapid iteration while maintaining the quality standards essential for legal AI applications.

Data Processing and Infrastructure Challenges

The system handles massive, complex datasets across multiple jurisdictions, each requiring specialized processing approaches. Data integration involves working with domain experts to understand the structure and requirements of legal documents from different countries and legal systems. The team applies both manual expert guidance and automated processing techniques, including LLM-based categorization and heuristic approaches for organizing and filtering complex legal data.

Performance requirements span both online and offline contexts. Online performance demands low-latency querying across massive document collections, while offline performance requirements include efficient ingestion, reingestion, and ML experimentation capabilities. Each document corpus can contain tens of millions of large, complex documents, creating significant infrastructure challenges that require careful optimization and scaling strategies.

Security and Privacy Considerations

Legal AI applications present unique LLMOps challenges around data security and privacy. Harvey handles highly sensitive data including confidential deals, IPO documents, and financial filings that require strict segregation and retention policies. The system must support flexible data privacy controls, including customer-specific storage segregation and legally mandated retention periods for different document types.

The infrastructure requirements extend beyond basic security to include comprehensive telemetry and usage monitoring, essential for both performance optimization and compliance requirements. This represents a sophisticated approach to LLMOps where security and privacy considerations are integrated into the core architecture rather than added as afterthoughts.

Technical Innovation and Infrastructure

The partnership leverages LanceDB’s innovative Lance format, designed specifically for AI workloads. This format addresses limitations in traditional data storage approaches like Parquet and Iceberg when handling multimodal AI data. Lance format provides fast random access for search and shuffle operations, maintains fast scan capabilities for analytics and training, and uniquely handles mixed large blob data with small scalar data efficiently.

The architecture implements compute and storage separation, enabling massive scale serving from cloud object storage while maintaining cost efficiency. The system provides sophisticated retrieval capabilities through simple APIs that combine multiple vector columns, vector and full-text search, and re-ranking operations. This technical approach represents advanced LLMOps practices that recognize the unique requirements of AI workloads compared to traditional database applications.
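
The combine-then-rank pattern behind such an API can be sketched in plain Python. This is a toy blend of vector similarity and keyword overlap over an in-memory table; LanceDB's real interface and scoring differ, and the data is invented for illustration:

```python
import math

# Hypothetical in-memory "table": each row has a vector column and a text column.
DOCS = {
    "doc1": {"vec": [0.9, 0.1], "text": "covered bonds directive"},
    "doc2": {"vec": [0.1, 0.9], "text": "capital requirements regulation article 129"},
    "doc3": {"vec": [0.7, 0.3], "text": "covered bonds issued before 2022"},
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


def hybrid_search(query_vec, query_text, docs, alpha=0.5, k=2):
    """Score each doc by a weighted blend of dense vector similarity and
    keyword overlap with the query text, then return the top-k ids."""
    q_tokens = set(query_text.lower().split())
    scored = []
    for doc_id, d in docs.items():
        dense = cosine(query_vec, d["vec"])
        tokens = set(d["text"].lower().split())
        sparse = len(q_tokens & tokens) / max(len(q_tokens), 1)
        scored.append((alpha * dense + (1 - alpha) * sparse, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]
```

In a production system the blend step would typically be replaced by a learned re-ranker, but the shape of the pipeline, candidate generation from multiple retrieval modes followed by a final ranking pass, is the same.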

Lessons and Best Practices

The case study emphasizes several key principles for successful LLMOps implementation in specialized domains. First, domain-specific challenges require creative solutions that go beyond generic approaches, necessitating deep collaboration with domain experts and immersion in the specific use case requirements. Understanding data structure, use cases, and both explicit and implicit query patterns becomes crucial for system design.

Second, building for iteration speed and flexibility emerges as a critical success factor. The rapidly evolving nature of AI technology requires systems that can adapt to new tools, paradigms, and model capabilities. Grounding this flexibility in comprehensive evaluation frameworks enables teams to iterate quickly while maintaining quality standards.

Finally, the case study demonstrates that modern AI applications require infrastructure that recognizes the unique characteristics of multimodal data, the prevalence of vector and embedding workloads, and the need for systems that can scale to handle ever-increasing data volumes. The successful implementation of these principles in a high-stakes legal environment provides valuable insights for LLMOps practitioners across industries.

This partnership between Harvey and LanceDB represents a sophisticated example of LLMOps implementation that addresses real-world challenges in deploying AI systems at scale in a domain where accuracy, security, and performance are all critical requirements. The technical solutions and operational practices demonstrated provide a valuable reference for teams tackling similar challenges in specialized AI applications.
