ZenML

AI-Powered Legal Document Review and Analysis Platform

Lexbe 2025

Lexbe, a legal document review software company, developed Lexbe Pilot, an AI-powered Q&A assistant integrated into their eDiscovery platform using Amazon Bedrock and associated AWS services. The solution addresses the challenge of legal professionals needing to analyze massive document sets (100,000 to over 1 million documents) to identify critical evidence for litigation. By implementing a RAG-based architecture with Amazon Bedrock Knowledge Bases, the system enables legal teams to query entire datasets and retrieve contextually relevant results that go beyond traditional keyword searches. Through an eight-month collaborative development process with AWS, Lexbe achieved a 90% recall rate with the final implementation, enabling the generation of comprehensive findings-of-fact reports and deep automated inference capabilities that can identify relationships and connections across multilingual document collections.

Industry

Legal

Overview

Lexbe is a leader in legal document review software that has been operating since 2006, providing eDiscovery solutions through their cloud-based platform Lexbe Online™. The company developed Lexbe Pilot, an AI-powered Q&A assistant that represents a significant advancement in how legal professionals handle document review and analysis. This case study demonstrates a comprehensive LLMOps implementation that addresses the critical challenge of analyzing massive legal document collections ranging from 100,000 to over one million documents per case.

The core problem Lexbe addressed is fundamentally an LLMOps challenge: how to deploy and scale large language models in production to handle the rigorous demands of legal document analysis where accuracy, reliability, and performance are paramount. Legal professionals face the daunting task of identifying critical evidence within vast document sets under tight deadlines, where missing key information can result in unfavorable case outcomes. Traditional keyword-based search approaches are insufficient for this task, as they often return hundreds or thousands of documents without providing the contextual understanding needed to identify truly relevant information.

Technical Architecture and LLMOps Implementation

Lexbe’s solution represents a sophisticated LLMOps architecture built on Amazon Bedrock and integrated AWS services. The system employs a comprehensive RAG (Retrieval-Augmented Generation) workflow that demonstrates several key LLMOps principles in production deployment.

The architecture begins with document ingestion and processing, where legal documents stored in Amazon S3 undergo text extraction using Apache Tika. The extracted text is then processed through Amazon Bedrock's embedding models, specifically Titan Text v2, to generate vector representations. The choice of embedding model and configuration was the result of extensive experimentation: Lexbe tested multiple models, including Amazon Titan and Cohere, as well as chunk sizes of 512 versus 1,024 tokens, to optimize retrieval performance.
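The ingestion step described above can be sketched in a few lines. This is a minimal illustration, not Lexbe's production code: the chunking helper and function names are hypothetical, while the model ID is Bedrock's public identifier for Titan Text Embeddings v2.

```python
def chunk_text(text: str, max_words: int = 512) -> list[str]:
    """Rough word-count chunking; Lexbe compared 512- and 1,024-token windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed_chunk(chunk: str) -> list[float]:
    """Request a Titan Text v2 embedding from Bedrock (requires AWS credentials)."""
    import json
    import boto3  # third-party dependency: pip install boto3
    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": chunk}),
    )
    return json.loads(response["body"].read())["embedding"]
```

In a real pipeline the chunk boundaries would come from a tokenizer rather than a word count, but the word-based split is enough to show where the 512-versus-1,024 trade-off enters the system.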

The embedding generation and storage system demonstrates important LLMOps considerations around model selection and parameter tuning. The embeddings are stored in a vector database that enables fast semantic retrieval, while Amazon OpenSearch provides both vector and text-based indexing capabilities. This dual approach allows the system to handle both semantic similarity searches and traditional keyword-based queries, providing flexibility in how legal professionals can interact with the document corpus.
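A dual semantic-plus-keyword index like the one described can be queried with a single OpenSearch request body that combines a lexical leg and a k-NN leg. The field names below (`content`, `embedding`) are illustrative assumptions, not Lexbe's actual schema:

```python
def hybrid_query(query_text: str, query_vector: list[float], k: int = 25) -> dict:
    """OpenSearch request body combining keyword and vector retrieval."""
    return {
        "size": k,
        "query": {
            "bool": {
                "should": [
                    # Lexical leg: traditional keyword match
                    {"match": {"content": query_text}},
                    # Semantic leg: k-NN search over stored embeddings
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }
```

Because both legs sit under a `bool`/`should` clause, documents that score well on either signal are surfaced, which is what lets legal teams fall back to exact keyword matching when semantic similarity alone is too loose.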

The query processing pipeline showcases production-ready LLM deployment through Amazon Bedrock Knowledge Bases, which provides a fully managed RAG workflow. When users submit queries through the web interface, the system routes requests through Amazon CloudFront and an Application Load Balancer to backend services running on AWS Fargate. This serverless container approach enables horizontal scaling without infrastructure management overhead, a key consideration for handling variable workloads in legal environments.
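The managed RAG workflow is exposed through a single Bedrock API call, `RetrieveAndGenerate`, which handles retrieval, context assembly, and generation in one round trip. A sketch of how a backend service might invoke it follows; the knowledge base ID and model ARN are placeholders the caller must supply:

```python
def kb_payload(question: str, kb_id: str, model_arn: str) -> dict:
    """Request payload for Bedrock Knowledge Bases' managed RAG call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def ask_knowledge_base(question: str, kb_id: str, model_arn: str) -> str:
    """Run one managed RAG query (requires AWS credentials and a provisioned KB)."""
    import boto3  # third-party dependency: pip install boto3
    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(**kb_payload(question, kb_id, model_arn))
    return response["output"]["text"]
```

The appeal of this call for an LLMOps team is that chunk retrieval, prompt construction, and citation tracking are handled by the service, so the application code stays thin.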

The LLM component uses Anthropic's Claude 3.5 Sonnet model, served through Amazon Bedrock, to generate coherent and accurate responses grounded in the retrieved document context. This represents a critical LLMOps decision point where model selection directly impacts output quality and system reliability. The choice of Claude 3.5 Sonnet reflects considerations around reasoning capability, context handling, and response quality that are essential for legal applications.
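For teams calling the model directly rather than through Knowledge Bases, Bedrock's Converse API is the standard entry point. The sketch below shows grounded generation against Claude 3.5 Sonnet; the prompt wording is an illustrative assumption, not Lexbe's production prompt:

```python
def build_grounded_prompt(question: str, context_passages: list[str]) -> str:
    """Assemble a context-grounded prompt; instructs the model not to go beyond it."""
    return (
        "Answer strictly from the context below. "
        "If the answer is not present, say so.\n\n"
        + "\n\n".join(context_passages)
        + f"\n\nQuestion: {question}"
    )


def answer_with_claude(question: str, context_passages: list[str]) -> str:
    """Call Claude 3.5 Sonnet via the Bedrock Converse API (requires AWS credentials)."""
    import boto3  # third-party dependency: pip install boto3
    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": [{"text": build_grounded_prompt(question, context_passages)}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"]
```

Setting temperature to zero is a common choice in legal applications, where reproducible answers matter more than stylistic variety.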

Performance Optimization and Production Readiness

One of the most compelling aspects of this case study is the detailed documentation of the iterative improvement process over eight months of development. This demonstrates real-world LLMOps practices around performance monitoring, evaluation, and continuous improvement in production environments.

Lexbe established clear acceptance criteria focused on recall rates, recognizing that in legal document review, missing relevant documents can have serious consequences. The recall rate metric served as their primary production readiness benchmark, which is a sophisticated approach to LLMOps evaluation that goes beyond simple accuracy measures.
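The recall metric at the heart of this evaluation is simple to state: of the documents known to be relevant, what fraction appears in the system's results? A minimal implementation:

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of known-relevant documents found in the top-k retrieved results."""
    if not relevant_ids:
        return 1.0  # vacuously perfect when nothing is relevant
    hits = len(set(retrieved_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)
```

Tracking this number against a fixed, attorney-labeled query set is what made the month-over-month comparisons in the next paragraphs meaningful: each architectural change was scored against the same ground truth.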

The performance evolution tells a story of systematic LLMOps optimization. Starting with only a 5% recall rate in January 2024, the team achieved incremental improvements through various technical interventions. By April 2024, new features in Amazon Bedrock Knowledge Bases brought the recall rate to 36%. Parameter adjustments around token size in June 2024 increased performance to 60%, while model optimization with Titan Embed text-v2 reached 66% by August 2024.

The breakthrough came in December 2024 with the introduction of reranker technology, which pushed the recall rate to 90%. This progression demonstrates important LLMOps principles around systematic evaluation, iterative improvement, and the importance of emerging technologies like rerankers in production RAG systems.
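The reranker's contribution is easiest to see as a two-stage retrieval pattern: a cheap first-stage scorer casts a wide net, and a stronger model re-orders the shortlist. The sketch below uses hypothetical stand-in scoring functions for the embedding model and the Bedrock reranker; in production the second stage would be a call to the managed Rerank capability.

```python
from typing import Callable, TypeVar

Doc = TypeVar("Doc")


def two_stage_retrieve(
    candidates: list[Doc],
    query: str,
    cheap_score: Callable[[Doc, str], float],   # stands in for embedding similarity
    strong_score: Callable[[Doc, str], float],  # stands in for the reranker model
    shortlist: int = 50,
    top_k: int = 10,
) -> list[Doc]:
    """Broad recall with a cheap scorer, then precise re-ordering of the shortlist."""
    stage1 = sorted(candidates, key=lambda d: cheap_score(d, query), reverse=True)[:shortlist]
    return sorted(stage1, key=lambda d: strong_score(d, query), reverse=True)[:top_k]
```

The pattern works because the expensive model only ever sees the shortlist, keeping latency and cost bounded while letting a more accurate relevance judgment determine the final ranking, which is exactly the lever that moved recall from 66% to 90%.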

Production Deployment and Scalability

The deployment architecture demonstrates mature LLMOps practices for production systems. AWS Fargate provides containerized deployment with automatic scaling, allowing the system to handle the varying workloads typical of legal environments, where case loads can fluctuate significantly. The use of Amazon ECS with Linux Spot Instances provides cost optimization, which is crucial for production LLMOps deployments where compute costs can be substantial.

The system architecture includes robust security measures essential for legal applications, including encryption and role-based access controls through AWS’s security framework. This addresses one of the key challenges in LLMOps for regulated industries where data privacy and security are paramount.

The scalability features enable processing of document sets ranging from hundreds of thousands to over a million documents, demonstrating the system’s ability to handle enterprise-scale workloads. This scalability is achieved through the combination of serverless compute (Fargate), managed services (Bedrock), and optimized storage and indexing systems.

Advanced Capabilities and Real-World Performance

The production system demonstrates sophisticated LLMOps capabilities that go beyond simple question-answering. Lexbe Pilot can generate comprehensive findings-of-fact reports spanning multiple pages with proper formatting, section headings, and hyperlinked source citations. This represents advanced document generation capabilities that require careful prompt engineering and output formatting in production LLM systems.
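Report generation of this kind typically rests on a carefully structured prompt template. The template below is a hedged illustration of the pattern (numbered sections, per-statement citations, flagged inferences), not Lexbe's actual prompt:

```python
FINDINGS_PROMPT = """You are assisting with legal document review.
Using only the retrieved documents below, write a findings-of-fact report.

Requirements:
- Organize the report under numbered section headings.
- Support every factual statement with a citation in the form [doc:<id>].
- Mark any conclusion not directly stated in a document as "Inferred".

Retrieved documents:
{documents}

Topic: {topic}
"""


def build_findings_prompt(topic: str, documents: list[tuple[str, str]]) -> str:
    """Render (doc_id, text) pairs into the report-generation prompt."""
    rendered = "\n\n".join(f"[doc:{doc_id}]\n{text}" for doc_id, text in documents)
    return FINDINGS_PROMPT.format(documents=rendered, topic=topic)
```

Embedding the document IDs directly in the context is what makes hyperlinked source citations possible downstream: the application can map each `[doc:<id>]` marker in the model's output back to a link in the review platform.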

Perhaps more impressive is the system’s ability to perform deep automated inference across document collections. The example of identifying family relationships by connecting metadata from emails demonstrates sophisticated reasoning capabilities that require the LLM to synthesize information from multiple sources. This type of inferential reasoning represents advanced LLMOps implementation where the system can discover implicit connections that traditional search methods cannot identify.

The system handles multilingual documents seamlessly, processing content in English, Spanish, and other languages without requiring separate processing pipelines. This multilingual capability is crucial for modern legal practices and demonstrates robust LLMOps implementation that can handle diverse data types in production.

Collaborative Development and LLMOps Best Practices

The eight-month collaboration between Lexbe and the Amazon Bedrock Knowledge Bases team represents exemplary LLMOps development practices. Weekly strategy meetings between senior teams enabled rapid iteration and continuous improvement, demonstrating the importance of close collaboration between LLMOps practitioners and platform providers.

The establishment of clear acceptance criteria and systematic performance measurement throughout the development process reflects mature LLMOps practices. The focus on recall rate as the primary metric, while acknowledging the specific requirements of legal document analysis, shows sophisticated understanding of evaluation in production LLM systems.

This case study represents a successful production LLMOps implementation that addresses real-world business challenges while demonstrating scalable, secure, and cost-effective deployment practices. The system’s ability to transform traditional legal document review processes while maintaining the accuracy and reliability required in legal contexts showcases the potential for LLMOps to drive significant business value in regulated industries.

More Like This

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.


Migration of Credit AI RAG Application from Multi-Cloud to AWS Bedrock

Octus 2025

Octus, a leading provider of credit market data and analytics, migrated their flagship generative AI product Credit AI from a multi-cloud architecture (OpenAI on Azure and other services on AWS) to a unified AWS architecture using Amazon Bedrock. The migration addressed challenges in scalability, cost, latency, and operational complexity associated with running a production RAG application across multiple clouds. By leveraging Amazon Bedrock's managed services for embeddings, knowledge bases, and LLM inference, along with supporting AWS services like Lambda, S3, OpenSearch, and Textract, Octus achieved a 78% reduction in infrastructure costs, 87% decrease in cost per question, improved document sync times from hours to minutes, and better development velocity while maintaining SOC2 compliance and serving thousands of concurrent users across financial services clients.


AI-Powered Conversational Assistant for Streamlined Home Buying Experience

Rocket 2025

Rocket Companies, a Detroit-based FinTech company, developed Rocket AI Agent to address the overwhelming complexity of the home buying process by providing 24/7 personalized guidance and support. Built on Amazon Bedrock Agents, the AI assistant combines domain knowledge, personalized guidance, and actionable capabilities to transform client engagement across Rocket's digital properties. The implementation resulted in a threefold increase in conversion rates from web traffic to closed loans, 85% reduction in transfers to customer care, and 68% customer satisfaction scores, while enabling seamless transitions between AI assistance and human support when needed.
