Company: Harvey
Title: Document-Wide AI Editing in Microsoft Word Add-In
Industry: Legal
Year: 2025
Summary (short): Harvey developed an AI-powered Word Add-In that enables comprehensive document-wide edits on 100+ page legal documents through a single query. The system addresses the challenges of OOXML complexity by creating reversible mappings between document structure and natural language, while using an orchestrator-subagent architecture to overcome position bias and ensure thorough coverage. The solution transforms hours of manual legal editing into seamless single-query interactions, supporting complex use cases like contract conformance, template creation, and jurisdiction-specific adaptations.
## Overview

Harvey's document-wide editing capability in its Microsoft Word Add-In is an LLMOps implementation that addresses the challenges of large-scale document processing in legal environments. The company evolved from supporting targeted local edits to enabling comprehensive modifications of 100+ page documents through a single AI query, turning what previously required hours of manual effort into a seamless interaction. The case study demonstrates advanced techniques for production LLM deployment, including architectural patterns for handling complex document structures and a scalable evaluation methodology.

The solution enables legal professionals to perform complex document-wide operations such as conforming draft agreements to checklists, defining and consistently applying new terms throughout contracts, converting documents into reusable templates, and adapting documents for different jurisdictions or drafting postures.

## Technical Architecture and OOXML Challenges

The implementation confronts the inherent complexity of Microsoft Word's Office Open XML (OOXML) format, which stores documents as ZIP containers of interconnected XML parts governing text, formatting, styles, tables, and other objects. Harvey's engineering team recognized that having models read and write OOXML directly creates significant problems: lower-quality outcomes, inefficient token usage, and a fundamental mismatch between models trained on natural language and the demands of XML manipulation.

Their architecture separates these concerns through a reversible mapping between OOXML objects and natural-language representations. The process translates OOXML into a natural-language representation, lets the model propose edits over that text, and then deterministically translates those edits back into precise OOXML mutations that preserve styles and structure. For new content insertions, the model anchors placement relative to existing elements, and the add-in infers appropriate styling from the surrounding context and Word's style parts.

This design reflects a practical understanding of production LLM deployment: asking a model to perform legal reasoning and XML parsing at the same time degrades both. A proper abstraction layer lets the model focus on the language task while the system handles the exacting structural requirements of enterprise document processing.

## Orchestrator-Subagent Architecture for Scale

Harvey's handling of the "lost in the middle" problem reflects the known limitations of long-context models. Even with modern long-context models, comprehensive document-wide edits across hundreds of pages suffer from position bias: models over-attend to the beginning and end of a document while under-editing the middle, producing partial coverage rather than thorough processing.

The orchestrator-subagent architecture addresses these limitations through work decomposition. An orchestrator model reads the entire document, plans the work, and decomposes the request into targeted tasks over bounded chunks. Subagents receive precise, localized instructions and achieve thoroughness by focusing only on the portion of the document within their scope, as sketched below.
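The decomposition can be pictured with a short Python sketch. It is a hedged illustration only: the chunk size, the `call_llm` stand-in, the constraint-passing format, and the function names are assumptions made for exposition, not details Harvey has published.

```python
# Illustrative orchestrator-subagent decomposition for document-wide edits.
# `call_llm` is a stand-in for any chat-completion client.
from dataclasses import dataclass

CHUNK_SIZE = 40  # paragraphs per subagent task (assumed bound)


@dataclass
class EditTask:
    chunk_index: int
    paragraphs: list[str]    # the bounded slice this subagent may touch
    instruction: str         # simplified here: the orchestrator's full plan
    constraints: list[str]   # cross-chunk constraints (defined terms, tone, refs)


def call_llm(prompt: str) -> str:
    """Stand-in for a model call; replace with a real client."""
    raise NotImplementedError


def orchestrate(paragraphs: list[str], user_request: str) -> list[EditTask]:
    """Orchestrator: read the whole document, plan, and emit bounded tasks."""
    plan = call_llm(
        "Plan document-wide edits for this request, listing per-section "
        "instructions and any global constraints (prefix each with 'CONSTRAINT:').\n"
        f"Request: {user_request}\nDocument:\n" + "\n".join(paragraphs)
    )
    constraints = [line for line in plan.splitlines() if line.startswith("CONSTRAINT:")]
    tasks = []
    for i in range(0, len(paragraphs), CHUNK_SIZE):
        chunk = paragraphs[i : i + CHUNK_SIZE]
        # A real orchestrator would tailor the instruction per chunk;
        # the full plan is passed here for brevity.
        tasks.append(EditTask(i // CHUNK_SIZE, chunk, plan, constraints))
    return tasks


def run_subagent(task: EditTask) -> list[str]:
    """Subagent: edit only the paragraphs in scope, honoring global constraints."""
    edited = call_llm(
        "Apply the instruction to ONLY these paragraphs, respecting the constraints.\n"
        f"Instruction: {task.instruction}\nConstraints: {task.constraints}\n"
        "Paragraphs:\n" + "\n".join(task.paragraphs)
    )
    return edited.splitlines()
```

The key property is that every region of the document is covered by some subagent that sees only its bounded slice plus the orchestrator's global constraints, so no section is left to position-biased attention.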
The orchestrator maintains global consistency by issuing cross-chunk constraints for newly defined terms, tone alignment, style consistency, and cross-reference updates. The pattern adapts established agent and decomposition methods to the specific requirements of legal document processing, and it shows how a production LLM system can achieve both thoroughness and efficiency by separating global planning from local execution, sidestepping the limits of single-pass processing on large documents.

## Scalable Evaluation Framework

Harvey's evaluation methodology balances automated efficiency with expert validation. Recognizing that human evaluation capacity would become a bottleneck for rapid experimentation, the team built automated evaluation frameworks while maintaining quality standards through close collaboration between domain experts, product owners, and engineers.

The framework covers both quantitative metrics, such as the percentage of document elements modified, and qualitative metrics measuring alignment with the user's request. Developing automated approaches for both axes required extensive collaboration with legal domain experts to ensure the directional signals reflected real-world requirements. Once established, the system could produce comprehensive evaluations over large input sets in under five minutes.

The framework enabled rapid A/B experimentation across different implementation approaches. One concrete example involved representing tracked changes to models using insertion and deletion tags; automated evaluation confirmed no regression on existing datasets while showing clear improvements on queries that referred to document redlines. Over the project's lifecycle, Harvey tested more than 30 model combinations across major providers, generating tens of thousands of sample outputs and condensing what would have been years of manual evaluation work into weeks.

## Multi-Model Strategy and Production Considerations

As a multi-model company, Harvey faced complex decisions about model selection for each role in the orchestrator-subagent architecture, with every choice trading off latency, cost, and quality. The evaluation framework proved crucial for navigating these decisions systematically rather than through intuition or limited testing.

The production deployment reflects the fact that different models excel at different tasks within a complex workflow: the orchestrator role requires strong planning and decomposition capabilities, while subagents need focused execution on specific document sections. Maintaining consistency across model choices while optimizing for performance characteristics is a mark of mature LLMOps practice.

## Legal Domain Specialization

Harvey's approach demonstrates a deep understanding of legal work and the specific challenges of legal document processing. The supported use cases reflect genuine legal practice needs, from contract conformance and template creation to jurisdiction-specific adaptations and drafting posture switches. The system handles complex legal concepts like defined-term consistency and cross-reference management, which require domain-specific intelligence beyond general document editing.
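To make the defined-term consistency point concrete, the following is a hypothetical check of the kind a cross-chunk constraint (or an automated evaluation) might enforce: it collects terms introduced with a quoted-definition pattern and flags chunks that later use a lowercase variant instead of the defined term. The regex convention and the flagging policy are illustrative assumptions, not Harvey's rules.

```python
import re

# Assumed drafting convention: defined terms are introduced as ("Term"),
# e.g. 'the purchaser ("Buyer")'. Real conventions vary; this is illustrative.
DEFINITION_PATTERN = re.compile(r'\("([A-Z][A-Za-z ]+)"\)')


def collect_defined_terms(chunks: list[str]) -> set[str]:
    """Gather every term introduced with the quoted-definition convention."""
    terms: set[str] = set()
    for chunk in chunks:
        terms.update(DEFINITION_PATTERN.findall(chunk))
    return terms


def flag_inconsistent_usage(chunks: list[str], terms: set[str]) -> list[tuple[int, str]]:
    """Flag chunks that refer to a defined term only in lowercase form."""
    flags = []
    for index, chunk in enumerate(chunks):
        for term in terms:
            lowercase_hits = re.findall(rf"\b{re.escape(term.lower())}\b", chunk)
            if lowercase_hits and term not in chunk:
                flags.append((index, term))
    return flags


if __name__ == "__main__":
    sample = [
        'The purchaser ("Buyer") shall deliver the notice.',
        "If the buyer fails to pay, the deposit is forfeited.",
    ]
    terms = collect_defined_terms(sample)
    print(flag_inconsistent_usage(sample, terms))  # [(1, 'Buyer')]
```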
The integration with Harvey's broader platform, including Vault project integration and support for legal document types ranging from M&A agreements to memoranda and letters, shows how a specialized LLM application can serve complete professional workflows. The Word Add-In is one component of a larger AI-powered legal platform, demonstrating how production LLM systems can integrate into existing professional tools.

## Production Impact and Scalability

The case study demonstrates significant production impact by turning complex, hours-long legal editing tasks into single-query interactions, a substantial efficiency gain for legal professionals that preserves the quality and precision legal work demands. The system's ability to apply comprehensive edits to 100+ page documents addresses real scalability challenges in legal practice.

The implementation also shows careful attention to production constraints, including performance, reliability, and integration with existing Microsoft Office infrastructure. The add-in operates through the Office JavaScript API rather than manipulating XML directly, ensuring compatibility and stability within the Microsoft ecosystem while delivering advanced AI capabilities (a simplified sketch of this translate-and-apply flow appears below).

Harvey's approach demonstrates how a sophisticated LLMOps implementation can deliver transformative value in a specialized professional domain through careful architectural design, comprehensive evaluation, and deep domain understanding. It is a mature example of production LLM deployment that addresses both technical challenges and real-world professional requirements while maintaining high standards for quality and reliability.
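As a closing illustration of the translate-and-apply flow described in the architecture section, the sketch below models a reversible mapping between paragraph-level objects and a numbered natural-language view, then deterministically applies line-based edits back by identifier while leaving styles untouched. It is a pure-Python stand-in under stated assumptions: the actual add-in reads OOXML and applies its mutations through the Office JavaScript API, and its identifiers, edit format, and styling logic are not public.

```python
from dataclasses import dataclass


@dataclass
class DocParagraph:
    """Stand-in for a paragraph-level OOXML object with formatting to preserve."""
    para_id: str
    text: str
    style: str  # e.g. "Heading1", "Normal"; never touched by text edits


def to_natural_language(paragraphs: list[DocParagraph]) -> tuple[str, dict[int, str]]:
    """Render a numbered plain-text view for the model and remember which
    line number maps back to which paragraph id."""
    lines, index = [], {}
    for n, para in enumerate(paragraphs, start=1):
        lines.append(f"[{n}] {para.text}")
        index[n] = para.para_id
    return "\n".join(lines), index


def apply_edits(
    paragraphs: list[DocParagraph],
    index: dict[int, str],
    edits: dict[int, str],  # model output: line number -> replacement text
) -> list[DocParagraph]:
    """Deterministically translate line-based edits into paragraph mutations,
    leaving ids and styles (formatting) intact."""
    by_id = {p.para_id: p for p in paragraphs}
    for line_no, new_text in edits.items():
        by_id[index[line_no]].text = new_text
    return paragraphs


if __name__ == "__main__":
    doc = [
        DocParagraph("p1", "Definitions", "Heading1"),
        DocParagraph("p2", "The Seller shall deliver the goods.", "Normal"),
    ]
    view, index = to_natural_language(doc)
    # Imagine the model proposed this edit over the numbered text view:
    proposed = {2: 'The Seller ("Supplier") shall deliver the goods.'}
    apply_edits(doc, index, proposed)
    print(doc[1].text)  # The Seller ("Supplier") shall deliver the goods.
```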
