## Overview
Harvey is a legal AI platform that provides domain-specific artificial intelligence tools for legal professionals, including document analysis, research capabilities, and workflow automation. The company has built its platform around the principle of rapidly translating AI breakthroughs into reliable, secure products capable of handling complex legal work. This case study examines Harvey's approach to LLMOps through their rapid integration of OpenAI's Deep Research feature, which they accomplished in less than 12 hours after the API's release.
The case demonstrates Harvey's systematic approach to building and maintaining AI systems in production, showcasing how architectural decisions made from day one enable rapid feature deployment while maintaining enterprise-grade reliability. Their success in quickly integrating new AI capabilities stems from two foundational investments: an AI-native architecture and a culture of iterative development supported by AI-assisted coding tools.
## Technical Architecture and LLMOps Infrastructure
Harvey's technical architecture is built around what they call an "AI-native" design, purpose-built for complex model systems and agents. This architecture includes several key components that facilitate rapid integration of new AI capabilities while maintaining production stability.
### Workflow Engine
The centerpiece of Harvey's LLMOps infrastructure is their Workflow Engine, which serves as a modular and extensible framework for composing model systems and orchestrating agent behavior. This engine underpins both their Workflows product and their newly released Workflow Builder, demonstrating how a well-designed orchestration layer can support multiple product surfaces.
The Workflow Engine incorporates several features that enable rapid deployment of new AI capabilities. Internally developed AI primitives are expressed as low-code, composable blocks that developers can sequence together or that agentic systems can call as tools. This modular approach reduces development time, increases flexibility, and minimizes the risk of introducing regressions elsewhere in the system when adding new features.
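As an illustration of this pattern, the sketch below shows what such composable blocks might look like in Python. All names here (`Block`, `SummarizeBlock`, `Workflow`) are hypothetical; Harvey's actual primitives are not public.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


class Block(Protocol):
    """A composable building block: consumes a shared context, returns an updated one."""

    name: str

    def run(self, context: dict[str, Any]) -> dict[str, Any]: ...


@dataclass
class SummarizeBlock:
    """Hypothetical primitive; Harvey's real blocks are not public."""

    name: str = "summarize"
    input_key: str = "document"
    output_key: str = "summary"

    def run(self, context: dict[str, Any]) -> dict[str, Any]:
        # Placeholder for an internal model call.
        context[self.output_key] = f"[summary of {len(context[self.input_key])} chars]"
        return context


@dataclass
class Workflow:
    """Runs blocks in sequence; each block reads and extends the shared context."""

    blocks: list[Block] = field(default_factory=list)

    def run(self, context: dict[str, Any]) -> dict[str, Any]:
        for block in self.blocks:
            context = block.run(context)
        return context


result = Workflow(blocks=[SummarizeBlock()]).run({"document": "lorem ipsum " * 200})
print(result["summary"])
```

Because each block shares one context interface, a new capability like Deep Research can be wrapped as one more block and dropped into existing sequences without touching the others, which is the property that makes regression-free extension plausible.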
### Transparency and User Experience
A particularly noteworthy aspect of Harvey's LLMOps approach is their implementation of "thinking states" - a system that provides users with visibility into an agent's planning and decision-making processes. This represents a sophisticated approach to AI transparency that goes beyond simple input-output relationships to expose the reasoning chain of AI systems.
The thinking states feature allows users to evaluate intermediate results in an easily digestible format, follow along with real-time context about what the models are thinking as they work, and intervene at any step by adding context, tweaking parameters, or rerunning actions. This level of transparency is particularly crucial in legal applications where understanding the reasoning behind AI-generated outputs is essential for professional responsibility and client service.
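A minimal sketch of how such thinking states might be emitted as structured events follows, assuming a simple dataclass event model; the names and fields are illustrative, not Harvey's API.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Iterator


class StepStatus(Enum):
    RUNNING = "running"
    DONE = "done"


@dataclass
class ThinkingState:
    """One intermediate update surfaced to the user while the agent works."""

    step: str          # e.g. "Searching case law on fair use"
    status: StepStatus
    detail: str        # digestible summary of the intermediate result
    timestamp: float


def emit_thinking_states(plan: list[str]) -> Iterator[ThinkingState]:
    """Yield a state as each planned step starts and finishes, so a UI can
    render progress and offer intervention points between steps."""
    for step in plan:
        yield ThinkingState(step, StepStatus.RUNNING, "started", time.time())
        # ... execute the step here; a real system would stream model output ...
        yield ThinkingState(step, StepStatus.DONE, "completed", time.time())


for state in emit_thinking_states(["Identify relevant precedent", "Outline memo"]):
    print(f"[{state.status.value}] {state.step}")
```

Modeling the stream as discrete, typed events is what makes intervention possible: the UI can pause between yields to let a user add context or rerun a step before the agent proceeds.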
### Citation and Verification Systems
Harvey has invested heavily in citation systems as a means of understanding and interpreting AI answers. Their ability to incorporate the URL citations produced by Deep Research into their existing citation system demonstrates the value of building extensible infrastructure components. Every step of their workflows can be traced and verified, which is essential for trust in legal applications and also streamlines development by providing clear audit trails for debugging and optimization.
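The sketch below illustrates one plausible shape of that mapping, assuming Deep Research surfaces `url_citation` annotations carrying a URL, title, and character offsets, as OpenAI's Responses API web-search citations do; the internal `Citation` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """Simplified stand-in for an internal citation record."""

    source_url: str
    title: str
    span: tuple[int, int]  # character range in the answer that the citation supports


def from_url_annotations(annotations: list[dict]) -> list[Citation]:
    # Assumes url_citation annotations carry url/title/start_index/end_index,
    # as OpenAI's web-search citations do; actual field names may differ.
    return [
        Citation(
            source_url=a["url"],
            title=a.get("title", a["url"]),
            span=(a["start_index"], a["end_index"]),
        )
        for a in annotations
        if a.get("type") == "url_citation"
    ]


sample = [{"type": "url_citation", "url": "https://example.com/opinion",
           "title": "Example v. Example", "start_index": 10, "end_index": 42}]
print(from_url_annotations(sample))
```

Keeping the internal record format independent of any one provider's annotation schema is what lets a new citation source plug in with a single adapter function like this one.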
## AI-Assisted Development Practices
The case study provides insight into Harvey's development practices, particularly their use of AI-assisted coding tools to accelerate the integration of new capabilities. When OpenAI released the Deep Research API, Harvey's team faced several challenges: the feature was brand new and sparsely documented; its streaming mode involved long-running operations with over 20 dynamic output types, requiring multiple rounds of iteration; and the integration had to remain compatible with Harvey's internal frameworks.
Their approach involved an engineer driving an iteration loop with an AI coding assistant, demonstrating how human-AI collaboration can compress development cycles. Effective practices included dumping full logs of the streamed API outputs in both Pydantic and JSON formats, providing context for humans and AI tools alike, and using Streamlit to quickly build interactive UIs in Python.
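A minimal version of that logging loop might look like the following, assuming OpenAI's Responses API streaming interface. The model name, tool configuration, prompt, and file path are drawn from OpenAI's published Deep Research examples at the time of writing and should be checked against current documentation rather than taken as Harvey's code.

```python
from openai import OpenAI

client = OpenAI()

# Sketch of the logging practice described above; parameters follow OpenAI's
# published Deep Research examples and may change.
stream = client.responses.create(
    model="o4-mini-deep-research",
    input="Survey recent appellate decisions on trade-secret misappropriation.",
    tools=[{"type": "web_search_preview"}],
    stream=True,
)

with open("deep_research_events.jsonl", "w") as log:
    for event in stream:
        # Each streamed event is a typed Pydantic model. repr() captures the
        # typed view for human readers; model_dump_json() captures raw JSON
        # that can be diffed, replayed, or pasted into an AI coding assistant.
        print(repr(event))
        log.write(event.model_dump_json() + "\n")
```

A few lines of Streamlit (rendering each logged event with `st.json`, for instance) then turn such a log into a browsable viewer without leaving Python, which suits an iteration loop against 20+ unfamiliar event types.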
This development approach showcases how AI-assisted development can be effectively integrated into production LLMOps workflows, particularly when dealing with rapidly evolving APIs and technologies. The team's ability to understand and integrate a complex new API within hours demonstrates the effectiveness of combining AI assistance with well-structured development practices.
## Model Orchestration and Agent Systems
Harvey's platform demonstrates sophisticated model orchestration capabilities through their implementation of agent systems that can intelligently choose and run the right tools at the right moment. This approach to tool use represents a significant advancement in LLMOps, as it enables the creation of unified, collaborative interfaces while supporting autonomous execution of sophisticated workflows under user oversight.
The integration of Deep Research into Harvey's existing tool ecosystem illustrates how effective model orchestration can expand AI capabilities toward increasingly complex and reliable end-to-end work. Their approach suggests a future where multiple AI capabilities can be seamlessly woven together into a single workspace, with the model intelligently determining which tools to use based on context and user needs.
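A stripped-down sketch of this kind of tool registry, with Deep Research registered alongside another capability, might look like the following; the tool names, placeholder bodies, and dispatch mechanics are illustrative, not Harvey's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """A capability the agent can select at runtime."""

    name: str
    description: str
    run: Callable[[str], str]


def deep_research(query: str) -> str:
    return f"[long-form, cited research for: {query}]"  # placeholder


def draft_clause(instructions: str) -> str:
    return f"[drafted contract language per: {instructions}]"  # placeholder


REGISTRY = {
    tool.name: tool
    for tool in (
        Tool("deep_research", "Multi-step web research with citations", deep_research),
        Tool("draft_clause", "Draft contract language", draft_clause),
    )
}


def dispatch(tool_name: str, argument: str) -> str:
    """Route the model's tool choice (e.g. from a function-calling response)."""
    return REGISTRY[tool_name].run(argument)


print(dispatch("deep_research", "key holdings on the inevitable disclosure doctrine"))
```

The design point is that the orchestrating model sees only names and descriptions; adding Deep Research to the registry makes it available everywhere the agent runs, without per-surface integration work.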
## Production Deployment and Scaling Considerations
While the case study focuses primarily on the rapid integration capabilities, it also provides insights into Harvey's approach to production deployment and scaling. Their modular architecture appears designed to support continuous integration of new AI features without disrupting existing services, which is crucial for enterprise customers who require high availability and reliability.
The company's emphasis on enterprise-grade products suggests they have invested in the infrastructure necessary to support high-volume, mission-critical workloads. Their ability to rapidly integrate new features while maintaining this level of reliability indicates sophisticated deployment and testing practices, though the case study does not provide detailed information about these aspects of their LLMOps implementation.
## Critical Assessment and Limitations
While Harvey's case study demonstrates impressive technical capabilities, several aspects warrant careful consideration. The 12-hour integration timeline, while remarkable, may not reflect the full complexity of production deployment including thorough testing, security reviews, and gradual rollout procedures that are typically necessary for enterprise applications.
The case study is primarily written as a company blog post and focuses heavily on promoting Harvey's capabilities, which may lead to an overly optimistic portrayal of their achievements. The lack of specific performance metrics, user adoption rates, or detailed technical challenges overcome makes it difficult to fully assess the effectiveness of their approach.
Additionally, while the modular architecture and rapid integration capabilities are impressive, the case study does not address important production considerations such as model monitoring, performance optimization, cost management, or the handling of edge cases and failures, all of which are crucial aspects of mature LLMOps implementations.
The emphasis on rapid integration of new AI features, while valuable for staying competitive, raises questions about the thoroughness of testing and validation processes. In legal applications where accuracy and reliability are paramount, the balance between speed of deployment and comprehensive validation is particularly critical.
## Future Implications and Industry Impact
Harvey's approach to LLMOps represents an interesting model for how companies can build infrastructure to rapidly adapt to the fast-paced evolution of AI capabilities. Their focus on modular, extensible architectures and AI-assisted development practices provides a template that other organizations might consider when building their own AI systems.
The case study highlights the importance of making architectural decisions early that support long-term flexibility and rapid iteration. Their investment in building composable AI primitives and transparent user interfaces from day one appears to be paying dividends in their ability to quickly integrate new capabilities.
However, the ultimate test of Harvey's LLMOps approach will be its long-term sustainability and scalability as they continue to add new features and serve a growing customer base. The legal industry's requirements for accuracy, auditability, and regulatory compliance will provide a rigorous test of their architectural decisions and operational practices.
The case study also raises important questions about the future of LLMOps as AI capabilities continue to evolve rapidly. Organizations will need to balance the desire for rapid integration of new features with the need for thorough testing, security reviews, and user training that enterprise applications typically require.