Orbital: Scaling Agentic AI Systems for Real Estate Due Diligence: Managing Prompt Tax at Production Scale

Company

Orbital

Title

Scaling Agentic AI Systems for Real Estate Due Diligence: Managing Prompt Tax at Production Scale

Industry

Legal

Link

https://www.youtube.com/watch?v=Bf71xMwd-Y0

Year

2025

Summary (short)

Orbital, a real estate technology company, developed an agentic AI system called Orbital Co-pilot to automate legal due diligence for property transactions. The system processes hundreds of pages of legal documents to extract key information, work traditionally done manually by lawyers. Over 18 months, they scaled from zero to processing nearly 20 billion tokens per month and reached multiple seven figures in annual recurring revenue. The presentation focuses on what the team calls "prompt tax": the hidden costs and complexity of continuously upgrading AI models in production, including prompt migration, regression risk, and the operational challenges of shipping at the AI frontier.

Orbital is a real estate technology company with offices in New York and London that has built an agentic AI system to automate real estate due diligence. Its mission is to accelerate property transactions by automating the manual, time-intensive work that real estate lawyers perform when they review large volumes of legal documents to identify potential red flags for their clients. The company has grown to approximately 80 people, roughly half of whom work in product engineering. Cross-functional teams combine product managers, designers, domain experts (practicing real estate lawyers), software engineers, AI engineers, and technical leads. This structure matters because real estate law is highly specialized, and good prompt engineering in this domain depends on legal expertise.

## Technical Architecture and Evolution

Orbital's flagship product, Orbital Co-pilot, is an agentic system that processes complex legal documents. It begins by running OCR on uploaded documents, which can contain handwritten and typed text across dozens or hundreds of pages. The agent then creates a structured plan that breaks the overall objective into subtasks, each handled by its own agentic subsystem making multiple LLM calls. Each subtask targets a specific extraction goal, such as finding lease dates, annual rent amounts, or other critical legal details.

The system's evolution over 18 months shows what operating at the AI frontier involves. The team began with GPT-3.5 and moved through several "System 1" (non-reasoning) models; GPT-4 32k was particularly significant because its longer context window was essential for processing lengthy legal documents. They then migrated to GPT-4 Turbo and GPT-4o, and eventually to "System 2" (reasoning) models, including o1-preview and o1-mini.
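
The plan-then-execute flow described above can be sketched roughly as follows. This is a minimal illustration, not Orbital's implementation: the `LLM` callable, the `plan`, `run_subtask`, and `run_report` helpers, and the two-call "gather evidence, then answer" pattern are all hypothetical. The pages are assumed to be already OCR'd text, and the planner returns a fixed list so the sketch runs without a real model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical LLM interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Subtask:
    name: str       # hypothetical identifier, e.g. "lease_dates"
    objective: str  # the extraction goal, phrased as what to find


@dataclass
class SubtaskResult:
    name: str
    answer: str
    llm_calls: int


def plan(objective: str, document_text: str) -> List[Subtask]:
    # A real planner would ask an LLM to decompose the objective into subtasks;
    # a fixed list keeps this sketch deterministic and runnable offline.
    return [
        Subtask("lease_dates", "Find the lease start and end dates."),
        Subtask("annual_rent", "Find the annual rent amount."),
    ]


def run_subtask(task: Subtask, document_text: str, llm: LLM) -> SubtaskResult:
    # Each subtask acts as its own small agent; in this sketch it makes two
    # LLM calls: first gather supporting passages, then answer from them only.
    evidence = llm(
        f"Quote the passages relevant to this goal.\nGoal: {task.objective}\n\n{document_text}"
    )
    answer = llm(f"Goal: {task.objective}\nAnswer using only this evidence:\n{evidence}")
    return SubtaskResult(task.name, answer, llm_calls=2)


def run_report(ocr_pages: List[str], objective: str, llm: LLM) -> List[SubtaskResult]:
    # ocr_pages stands in for the output of the OCR step, one string per page.
    document_text = "\n".join(ocr_pages)
    subtasks = plan(objective, document_text)
    return [run_subtask(task, document_text, llm) for task in subtasks]


def first_line_llm(prompt: str) -> str:
    # Stub "model" for offline runs: echoes the first line of its prompt.
    return prompt.splitlines()[0]


if __name__ == "__main__":
    pages = ["Lease dated 1 May 2020 ...", "Rent: 50,000 per annum ..."]
    for result in run_report(pages, "Produce a lease report", first_line_llm):
        print(result.name, "->", result.answer)
```

Keeping each subtask in its own function mirrors the idea that every extraction goal gets a dedicated agentic subsystem, so one subtask's prompts can be revised without touching the others.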
The move from GPT-3.5 to o1 in about 18 months illustrates the rapid pace of model advancement and the continuous need for adaptation in production systems.

## The Concept of Prompt Tax

The presentation introduces the concept of "prompt tax": the hidden costs and complexity of upgrading AI models in production agentic systems. Unlike traditional technical debt, which usually comes from shortcuts taken for speed that can be paid down later, prompt tax is a recurring operational cost. Each new model release offers compelling capabilities that teams want to adopt, but migrating brings uncertainty about what will improve and what might break.

The company runs more than 1,000 domain-specific prompts, written primarily by its embedded real estate lawyers, who translate decades of legal expertise into prompts that teach the AI system. A library of this size makes every model release a significant migration effort, because each prompt may need adjusting or a complete rewrite to work well with the new model.

## Strategic Decisions and Trade-offs

Orbital made three key strategic decisions that shaped their LLMOps approach. First, they optimized for prompting over fine-tuning, to maximize development speed and to incorporate user feedback quickly through real-time prompt adjustments. This enabled faster iteration and better responsiveness to users, which was particularly important while they were searching for product-market fit. Second, they invested heavily in domain experts: practicing real estate lawyers who joined the company and now write many of the domain-specific prompts. This ensures that decades of legal expertise are properly encoded in the system's behavior, although it requires significant investment in people and close coordination between legal experts and AI engineers.
Third, and perhaps most controversially, they chose "vibes over evals": they have not implemented a rigorous, automated evaluation system. Instead, they rely on domain experts testing the system before each release, combined with subjective assessments and informal tracking in spreadsheets. This approach has supported their growth to date, but the presentation acknowledges open questions about whether it will scale as the product surface area expands.

## Model Migration Challenges and Strategies

The transition from System 1 to System 2 models revealed how prompt engineering differs across model types. System 1 models typically needed very specific instructions on how to accomplish a task, often with key instructions repeated to ensure compliance. System 2 models performed better when given a clear objective with fewer constraints, leaving them room to reason through the problem themselves.

As a result, migrating prompts was not simply a matter of copying existing text: it required understanding how each model type processes instructions and restructuring prompts accordingly. The team found that System 2 models preferred leaner prompts that state what to accomplish rather than detailed how-to instructions.

## Production Deployment Strategies

Orbital uses several strategies to manage model deployments in production. They put AI model rollouts behind feature flags, just as with conventional software features, so new model capabilities can be delivered progressively. This helps mitigate "change aversion bias": the natural anxiety that comes with moving to a new system, even when the new system may be better.

The team also adopted the mantra of "betting on the model": building features not only for current AI capabilities but for where models are expected to be in 3, 6, and 12 months.
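
The feature-flagged model rollouts described above might look something like the sketch below, which uses deterministic percentage bucketing, a common feature-flag technique. The talk does not describe Orbital's flagging mechanism, so `ModelFlag`, `bucket`, `model_for`, the flag name, and the account IDs are all hypothetical. Hashing the flag name together with the account ID keeps each account on the same model across requests, and raising `rollout_percent` only ever moves additional accounts onto the candidate model.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFlag:
    name: str             # hypothetical flag name, e.g. "lease_report_model"
    control_model: str    # model currently serving all accounts
    candidate_model: str  # new model being rolled out
    rollout_percent: int  # 0-100: share of accounts routed to the candidate


def bucket(flag_name: str, account_id: str) -> int:
    """Map an account to a stable bucket in [0, 100) for the given flag."""
    digest = hashlib.sha256(f"{flag_name}:{account_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def model_for(flag: ModelFlag, account_id: str) -> str:
    """Return the model this account should use under the flag's current rollout."""
    if bucket(flag.name, account_id) < flag.rollout_percent:
        return flag.candidate_model
    return flag.control_model


if __name__ == "__main__":
    accounts = [f"account-{i}" for i in range(1000)]
    for percent in (0, 10, 50, 100):
        flag = ModelFlag("lease_report_model", "gpt-4o", "o1-preview", percent)
        on_candidate = sum(model_for(flag, a) == "o1-preview" for a in accounts)
        print(f"{percent:>3}% rollout -> {on_candidate} of {len(accounts)} accounts on o1-preview")
```

Starting at a small percentage and raising it while domain experts review output from the candidate cohort is one way to deliver a model upgrade progressively; setting the percentage back to zero rolls it back without a deploy.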
Betting on the model lets them build features that improve automatically as the underlying models become more capable, so the value compounds over time. They have also found new models useful for migrating their own prompts: given a domain-specific prompt written for an older model, a newer model can often help update and optimize it for itself, reducing manual migration effort.

## Operational Feedback Loops

The company has built strong feedback mechanisms that enable rapid response to issues. User feedback, often a simple thumbs up or thumbs down in the product interface, flows directly to the domain experts. They identify the necessary prompt changes, implement them, and deploy fixes to production within minutes or hours, rather than the days or weeks typical of traditional software bug fixes.

This rapid feedback cycle is essential to maintaining quality under their vibes-based evaluation approach. Fast response partly compensates for the lack of comprehensive automated testing, although questions remain about how well it scales as usage grows.

## Scale and Business Impact

The business metrics show the scale at which this system operates. Token volume has grown from essentially zero 18 months ago to nearly 20 billion tokens per month, representing a large volume of work previously done manually by lawyers and now handled by the agentic system. Over the same period, annual recurring revenue grew from zero to multiple seven figures.

These numbers illustrate both the business opportunity in legal automation and the technical challenge of running LLM systems at this scale. Processing 20 billion tokens a month requires robust infrastructure, cost management, and performance optimization, all while maintaining the reliability that legal professionals require.
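
To make "nearly 20 billion tokens per month" more concrete, the arithmetic below converts it into average tokens per second. It rounds the figure up to 20 billion and assumes a 30-day month; the working-hours case (8-hour days, 22 working days) is a hypothetical concentrated-traffic scenario, not a reported usage pattern.

```python
SECONDS_PER_HOUR = 3600


def tokens_per_second(monthly_tokens: float, active_hours_per_month: float) -> float:
    """Average throughput if the month's tokens are spread evenly over the active hours."""
    return monthly_tokens / (active_hours_per_month * SECONDS_PER_HOUR)


MONTHLY_TOKENS = 20e9  # "nearly 20 billion tokens monthly", rounded up

# Assumption A: traffic spread evenly around the clock over a 30-day month.
round_the_clock = tokens_per_second(MONTHLY_TOKENS, 30 * 24)

# Assumption B (hypothetical): all traffic falls in 8-hour days over 22 working days.
working_hours = tokens_per_second(MONTHLY_TOKENS, 22 * 8)

print(f"{round_the_clock:,.0f} tokens/s around the clock")   # → 7,716 tokens/s around the clock
print(f"{working_hours:,.0f} tokens/s in working hours only")  # → 31,566 tokens/s in working hours only
```

Even the around-the-clock average of roughly 7,700 tokens per second is sustained load, and concentrating the same volume into business hours roughly quadruples it, which helps explain why the talk treats infrastructure, cost management, and performance optimization as requirements rather than afterthoughts.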

## Technical and Operational Challenges

The presentation is candid about several ongoing challenges. The rapid pace of model releases creates constant pressure to upgrade, but every upgrade brings uncertainty: a new model may improve some capabilities while regressing on others, and the probabilistic nature of LLMs makes it hard to predict every outcome.

The company has to balance risk against competitiveness. Waiting too long to adopt new models means missing capabilities that could improve the product, while moving too quickly risks introducing issues that affect client work. Their answer is a "buy now, pay later" philosophy: adopt new capabilities quickly and address issues as they arise, rather than trying to perfect the system before deployment.

## Evaluation and Quality Assurance Questions

While the vibes-based approach has worked well so far, the presentation raises questions about its long-term scalability. As the product surface area grows and edge cases multiply, relying primarily on human evaluation may become prohibitively expensive or slow.

Building comprehensive evaluation systems for complex legal work is itself difficult. Legal document analysis must be correct not only in its answers but also in style, conciseness, and citation accuracy. According to the presenter, creating automated evaluations that capture all of these dimensions across the full range of real estate legal scenarios could be "prohibitively expensive, prohibitively slow and it might even be a bit of an impossible task."

## Industry Implications and Future Directions

The presentation also touches on broader implications for the AI engineering community. The speaker suggests that "product AI engineers," who understand both technical capabilities and user needs, represent a significant opportunity.
This role combines a deep technical understanding of model capabilities with the product sense to turn those capabilities into valuable user features. The concept of prompt tax, and the strategies for managing it, likely apply beyond real estate legal work to any domain deploying complex agentic systems at scale. The frameworks for thinking about model migration, progressive deployment, and risk management offer useful lessons for other organizations operating at the AI frontier.

Looking ahead, Orbital plans to implement more rigorous evaluation systems eventually, while acknowledging the significant challenges involved. The presentation closes with an invitation to the broader AI engineering community to develop and share battle-tested tactics for managing production AI systems at scale. This case study illustrates both the opportunities and the practical challenges of deploying AI systems in specialized professional domains, and offers insight into the operational realities of LLMOps at significant scale.
