Build.inc developed a sophisticated multi-agent system called Dougie to automate complex commercial real estate development workflows, particularly for data center projects. Using LangGraph for orchestration, they implemented a hierarchical system of over 25 specialized agents working in parallel to perform land diligence tasks. The system compresses land diligence work that traditionally took human consultants four weeks into roughly 75 minutes, while maintaining high quality and depth of analysis.
Build.inc is a company focused on automating complex, labor-intensive workflows in commercial real estate (CRE) development, with a particular emphasis on data centers, renewable energy facilities, and other energy-intensive infrastructure projects. Their flagship product, referred to as “Dougie,” represents one of the more sophisticated multi-agent implementations documented in production, featuring over 25 sub-agents organized in a hierarchical structure. The case study provides valuable insights into building complex agentic systems for deterministic, high-stakes business workflows.
The core problem Build.inc addresses is the land diligence process—the task of researching a piece of land to determine its suitability for a particular project. This workflow is critical because errors in this phase can cost developers millions of dollars downstream. Traditionally, this process requires specialist consultant teams, consumes nearly half of the total project timeline, and involves navigating a fragmented data ecosystem with over 30,000 different US jurisdictions, each with its own regulations and data sources.
The heart of Build.inc’s approach is what they describe as a four-tier hierarchical multi-agent system built on LangGraph, LangChain’s framework for building stateful, multi-actor applications with LLMs. Their architecture consists of the following tiers:
Master Agent (“The Worker”): The top-level orchestration agent that coordinates the entire workflow and delegates tasks to specialized Role Agents. This agent understands the complete makeup of the workflow and uses LLMs to manage task sequencing and invocation.
Role Agents (“The Workflows”): These handle specialized functions such as data collection or risk evaluation. Each Role Agent manages one or more Sequence Agents beneath it.
Sequence Agents: These carry out multi-step processes that can involve up to 30 individual tasks. They coordinate the execution of Task Agents within their designated workflow sequences.
Task Agents: The most granular level, these agents are equipped with specific tools, context, and models optimized for their particular task. They execute individual operations and pass results back up the hierarchy.
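The four-tier structure above can be sketched as plain Python classes. This is an illustrative sketch, not Build.inc's actual code: the agent and task names (fetch_zoning_maps, check_flood_zones, and so on) are hypothetical, and the LLM and tool calls are replaced with stubs so the hierarchy itself is visible.

```python
from dataclasses import dataclass, field

@dataclass
class TaskAgent:
    """Most granular tier: one tool-equipped agent per task."""
    name: str

    def run(self, context: dict) -> dict:
        # In a real system this would call an LLM with task-specific
        # tools and context; here we just record that the task ran.
        return {self.name: "done"}

@dataclass
class SequenceAgent:
    """Executes a multi-step sequence of Task Agents in order."""
    name: str
    tasks: list = field(default_factory=list)

    def run(self, context: dict) -> dict:
        results = {}
        for task in self.tasks:
            results.update(task.run(context))
        return results

@dataclass
class RoleAgent:
    """Specialized function (e.g. data collection) owning sequences."""
    name: str
    sequences: list = field(default_factory=list)

    def run(self, context: dict) -> dict:
        results = {}
        for seq in self.sequences:
            results.update(seq.run(context))
        return results

@dataclass
class MasterAgent:
    """Top-level orchestrator delegating to Role Agents."""
    roles: list

    def run(self, context: dict) -> dict:
        results = {}
        for role in self.roles:
            results.update(role.run(context))
        return results

# Hypothetical wiring: two roles, each with its own sequences and tasks.
master = MasterAgent(roles=[
    RoleAgent("data_collection", sequences=[
        SequenceAgent("zoning", tasks=[TaskAgent("fetch_zoning_maps"),
                                       TaskAgent("summarize_restrictions")]),
    ]),
    RoleAgent("risk_evaluation", sequences=[
        SequenceAgent("flood", tasks=[TaskAgent("check_flood_zones")]),
    ]),
])
print(master.run({"parcel_id": "example-123"}))
```

Each tier only knows about the tier directly beneath it, which is what lets results "pass back up the hierarchy" without any agent needing a global view.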
This design mirrors how human organizations operate, with different specialists handling different aspects of a complex project. The company emphasizes that each agent is “specialized” and given appropriate guardrails, context, models, and tools for its specific responsibilities.
Build.inc leverages LangGraph’s capabilities in several key ways. They represent each agent as its own LangGraph subgraph, creating self-contained modules that simplify orchestration and visualization. This modular approach reduces interdependencies and makes the system easier to debug, test, and scale.
A critical aspect of their implementation is asynchronous execution. Running 25+ agents sequentially would be prohibitively slow, so they leverage LangGraph’s support for parallel execution. Multiple agents can run concurrently, dramatically reducing overall processing time. Despite this parallelization, the complete workflow still requires approximately 75 minutes—a testament to the depth and complexity of the tasks being performed. The company claims this represents a significant improvement over the four weeks typically required by human consultants.
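The payoff of parallel execution can be shown with a small asyncio sketch. In Build.inc's system, LangGraph schedules parallel branches itself; the point here is only the wall-clock arithmetic, and the agent names and delays are invented for illustration.

```python
import asyncio
import time

async def task_agent(name: str, seconds: float) -> str:
    # Stand-in for an LLM or tool call that takes `seconds` of wall time.
    await asyncio.sleep(seconds)
    return f"{name}: complete"

async def run_parallel() -> list:
    # Independent agents fan out concurrently; total wall time is
    # roughly the slowest branch, not the sum of all branches.
    agents = [task_agent("zoning_lookup", 0.2),
              task_agent("utility_capacity", 0.2),
              task_agent("environmental_scan", 0.2)]
    return await asyncio.gather(*agents)

start = time.perf_counter()
results = asyncio.run(run_parallel())
elapsed = time.perf_counter() - start
print(results)
print(f"{elapsed:.2f}s")  # ~0.2s concurrently, versus ~0.6s sequentially
```

With dozens of agents whose real tasks take minutes rather than fractions of a second, this is the difference between a 75-minute run and one many times longer.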
The company’s approach to building agents emphasizes what they call the “dark matter” of AI: context. In their view, whether LLM-based workflow automation succeeds comes down to supplying each agent with the right context, alongside appropriate guardrails and tools.
The case study offers several practical insights for teams building agent systems in production:
Determinism Over Open-Endedness: Build.inc emphasizes choosing carefully where to allow agent autonomy. They note that relying on predefined plans rather than asking agents to generate plans dynamically reduces unnecessary complexity and leads to more predictable, reliable outcomes. This is particularly important for high-stakes workflows where consistency matters.
Task Specialization: Each agent performs best when specialized for a specific task. “Training” an agent effectively means providing appropriate guardrails, often as simple as a JSON configuration file. Context, model selection, and tool availability should be tailored to each task rather than forcing a general-purpose agent to adapt to every scenario.
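A minimal sketch of what a JSON guardrail configuration might look like follows. The schema and field names here are hypothetical, not Build.inc's actual format; the idea is that "training" an agent amounts to declaring its model, tools, and allowed behavior in data rather than code.

```python
import json

# Hypothetical per-task agent configuration, expressed as JSON.
config_text = """
{
  "agent": "flood_zone_checker",
  "model": "small-fast-model",
  "tools": ["fema_flood_api", "parcel_geometry"],
  "guardrails": {
    "allowed_outputs": ["in_flood_zone", "not_in_flood_zone", "unknown"],
    "max_tool_calls": 5
  }
}
"""

config = json.loads(config_text)

def validate_output(output: str, cfg: dict) -> str:
    # A simple guardrail: reject anything outside the allowed set.
    allowed = cfg["guardrails"]["allowed_outputs"]
    return output if output in allowed else "unknown"

print(validate_output("in_flood_zone", config))   # passes the guardrail
print(validate_output("maybe flooded?", config))  # coerced to "unknown"
```

Keeping this in configuration means a new specialized agent can be stood up by writing a file, not by modifying orchestration code.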
Task Decomposition: Breaking workflows into smaller, single-purpose tasks enables more efficient and accurate execution. This approach also supports modularity and composability, making the system easier to understand, modify, and extend. The company uses the phrase “the more you chain, the less pain you experience” to describe this principle.
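The chaining principle can be illustrated with a toy pipeline of single-purpose steps. The step names and zoning logic below are invented for illustration; what matters is that each link is small enough to test, swap, or extend independently.

```python
def fetch_parcel_record(parcel_id: str) -> dict:
    # Stand-in for a data-collection task agent.
    return {"parcel_id": parcel_id, "zoning_code": "M-1"}

def classify_zoning(record: dict) -> dict:
    # Stand-in for an analysis task agent: "M" codes treated as industrial.
    industrial = record["zoning_code"].startswith("M")
    return {**record, "industrial": industrial}

def summarize(record: dict) -> str:
    verdict = "suitable" if record["industrial"] else "needs rezoning"
    return f"Parcel {record['parcel_id']}: {verdict}"

def run_pipeline(parcel_id: str) -> str:
    # Chain the single-purpose steps; adding a step means adding a link.
    steps = [fetch_parcel_record, classify_zoning, summarize]
    result = parcel_id
    for step in steps:
        result = step(result)
    return result

print(run_pipeline("example-123"))  # Parcel example-123: suitable
```

Decomposed this way, a failure in one step is diagnosed at that step, rather than somewhere inside a monolithic agent prompt.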
Modular Design with Subgraphs: By representing each agent as its own LangGraph subgraph, Build.inc creates self-contained modules that are easier to orchestrate, visualize, debug, test, and scale. This architecture also enables flexibility in evolving the offering—they can add new workers, transition to new models, create new agent-graphs, or expand existing ones with minimal impact on the existing system.
The case study highlights several important production considerations. First, the system is designed for high-stakes decision-making where errors have significant financial consequences. This necessitates a more controlled, deterministic approach rather than fully autonomous agent decision-making.
Second, the fragmented data ecosystem (30,000+ US jurisdictions) presents significant challenges that traditional software approaches have struggled to address. The agent-based approach appears well-suited to handling this variability because agents can adapt to different data sources and regulatory contexts within their specialized domains.
Third, the company positions their approach as fundamentally different from traditional SaaS software. Rather than building rigid software that struggles with complexity and variability, they build composable, modular agent systems that can be reconfigured and extended more flexibly.
While the case study presents compelling claims, several aspects warrant careful consideration. The 75-minute versus four-week comparison is striking but lacks specific details about what exactly is being compared—it’s possible the human timeline includes waiting time, coordination overhead, and other factors beyond pure work time. The quality claims (“depth that human teams can’t match”) are difficult to verify without seeing actual output comparisons.
The system’s complexity (25+ agents, four tiers) also raises questions about maintainability, debugging challenges, and failure modes. While the modular architecture should help address these concerns, managing interdependencies in such a complex system likely requires significant operational expertise.
Additionally, the case study comes from a LangChain-affiliated blog, so there is inherent promotional interest in showcasing LangGraph’s capabilities. The technical claims appear reasonable, but independent verification of the performance metrics would strengthen the case.
Build.inc’s implementation represents a notable example of complex multi-agent orchestration in production. The four-tier hierarchical architecture, use of LangGraph subgraphs for modularity, and emphasis on asynchronous parallel execution provide a useful reference architecture for teams building sophisticated agentic workflows. The practical lessons around determinism, specialization, and task decomposition offer actionable guidance for LLMOps practitioners. While some performance claims should be viewed with appropriate skepticism, the technical approach and architectural patterns appear sound and applicable to other complex, multi-step workflow automation scenarios.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Octus, a leading provider of credit market data and analytics, migrated their flagship generative AI product Credit AI from a multi-cloud architecture (OpenAI on Azure and other services on AWS) to a unified AWS architecture using Amazon Bedrock. The migration addressed challenges in scalability, cost, latency, and operational complexity associated with running a production RAG application across multiple clouds. By leveraging Amazon Bedrock's managed services for embeddings, knowledge bases, and LLM inference, along with supporting AWS services like Lambda, S3, OpenSearch, and Textract, Octus achieved a 78% reduction in infrastructure costs, 87% decrease in cost per question, improved document sync times from hours to minutes, and better development velocity while maintaining SOC2 compliance and serving thousands of concurrent users across financial services clients.
Yahoo! Finance built a production-scale financial question answering system using multi-agent architecture to address the information asymmetry between retail and institutional investors. The system leverages Amazon Bedrock Agent Core and employs a supervisor-subagent pattern where specialized agents handle structured data (stock prices, financials), unstructured data (SEC filings, news), and various APIs. The solution processes heterogeneous financial data from multiple sources, handles temporal complexities of fiscal years, and maintains context across sessions. Through a hybrid evaluation approach combining human and AI judges, the system achieves strong accuracy and coverage metrics while processing queries in 5-50 seconds at costs of 2-5 cents per query, demonstrating production viability at scale with support for 100+ concurrent users.