ZenML

Multi-Agent Architecture for Automating Commercial Real Estate Development Workflows

Build.inc 2025

Build.inc developed a sophisticated multi-agent system called Dougie to automate complex commercial real estate development workflows, particularly for data center projects. Using LangGraph for orchestration, they implemented a hierarchical system of over 25 specialized agents working in parallel to perform land diligence tasks. The system reduces a process that traditionally took human consultants four weeks to roughly 75 minutes, while maintaining high quality and depth of analysis.

Industry: Tech

Overview

Build.inc is a company focused on automating complex, labor-intensive workflows in commercial real estate (CRE) development, with a particular emphasis on data centers, renewable energy facilities, and other energy-intensive infrastructure projects. Their flagship product, referred to as “Dougie,” represents one of the more sophisticated multi-agent implementations documented in production, featuring over 25 sub-agents organized in a hierarchical structure. The case study provides valuable insights into building complex agentic systems for deterministic, high-stakes business workflows.

The core problem Build.inc addresses is the land diligence process—the task of researching a piece of land to determine its suitability for a particular project. This workflow is critical because errors in this phase can cost developers millions of dollars downstream. Traditionally, this process requires specialist consultant teams, consumes nearly half of the total project timeline, and involves navigating a fragmented data ecosystem with over 30,000 different US jurisdictions, each with its own regulations and data sources.

Technical Architecture

The heart of Build.inc’s approach is what they describe as a four-tier hierarchical multi-agent system built on LangGraph, LangChain’s framework for building stateful, multi-actor applications with LLMs. Each tier delegates work to progressively more specialized agents in the tier below it.

This design mirrors how human organizations operate, with different specialists handling different aspects of a complex project. The company emphasizes that each agent is “specialized” and given appropriate guardrails, context, models, and tools for its specific responsibilities.

LangGraph Implementation Details

Build.inc leverages LangGraph’s capabilities in several key ways. They represent each agent as its own LangGraph subgraph, creating self-contained modules that simplify orchestration and visualization. This modular approach reduces interdependencies and makes the system easier to debug, test, and scale.

A critical aspect of their implementation is asynchronous execution. Running 25+ agents sequentially would be prohibitively slow, so they leverage LangGraph’s support for parallel execution. Multiple agents can run concurrently, dramatically reducing overall processing time. Despite this parallelization, the complete workflow still requires approximately 75 minutes—a testament to the depth and complexity of the tasks being performed. The company claims this represents a significant improvement over the four weeks typically required by human consultants.
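The payoff of fanning agents out concurrently rather than running them one after another can be sketched with plain asyncio; the agent tasks here are hypothetical placeholders for the LLM calls and data fetches a real worker would perform.

```python
import asyncio
import time


async def run_agent(name: str, seconds: float) -> str:
    # Stand-in for a long-running diligence task (LLM calls, data fetches).
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_parallel() -> list[str]:
    # Fan out all specialist agents at once instead of sequentially.
    agents = [run_agent(f"agent_{i}", 0.1) for i in range(25)]
    return await asyncio.gather(*agents)


start = time.perf_counter()
results = asyncio.run(run_parallel())
elapsed = time.perf_counter() - start
# 25 concurrent 0.1-second tasks finish in roughly 0.1 s rather than 2.5 s.
```

LangGraph provides the equivalent fan-out natively: edges from one node to several successors run those branches concurrently before merging state.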

The company’s approach to building agents emphasizes what they call the “dark matter” of AI: context. In their view, successful LLM-based workflow automation depends less on the model itself than on the guardrails, context, and tools supplied to each agent.

Practical Lessons for Production LLMOps

The case study offers several practical insights for teams building agent systems in production:

Determinism Over Open-Endedness: Build.inc emphasizes choosing carefully where to allow agent autonomy. They note that relying on predefined plans rather than asking agents to generate plans dynamically reduces unnecessary complexity and leads to more predictable, reliable outcomes. This is particularly important for high-stakes workflows where consistency matters.

Task Specialization: Each agent performs best when specialized for a specific task. “Training” an agent effectively means providing appropriate guardrails, often as simple as a JSON configuration file. Context, model selection, and tool availability should be tailored to each task rather than forcing a general-purpose agent to adapt to every scenario.
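A guardrail configuration of the kind described might look like the following sketch. All field names and values are hypothetical illustrations of "training as configuration," not Build.inc's actual schema.

```python
import json

# Hypothetical per-agent guardrail config: the agent's scope, model,
# and permitted tools are declared as data rather than code.
config_text = """
{
  "agent": "zoning_specialist",
  "model": "small-fast-model",
  "allowed_tools": ["zoning_db_lookup", "gis_query"],
  "max_steps": 5,
  "system_prompt": "You verify zoning constraints for a single parcel. Do not speculate."
}
"""

config = json.loads(config_text)


def build_agent(config: dict):
    # In a real system this would wire up the model, prompt, and tools;
    # here we only enforce the declared tool guardrail.
    def can_use_tool(tool_name: str) -> bool:
        return tool_name in config["allowed_tools"]

    return can_use_tool


zoning_agent = build_agent(config)
```

Keeping guardrails in configuration rather than code means a new specialist is often just a new JSON file, which matches the modularity goals described below.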

Task Decomposition: Breaking workflows into smaller, single-purpose tasks enables more efficient and accurate execution. This approach also supports modularity and composability, making the system easier to understand, modify, and extend. The company uses the phrase “the more you chain, the less pain you experience” to describe this principle.
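The decomposition principle can be illustrated as a fixed chain of single-purpose steps, each reading and extending a shared state; the step names are invented for illustration.

```python
from functools import reduce


def fetch_parcel_record(state: dict) -> dict:
    # Single-purpose step: retrieve the base record for the parcel.
    return {**state, "record": f"record for {state['parcel_id']}"}


def check_flood_zone(state: dict) -> dict:
    # Single-purpose step: assess one narrow risk factor.
    return {**state, "flood_risk": "low"}


def summarize(state: dict) -> dict:
    # Single-purpose step: combine upstream outputs into a summary.
    return {**state, "summary": f"{state['record']}: flood risk {state['flood_risk']}"}


# The plan is a predefined chain, not something an agent invents at runtime.
PIPELINE = [fetch_parcel_record, check_flood_zone, summarize]


def run(state: dict) -> dict:
    return reduce(lambda s, step: step(s), PIPELINE, state)


result = run({"parcel_id": "TX-042"})
# result["summary"] == "record for TX-042: flood risk low"
```

Each step can be tested, replaced, or reordered independently, which is the composability benefit the chaining principle points at.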

Modular Design with Subgraphs: By representing each agent as its own LangGraph subgraph, Build.inc creates self-contained modules that are easier to orchestrate, visualize, debug, test, and scale. This architecture also enables flexibility in evolving the offering—they can add new workers, transition to new models, create new agent-graphs, or expand existing ones with minimal impact on the existing system.

Production Considerations

The case study highlights several important production considerations. First, the system is designed for high-stakes decision-making where errors have significant financial consequences. This necessitates a more controlled, deterministic approach rather than fully autonomous agent decision-making.

Second, the fragmented data ecosystem (30,000+ US jurisdictions) presents significant challenges that traditional software approaches have struggled to address. The agent-based approach appears well-suited to handling this variability because agents can adapt to different data sources and regulatory contexts within their specialized domains.

Third, the company positions their approach as fundamentally different from traditional SaaS software. Rather than building rigid software that struggles with complexity and variability, they build composable, modular agent systems that can be reconfigured and extended more flexibly.

Critical Assessment

While the case study presents compelling claims, several aspects warrant careful consideration. The 75-minute versus four-week comparison is striking but lacks specific details about what exactly is being compared—it’s possible the human timeline includes waiting time, coordination overhead, and other factors beyond pure work time. The quality claims (“depth that human teams can’t match”) are difficult to verify without seeing actual output comparisons.

The system’s complexity (25+ agents, four tiers) also raises questions about maintainability, debugging challenges, and failure modes. While the modular architecture should help address these concerns, managing interdependencies in such a complex system likely requires significant operational expertise.

Additionally, the case study comes from a LangChain-affiliated blog, so there is inherent promotional interest in showcasing LangGraph’s capabilities. The technical claims appear reasonable, but independent verification of the performance metrics would strengthen the case.

Conclusion

Build.inc’s implementation represents a notable example of complex multi-agent orchestration in production. The four-tier hierarchical architecture, use of LangGraph subgraphs for modularity, and emphasis on asynchronous parallel execution provide a useful reference architecture for teams building sophisticated agentic workflows. The practical lessons around determinism, specialization, and task decomposition offer actionable guidance for LLMOps practitioners. While some performance claims should be viewed with appropriate skepticism, the technical approach and architectural patterns appear sound and applicable to other complex, multi-step workflow automation scenarios.

More Like This

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.


Migration of Credit AI RAG Application from Multi-Cloud to AWS Bedrock

Octus 2025

Octus, a leading provider of credit market data and analytics, migrated their flagship generative AI product Credit AI from a multi-cloud architecture (OpenAI on Azure and other services on AWS) to a unified AWS architecture using Amazon Bedrock. The migration addressed challenges in scalability, cost, latency, and operational complexity associated with running a production RAG application across multiple clouds. By leveraging Amazon Bedrock's managed services for embeddings, knowledge bases, and LLM inference, along with supporting AWS services like Lambda, S3, OpenSearch, and Textract, Octus achieved a 78% reduction in infrastructure costs, 87% decrease in cost per question, improved document sync times from hours to minutes, and better development velocity while maintaining SOC2 compliance and serving thousands of concurrent users across financial services clients.


Multi-Agent Financial Research and Question Answering System

Yahoo! Finance 2025

Yahoo! Finance built a production-scale financial question answering system using multi-agent architecture to address the information asymmetry between retail and institutional investors. The system leverages Amazon Bedrock Agent Core and employs a supervisor-subagent pattern where specialized agents handle structured data (stock prices, financials), unstructured data (SEC filings, news), and various APIs. The solution processes heterogeneous financial data from multiple sources, handles temporal complexities of fiscal years, and maintains context across sessions. Through a hybrid evaluation approach combining human and AI judges, the system achieves strong accuracy and coverage metrics while processing queries in 5-50 seconds at costs of 2-5 cents per query, demonstrating production viability at scale with support for 100+ concurrent users.
