Company: Rexera
Title: Evolving Quality Control AI Agents with LangGraph
Industry: Tech
Year: 2024
Summary (short): Rexera transformed their real estate transaction quality control process by evolving from single-prompt LLM checks to a sophisticated LangGraph-based solution. The company initially faced challenges with single-prompt LLMs and CrewAI implementations, but by migrating to LangGraph, they achieved significant improvements in accuracy, reducing false positives from 8% to 2% and false negatives from 5% to 2% through more precise control and structured decision paths.

## Overview

Rexera is an AI automation company operating in the real estate transaction industry, which the company describes as a $50 billion market. Their platform uses AI agents to automate manual workflows in real estate operations, focusing on tasks such as ordering payoff statements, extracting critical data from documents, and performing quality control checks. This case study documents their evolution from simple single-prompt LLM implementations to sophisticated multi-agent systems using LangGraph, specifically for their Quality Control (QC) application.

The QC application is a mission-critical component that reviews thousands of workflows daily, checking for errors across various stages of real estate transactions including data handling, client communication, and interactions with counterparties like homeowner associations (HOAs), county offices, and utility companies. The evolution of this system represents a practical example of iterating on LLM-based production systems to achieve higher accuracy and reliability.

## The Problem Space

Quality control in real estate transactions requires checking multiple dimensions of accuracy simultaneously. The QC system needs to verify document accuracy, ensure client expectations are met, monitor workflow timeliness for SLA compliance, and control costs. These checks must be performed at scale across thousands of workflows while maintaining high accuracy to prevent transaction delays.

The fundamental challenge was that real estate workflows are inherently complex, involving multi-step processes with various conditional paths. For example, a rush order requires different handling than a standard order, and the QC system must be able to recognize and appropriately evaluate each scenario. This complexity proved difficult for simpler LLM implementations to handle effectively.

## Initial Approach: Single-Prompt LLM Checks

Rexera's first implementation used single-prompt LLM checks for quality control.
In this approach, the LLM would receive relevant context about a workflow and determine whether there were issues requiring attention. The company evaluated these checks using three key metrics: accuracy (correctness scores for issue identification), efficiency (execution speed per transaction), and cost-effectiveness (associated LLM costs).

While this approach provided some automation benefits by flagging potential issues and reducing manual review needs, it had significant limitations. Single-prompt LLMs struggled with the complexity of real estate workflows because they couldn't grasp the full scope of a workflow, had limited context windows, and couldn't properly navigate multi-dimensional scenarios. The case study reports this approach produced a 35% false positive rate (incorrectly flagging non-issues) and a 10% false negative rate (failing to flag real issues).

An illustrative example provided in the case study shows how the single-prompt approach failed on rush order scenarios. When evaluating a rush order that was actually acknowledged and executed correctly, the single-prompt LLM incorrectly flagged an issue, stating "We did not explicitly acknowledge the rush request from the client in our communication." This false positive occurred because the LLM's limited ability to handle complex, multi-step interactions prevented it from recognizing that the rush request had indeed been properly handled.

## Evolution to Multi-Agent Systems with CrewAI

Recognizing the limitations of single-prompt approaches, Rexera experimented with CrewAI, a multi-agent framework. In this architecture, each AI agent was assigned responsibility for a different part of the transaction process. The case study provides an example of an agent defined with the role "Senior Content Quality Check Analyst" and tasked with checking "if all HOA documents requested by the client have been ordered, and verify that corresponding ETA and cost information has been sent to the client."

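
The false positive and false negative rates quoted for each stage of this case study can be derived from QC verdicts that a human reviewer has labeled. The sketch below is one way to do that, not Rexera's actual evaluation code. The `ReviewedCheck` schema, the `error_rates` helper, and the sample data are hypothetical, and the denominators (clean workflows for false positives, faulty workflows for false negatives) are an assumption, since the case study does not state how its percentages were normalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedCheck:
    """One QC verdict paired with a human reviewer's label (hypothetical schema)."""
    flagged: bool       # the QC system reported an issue
    actual_issue: bool  # a human confirmed the workflow really had an issue


def error_rates(checks: list[ReviewedCheck]) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    Assumed definitions (not stated in the case study):
      FPR = clean workflows that were flagged / all clean workflows
      FNR = faulty workflows that were missed / all faulty workflows
    """
    clean = [c for c in checks if not c.actual_issue]
    faulty = [c for c in checks if c.actual_issue]
    false_positives = sum(c.flagged for c in clean)
    false_negatives = sum(not c.flagged for c in faulty)
    fpr = false_positives / len(clean) if clean else 0.0
    fnr = false_negatives / len(faulty) if faulty else 0.0
    return fpr, fnr


# Made-up sample: four clean workflows (one wrongly flagged)
# and two faulty workflows (one missed).
sample = [
    ReviewedCheck(flagged=True, actual_issue=False),
    ReviewedCheck(flagged=False, actual_issue=False),
    ReviewedCheck(flagged=False, actual_issue=False),
    ReviewedCheck(flagged=False, actual_issue=False),
    ReviewedCheck(flagged=True, actual_issue=True),
    ReviewedCheck(flagged=False, actual_issue=True),
]
fpr, fnr = error_rates(sample)
print(f"false positive rate: {fpr:.0%}, false negative rate: {fnr:.0%}")
```

Running the same labeled set through each architecture keeps the comparison between stages like-for-like.
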
The multi-agent approach yielded substantial improvements over single-prompt LLMs. False positives dropped from 35% to 8%, and false negatives fell from 10% to 5%. However, a new challenge emerged: the agents sometimes took incorrect decision paths, analogous to a GPS system choosing a longer route. This lack of precise control meant that in complex scenarios, agents might veer off course, still leading to quality issues.

In the rush order example, the CrewAI system correctly identified that the rush order was acknowledged and executed, but it failed to notice a discrepancy in how the order type was recorded in the system: the order was marked as "Rush Order: False" despite being handled as a rush order. This partial accuracy demonstrated that while multi-agent systems were an improvement, they still lacked the precision needed for production-grade quality control.

## Migration to LangGraph for Enhanced Control

The final architectural evolution moved to LangGraph, a controllable agent framework built by the LangChain team. LangGraph's key differentiator is its support for cycles and branching in agent workflows, enabling Rexera to custom-design decision paths for various scenarios. This was particularly beneficial for complex cases where deterministic decision-making was required.

LangGraph also brought additional capabilities important for production LLM systems, including integration of human-in-the-loop workflows and state management. These features are crucial for enterprise applications where certain decisions may require human oversight and where maintaining context across multiple steps is essential.

Rexera implemented a tree-like structure for their QC application that allows for cycles and branching. When the application identifies a rush order, it follows the "Rush Order" branch of the decision tree with specific checks designed for that scenario. Standard orders follow a different branch with checks appropriate for regular processing.
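
The branching idea described above can be sketched in plain Python. This is an illustration of explicit routing between checks, not Rexera's implementation and not LangGraph's API: in LangGraph the nodes would be registered on a `StateGraph` and the routing expressed with conditional edges. Every node name, state key, and check below is hypothetical, and the checks are simple boolean tests standing in for LLM calls.

```python
from typing import Callable, Optional

State = dict  # shared state handed from node to node (hypothetical shape)


def classify_order(state: State) -> State:
    """Entry node: decide from the client request whether this is a rush order."""
    return {**state, "is_rush": "rush" in state["client_request"].lower()}


def check_rush_acknowledged(state: State) -> State:
    """Rush branch, step 1: was the rush request acknowledged to the client?"""
    issues = list(state["issues"])
    if not state["rush_acknowledged"]:
        issues.append("Rush request was not acknowledged to the client.")
    return {**state, "issues": issues}


def check_order_type_recorded(state: State) -> State:
    """Rush branch, step 2: does the recorded order type match the handling?"""
    issues = list(state["issues"])
    if not state["recorded_as_rush"]:
        issues.append("Order handled as rush but recorded as 'Rush Order: False'.")
    return {**state, "issues": issues}


def check_standard_order(state: State) -> State:
    """Standard branch: placeholder for the regular-processing checks."""
    return state


NODES: dict[str, Callable[[State], State]] = {
    "classify_order": classify_order,
    "check_rush_acknowledged": check_rush_acknowledged,
    "check_order_type_recorded": check_order_type_recorded,
    "check_standard_order": check_standard_order,
}


def next_node(current: str, state: State) -> Optional[str]:
    """Explicit routing: the branch is chosen by code, not by the model."""
    if current == "classify_order":
        return "check_rush_acknowledged" if state["is_rush"] else "check_standard_order"
    if current == "check_rush_acknowledged":
        return "check_order_type_recorded"
    return None  # end of the branch


def run_qc(state: State) -> State:
    """Walk the graph from the entry node, recording the path taken."""
    current: Optional[str] = "classify_order"
    while current is not None:
        state = NODES[current](state)
        state = {**state, "path": state.get("path", []) + [current]}
        current = next_node(current, state)
    return state


# The rush order scenario from the case study, encoded with made-up fields.
result = run_qc({
    "client_request": "Please rush this HOA document order.",
    "rush_acknowledged": True,
    "recorded_as_rush": False,
    "issues": [],
})
print(result["path"])
print(result["issues"])
```

Because the route is fixed in `next_node`, a rush order always passes through both rush-specific checks, which is how the order-type mismatch gets surfaced in this sketch.
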
This deterministic structure dramatically improved accuracy by reducing the randomness of agents taking incorrect paths. The results were significant: false positives decreased from 8% to 2%, and false negatives dropped from 5% to 2%.

In the rush order example, the LangGraph implementation not only confirmed that the rush order was acknowledged and executed but also identified the inconsistency in how the order type was recorded (marked as "Rush Order: False" despite being handled as a rush). The custom decision path ensured both the acknowledgment of the rush and the proper handling of the order type were verified.

## LLMOps Considerations and Analysis

This case study illustrates several important LLMOps principles. First, it demonstrates the value of systematic evaluation metrics. Rexera tracked accuracy, efficiency, and cost-effectiveness across thousands of workflow runs, enabling data-driven decisions about architectural changes. The specific metrics reported (false positive and false negative rates across different implementations) show the importance of establishing baselines and measuring improvements.

Second, the evolution from single-prompt to multi-agent to graph-based architectures represents a common maturation path for production LLM systems. Many organizations start with simple prompt-based approaches, discover their limitations, and progressively adopt more sophisticated architectures. The case study provides concrete evidence that each architectural step can yield measurable improvements, though with diminishing returns and increased complexity.

Third, the emphasis on determinism and control in the LangGraph implementation reflects a broader industry trend toward reducing the non-deterministic aspects of LLM-based systems. While LLMs inherently have some randomness, production systems often require predictable behavior, especially for quality control and compliance use cases.
LangGraph's branching and cycle capabilities provide a way to impose structure on agent decision-making.

It's worth noting that this case study originates from LangChain's blog, which has a commercial interest in promoting LangGraph. While the reported metrics appear specific and credible, the comparison between CrewAI and LangGraph should be interpreted with some caution given the source. The fundamental insight, that more controlled agent architectures can improve accuracy, is sound, but the specific magnitude of improvements may reflect the particular implementation choices made at each stage rather than inherent framework capabilities.

The case study does not provide details about infrastructure, deployment, monitoring, or operational aspects of running these systems in production. It also doesn't discuss cost implications of the different approaches beyond mentioning cost-effectiveness as a tracked metric. Future implementations would benefit from understanding these operational considerations alongside accuracy improvements.

## Key Takeaways

The Rexera case study demonstrates that LLM-based quality control systems can achieve production-grade accuracy through iterative architectural improvements. Moving from single-prompt to multi-agent to graph-based architectures yielded progressive accuracy improvements, with the final LangGraph implementation achieving 2% false positive and 2% false negative rates on complex real estate workflows. The key architectural insight is that deterministic decision paths and branching structures can significantly reduce the unpredictability of LLM agents in production scenarios where consistency and accuracy are critical.
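
To put the stage-by-stage figures side by side, the short script below computes the absolute and relative change in each error rate between consecutive architectures. The percentages are the ones reported in the case study. The `REPORTED` table layout and the `relative_reduction` helper are illustrative names, not part of any library.

```python
# Error rates (in percent) as reported in the case study for each stage.
REPORTED = {
    "single-prompt": {"false_positive": 35, "false_negative": 10},
    "CrewAI": {"false_positive": 8, "false_negative": 5},
    "LangGraph": {"false_positive": 2, "false_negative": 2},
}


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


stages = list(REPORTED)  # dicts preserve insertion order
for prev, curr in zip(stages, stages[1:]):
    for metric in ("false_positive", "false_negative"):
        before = REPORTED[prev][metric]
        after = REPORTED[curr][metric]
        print(
            f"{prev} -> {curr}, {metric}: {before}% -> {after}% "
            f"({before - after} points, "
            f"{relative_reduction(before, after):.0f}% relative reduction)"
        )
```

On these reported numbers, the absolute gain for false positives shrinks from 27 points to 6 points between steps, while the relative reduction stays roughly similar (about 77% and then 75%).
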
