Coinbase: AI-Powered Multi-Agent Decision Support System for Enterprise Strategic Planning

Company: Coinbase

Title: AI-Powered Multi-Agent Decision Support System for Enterprise Strategic Planning

Industry: Finance

Link: https://www.coinbase.com/en-it/blog/making-smarter-decisions-faster-with-AI-at-Coinbase

Year: 2025

Summary (short)

Coinbase developed RAPID-D, an AI-powered decision support tool to augment their existing RAPID decision-making framework used for critical strategic choices. The system employs a multi-agent architecture where specialized AI agents collaborate to analyze decision documents, surface risks, challenge assumptions, and provide comprehensive recommendations to human decision-makers. By implementing a modular approach with agents serving as analysts, contextual seekers, devil's advocates, and synthesizers, Coinbase created a transparent and auditable system that helps mitigate cognitive bias while maintaining human oversight. The solution was iteratively developed based on leadership feedback, achieving strong accuracy benchmarks with Claude 3.7 Sonnet, and incorporates real-time feedback mechanisms to continuously improve recommendation quality.

Tags: question_answering, high_stakes_application

## Overview

Coinbase's RAPID-D system represents a sophisticated production deployment of large language models designed to augment enterprise decision-making processes. The company recognized that while their existing RAPID (Recommend, Agree, Perform, Input, Decide) framework provided a solid structure for accountable decision-making on critical strategic issues, there was an opportunity to leverage advanced AI to systematically surface unseen risks, mitigate cognitive bias, and provide transparent, auditable analysis for high-stakes organizational decisions.

The RAPID framework at Coinbase is used for making critical decisions where inputs are gathered from key stakeholders across functions, and an accountable "decider" makes the final decision based on these insights. The framework includes structured elements such as recommendation dates, decision dates, decision type classification (Type 1 for irreversible decisions requiring extra care, Type 2 for reversible decisions), detailed recommendations with pros/cons and risks/benefits, and clearly named individuals in each role (Recommend, Agree, Perform, Input, Decide) who can provide their agreement or disagreement with additional context.

## Technical Architecture and Multi-Agent System Design

RAPID-D is architected as a modular, multi-agent system rather than a monolithic AI providing single black-box answers. This design philosophy mirrors a team of expert advisors who collaborate, debate, and synthesize information. The system consists of four specialized agents, each with distinct responsibilities, that work together in a structured pipeline:

The **Single Shot Recommender Agent** (referred to as "The Analyst") performs the initial thorough, impartial review of the primary RAPID document. This agent generates a baseline recommendation based strictly on the facts and arguments presented in the document, providing a foundation for further analysis without external context or bias.

The **Contextual Recommender Agent** (called "The Seeker") adds crucial organizational context to the decision-making process. This agent first generates critical questions about the RAPID document, then leverages Coinbase's enterprise search tool to find answers across all internal knowledge sources. By synthesizing these findings, it provides a deeply informed decision recommendation that incorporates wider organizational context that might otherwise be missed. This approach essentially implements a retrieval-augmented generation (RAG) pattern specifically tailored for enterprise decision support.

The **Contrarian Agent** (designated as "The Devil's Advocate") serves a critical function in combating cognitive bias. This agent's sole purpose is to build the strongest possible case against the initial recommendation, deliberately probing for weaknesses, unstated assumptions, potential risks, and unintended consequences. This adversarial approach helps ensure that decisions are tested against rigorous scrutiny before finalization.

The **Debate and Decide Agent** (functioning as "The Synthesizer") acts as an impartial moderator that meticulously evaluates the arguments from all previous agents. It considers the baseline analysis from The Analyst, the broader organizational context from The Seeker, and the challenges raised by The Devil's Advocate. The agent then produces a comprehensive final recommendation for the human Decider, complete with detailed explanations of its reasoning and the trade-offs it considered. This design ensures transparency and auditability in the AI's decision-making process.

## RAG Implementation and Enterprise Knowledge Integration

A key component of RAPID-D's effectiveness is its sophisticated document context retrieval system.
This system includes a "key question generator" block that examines each decision document and formulates targeted questions around critical areas such as security implications, market impact, cost considerations, user experience effects, and scalability concerns. For every question generated, the system retrieves relevant information by searching Coinbase's enterprise knowledge base. This retrieval-augmented approach ensures that recommendations are comprehensive, less biased, and tailored to the specific context of each strategic decision.

The RAG implementation allows the system to ground its recommendations in actual organizational knowledge and historical context rather than relying solely on the information contained within a single RAPID document. This is crucial for avoiding decisions made in isolation that might conflict with broader organizational initiatives or ignore relevant precedents.

## Iterative Development and User-Centered Design

The development of RAPID-D followed a deliberate, iterative process focused on feedback from Coinbase's leadership. The initial version started with a single agent that analyzed RAPID documents and presented its reasoning, with early feedback gathered manually to understand what decision-makers valued most. This user-centered approach is characteristic of successful LLMOps implementations, where the technology is shaped by the actual needs and workflows of end users rather than being imposed as a purely technical solution.

The current version represents a significant evolution, incorporating the multi-agent debate structure and providing deeper explainability for the AI's conclusions. The team also implemented an asynchronous architecture to handle complex decisions that require more processing time.
This architectural decision ensures that users are kept informed throughout the process and receive well-reasoned outputs without experiencing unnecessary delays or timeouts, which is a critical consideration for production LLM systems handling variable workloads.

## Model Selection and Evaluation Approach

Coinbase conducted systematic evaluation of RAPID-D's accuracy through a human review process that compared each of the system's final recommendations against the real decisions documented by Coinbase's RAPID Deciders. This yielded benchmark scores across multiple leading models. Claude 3.7 Sonnet was ultimately chosen for its strong balance of quality, stability, and reliability, and the evaluation process itself demonstrates a mature LLMOps approach to model selection.

The evaluation methodology, which compares AI recommendations against actual decisions made by experienced human decision-makers, provides a realistic assessment of the system's practical utility. This is more valuable than abstract benchmarks that might not reflect the system's performance in its actual use case. However, it's worth noting that this evaluation approach assumes that the historical human decisions represent "ground truth," which may not always be the case. A more nuanced evaluation might also consider whether the AI surfaced valid concerns that humans missed, even if the final decision differed.

## Real-Time Feedback Integration and Continuous Improvement

A particularly sophisticated aspect of RAPID-D's LLMOps implementation is its ability to adapt in real time by incorporating feedback directly into its decision process. Comments or corrections, whether provided by the user during an active session or later by any stakeholder in the RAPID document, are captured and analyzed against the assistant's original recommendation. This evaluation is then used to optimize subsequent recommendations within the same decision flow.

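
The feedback loop described above can be illustrated with a minimal sketch. The case study does not say how feedback is actually incorporated, so this example assumes the simplest possible option: stakeholder comments are appended to the prompt for the next recommendation within the same decision flow. `DecisionSession`, `add_feedback`, `build_prompt`, and the `LLM` callable type are all hypothetical names chosen for illustration, not part of Coinbase's system.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class DecisionSession:
    """State for one decision flow: the document plus feedback gathered so far."""

    document: str
    feedback: List[str] = field(default_factory=list)

    def add_feedback(self, author: str, comment: str) -> None:
        # Keep the author with the comment so the prompt can attribute it.
        self.feedback.append(f"{author}: {comment}")

    def build_prompt(self) -> str:
        # Assumption: feedback is folded into the prompt; the source does not
        # specify whether prompts, retrieval, or the model itself are updated.
        parts = ["Recommend a decision for this RAPID document:", self.document]
        if self.feedback:
            parts.append("Stakeholder feedback on earlier recommendations:")
            parts.extend(f"- {item}" for item in self.feedback)
        return "\n".join(parts)

    def recommend(self, llm: LLM) -> str:
        # Every call rebuilds the prompt, so new feedback is always included.
        return llm(self.build_prompt())


if __name__ == "__main__":
    session = DecisionSession("Migrate service A to region B")
    session.add_feedback("decider", "Budget was cut by half")
    print(session.build_prompt())
```

Because the prompt is rebuilt on every call to `recommend`, a comment added mid-session affects the very next recommendation, which is the behavior the case study describes, whatever mechanism is used in practice.
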
This feedback mechanism ensures that the assistant's output reflects the most up-to-date perspectives and context from all participants. It represents a form of online learning or adaptation, though the case study doesn't specify whether this involves fine-tuning the underlying model, adjusting prompts, or updating a retrieval index. Regardless of the specific implementation, this feedback loop is crucial for maintaining system accuracy and relevance as organizational context evolves.

## Production Deployment Considerations

The asynchronous architecture mentioned in the case study addresses a critical production deployment challenge: managing variable processing times for complex analyses while maintaining good user experience. For LLM systems that need to perform multiple retrieval operations, generate questions, search knowledge bases, and synthesize multi-agent debates, processing times can vary significantly. An asynchronous design allows the system to handle these variations gracefully, keeping users informed of progress without blocking or timing out.

The transparency and auditability emphasis throughout the system design reflects mature thinking about enterprise AI deployment. By making the reasoning process visible and showing how different agents contributed to the final recommendation, RAPID-D addresses concerns about "black box" AI systems that are common barriers to adoption in enterprise settings. Decision-makers can understand not just what the AI recommends, but why it recommends it and what alternative perspectives were considered.

## Critical Assessment and Balanced Perspective

While Coinbase's RAPID-D system demonstrates sophisticated LLMOps practices, it's important to maintain a balanced perspective on the claims and approach.
The multi-agent architecture is well-designed for exploring different perspectives on a decision, but the effectiveness of such systems depends heavily on prompt engineering, the quality of the enterprise search results, and the ability of the language model to truly understand organizational context rather than simply retrieving and paraphrasing information.

The claim that the system "mitigates cognitive bias" should be considered carefully. While the contrarian agent design can surface alternative perspectives that might be overlooked, LLMs themselves can exhibit various biases based on their training data. The system may help identify some cognitive biases in human decision-making, but it's not a complete solution to bias and could potentially introduce new forms of bias from the AI itself.

The evaluation approach, while practical, has limitations. Measuring accuracy by comparing AI recommendations to actual human decisions assumes those decisions were correct, which may not always be true. A more comprehensive evaluation might also track whether decisions made with RAPID-D assistance had better outcomes than those made without it, though such longitudinal studies are challenging to conduct rigorously in business settings.

The case study doesn't provide detailed information about several important LLMOps aspects: how prompts are versioned and managed across the different agents, how the system handles edge cases or low-confidence situations, what guardrails exist to prevent inappropriate recommendations, how often the system requires human intervention, or what the actual adoption rate and user satisfaction metrics are. These details would provide a more complete picture of the production deployment challenges and solutions.

The choice of Claude 3.7 Sonnet is noted for its balance of quality, stability, and reliability, but the case study doesn't discuss what happens if the model provider changes their API, updates the model, or experiences outages.
Robust LLMOps implementations typically need strategies for handling model provider dependencies, including fallback options and monitoring for performance degradation when models are updated.

## Enterprise AI and Organizational Learning

Beyond the technical implementation, RAPID-D represents an interesting approach to organizational learning and knowledge management. By systematically analyzing decision documents, retrieving relevant context from across the organization, and highlighting potential issues, the system essentially codifies certain aspects of institutional knowledge and decision-making expertise. This could be particularly valuable for maintaining decision quality as organizations scale or as experienced decision-makers leave.

However, there's also a risk that over-reliance on such systems could lead to homogenization of decision-making or reduced development of decision-making skills in junior employees who might otherwise learn by participating in the full RAPID process. The system is positioned as augmentation rather than replacement, which is the appropriate framing, but maintaining that balance in practice requires ongoing attention to organizational culture and practices.

The structured, multi-step pipeline with specialized agents demonstrates a mature understanding of how to decompose complex cognitive tasks for AI systems. Rather than asking a single model to perform all aspects of decision analysis simultaneously, the system breaks the problem into distinct phases (initial analysis, context gathering, adversarial testing, synthesis) that can each be optimized and evaluated independently. This modular approach also makes the system more maintainable and easier to improve over time.

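
The four-phase decomposition described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not Coinbase's implementation: the `LLM` and `Search` callable types, the prompt wording, the `Trace` class, and the stub functions are all hypothetical, and a real system would call an actual model and enterprise search tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types: a model call and an enterprise search call.
LLM = Callable[[str], str]
Search = Callable[[str], List[str]]


@dataclass
class Trace:
    """Audit trail: each phase's output is kept so the reasoning stays inspectable."""

    steps: Dict[str, str] = field(default_factory=dict)


def run_pipeline(document: str, llm: LLM, search: Search) -> Trace:
    trace = Trace()

    # Phase 1 (The Analyst): baseline recommendation from the document alone.
    trace.steps["analyst"] = llm(f"ANALYST\nRecommend based only on:\n{document}")

    # Phase 2 (The Seeker): generate questions, retrieve context, recommend again.
    raw_questions = llm(f"QUESTIONS\nList key questions about:\n{document}")
    questions = [q for q in raw_questions.splitlines() if q.strip()]
    context = [snippet for q in questions for snippet in search(q)]
    trace.steps["seeker"] = llm(
        "SEEKER\n" + document + "\nContext:\n" + "\n".join(context)
    )

    # Phase 3 (The Devil's Advocate): argue against the baseline recommendation.
    trace.steps["contrarian"] = llm(
        "CONTRARIAN\nArgue against:\n" + trace.steps["analyst"]
    )

    # Phase 4 (The Synthesizer): weigh all three outputs into a final recommendation.
    debate = "\n---\n".join(
        trace.steps[name] for name in ("analyst", "seeker", "contrarian")
    )
    trace.steps["synthesizer"] = llm("SYNTHESIZER\n" + debate)
    return trace


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model: answers based on the role tag on the first line.
    role = prompt.splitlines()[0]
    if role == "QUESTIONS":
        return "What is the security impact?\nWhat is the cost?"
    return f"{role.lower()} output"


def stub_search(question: str) -> List[str]:
    # Stand-in for enterprise search: one fake snippet per question.
    return [f"note on: {question}"]


if __name__ == "__main__":
    result = run_pipeline("Adopt vendor X", stub_llm, stub_search)
    for name, output in result.steps.items():
        print(f"{name}: {output}")
```

Keeping each phase's output in `Trace.steps` is one simple way to get the independent evaluation and auditability the text mentions: any phase can be swapped or tested in isolation by passing different `llm` or `search` callables.
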
## Conclusion

Coinbase's RAPID-D system showcases several LLMOps best practices including multi-agent architecture for complex reasoning tasks, retrieval-augmented generation for grounding in organizational context, systematic evaluation against real use cases, asynchronous processing for handling variable workloads, transparency and explainability in recommendations, and real-time feedback integration for continuous improvement.

The system demonstrates how enterprise organizations can apply LLMs to augment critical business processes while maintaining human oversight and accountability. The iterative, user-centered development approach and the emphasis on transparency align with mature AI deployment practices. However, as with any enterprise AI system, ongoing monitoring, evaluation, and refinement will be essential to ensure it continues to deliver value and doesn't introduce unintended consequences into organizational decision-making processes.
