ZenML

AI-Powered Multi-Agent Decision Support System for Enterprise Strategic Planning

Coinbase 2025
Coinbase developed RAPID-D, an AI-powered decision support tool to augment their existing RAPID decision-making framework used for critical strategic choices. The system employs a multi-agent architecture where specialized AI agents collaborate to analyze decision documents, surface risks, challenge assumptions, and provide comprehensive recommendations to human decision-makers. By implementing a modular approach with agents serving as analysts, contextual seekers, devil's advocates, and synthesizers, Coinbase created a transparent and auditable system that helps mitigate cognitive bias while maintaining human oversight. The solution was iteratively developed based on leadership feedback, achieving strong accuracy benchmarks with Claude 3.7 Sonnet, and incorporates real-time feedback mechanisms to continuously improve recommendation quality.

Industry

Finance

Overview

Coinbase’s RAPID-D system is a production deployment of large language models designed to augment enterprise decision-making. The company recognized that its existing RAPID (Recommend, Agree, Perform, Input, Decide) framework already provided a solid structure for accountable decision-making on critical strategic issues, but saw an opportunity to use AI to systematically surface unseen risks, mitigate cognitive bias, and provide transparent, auditable analysis for high-stakes organizational decisions.

The RAPID framework at Coinbase is used for making critical decisions where inputs are gathered from key stakeholders across functions, and an accountable “decider” makes the final decision based on these insights. The framework includes structured elements such as recommendation dates, decision dates, decision type classification (Type 1 for irreversible decisions requiring extra care, Type 2 for reversible decisions), detailed recommendations with pros/cons and risks/benefits, and clearly named individuals in each role (Recommend, Agree, Perform, Input, Decide) who can provide their agreement or disagreement with additional context.
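The structured elements listed above can be pictured as a simple record. The following is a minimal sketch, not Coinbase’s actual schema: every class and field name (`RapidDocument`, `DecisionType`, `RoleEntry`, and so on) is hypothetical, and the rule that a document has exactly one Decider is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class DecisionType(Enum):
    TYPE_1 = "irreversible"  # requires extra care
    TYPE_2 = "reversible"


class Role(Enum):
    RECOMMEND = "R"
    AGREE = "A"
    PERFORM = "P"
    INPUT = "I"
    DECIDE = "D"


@dataclass
class RoleEntry:
    name: str
    role: Role
    agrees: Optional[bool] = None  # None until the person has responded
    context: str = ""              # free-text reasoning behind the stance


@dataclass
class RapidDocument:
    title: str
    recommendation_date: date
    decision_date: date
    decision_type: DecisionType
    recommendation: str
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)
    participants: list[RoleEntry] = field(default_factory=list)

    def decider(self) -> RoleEntry:
        """Return the single accountable Decider (assumed to be exactly one)."""
        deciders = [p for p in self.participants if p.role is Role.DECIDE]
        if len(deciders) != 1:
            raise ValueError("expected exactly one Decider")
        return deciders[0]

    def dissenters(self) -> list[RoleEntry]:
        """Participants who explicitly recorded disagreement."""
        return [p for p in self.participants if p.agrees is False]
```

A typed record like this would give downstream agents a stable input shape, for example letting a Type 1 decision be routed to stricter review.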

Technical Architecture and Multi-Agent System Design

RAPID-D is architected as a modular, multi-agent system rather than a monolithic AI that returns a single black-box answer. The design mirrors a team of expert advisors who collaborate, debate, and synthesize information. It consists of four specialized agents with distinct responsibilities, which work together in a structured pipeline:

The Single Shot Recommender Agent (referred to as “The Analyst”) performs the initial thorough, impartial review of the primary RAPID document. This agent generates a baseline recommendation based strictly on the facts and arguments presented in the document, providing a foundation for further analysis without external context or bias.

The Contextual Recommender Agent (called “The Seeker”) adds crucial organizational context to the decision-making process. This agent first generates critical questions about the RAPID document, then leverages Coinbase’s enterprise search tool to find answers across all internal knowledge sources. By synthesizing these findings, it provides a deeply informed decision recommendation that incorporates wider organizational context that might otherwise be missed. This approach essentially implements a retrieval-augmented generation (RAG) pattern specifically tailored for enterprise decision support.

The Contrarian Agent (designated as “The Devil’s Advocate”) serves a critical function in combating cognitive bias. This agent’s sole purpose is to build the strongest possible case against the initial recommendation, deliberately probing for weaknesses, unstated assumptions, potential risks, and unintended consequences. This adversarial approach helps ensure that decisions are tested against rigorous scrutiny before finalization.

The Debate and Decide Agent (functioning as “The Synthesizer”) acts as an impartial moderator that meticulously evaluates the arguments from all previous agents. It considers the baseline analysis from The Analyst, the broader organizational context from The Seeker, and the challenges raised by The Devil’s Advocate. The agent then produces a comprehensive final recommendation for the human Decider, complete with detailed explanations of its reasoning and the trade-offs it considered. This design ensures transparency and auditability in the AI’s decision-making process.
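The four roles described above can be sketched as a short sequential pipeline. This is an illustrative sketch under stated assumptions, not Coinbase’s implementation: the model is reduced to a plain prompt-to-text function, the prompt wording is invented, and names such as `run_pipeline`, `DebateTrace`, and `fake_llm` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: an LLM is modelled as a plain function from prompt to text.
LLM = Callable[[str], str]


@dataclass
class DebateTrace:
    """Keeps every intermediate output so the final answer stays auditable."""
    analyst: str
    seeker: str
    contrarian: str
    final: str


def run_pipeline(document: str, context: str, llm: LLM) -> DebateTrace:
    """Run the four roles in sequence and return all of their outputs."""
    analyst = llm(
        "ROLE: Analyst. Recommend using only the document.\n" + document
    )
    seeker = llm(
        "ROLE: Seeker. Recommend using the document and this context.\n"
        f"{document}\nCONTEXT:\n{context}"
    )
    contrarian = llm(
        "ROLE: Devil's Advocate. Argue against this recommendation.\n" + analyst
    )
    final = llm(
        "ROLE: Synthesizer. Weigh all positions and explain trade-offs.\n"
        f"ANALYST:\n{analyst}\nSEEKER:\n{seeker}\nCONTRARIAN:\n{contrarian}"
    )
    return DebateTrace(analyst, seeker, contrarian, final)


def fake_llm(prompt: str) -> str:
    """Stand-in model: echoes the role so the flow can be inspected offline."""
    role = prompt.split(".", 1)[0].removeprefix("ROLE: ")
    return f"[{role} output]"
```

Returning the whole trace rather than only the final text is what would let a human Decider see how each agent contributed.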

RAG Implementation and Enterprise Knowledge Integration

A key component of RAPID-D’s effectiveness is its sophisticated document context retrieval system. This system includes a “key question generator” block that examines each decision document and formulates targeted questions around critical areas such as security implications, market impact, cost considerations, user experience effects, and scalability concerns. For every question generated, the system retrieves relevant information by searching Coinbase’s enterprise knowledge base.

This retrieval-augmented approach ensures that recommendations are comprehensive, less biased, and tailored to the specific context of each strategic decision. The RAG implementation allows the system to ground its recommendations in actual organizational knowledge and historical context rather than relying solely on the information contained within a single RAPID document. This is crucial for avoiding decisions made in isolation that might conflict with broader organizational initiatives or ignore relevant precedents.
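The question-then-retrieve flow described above might look roughly like the sketch below. It rests on several assumptions: the real system presumably uses an LLM to write the questions, whereas this sketch uses a fixed template; `toy_search` is a keyword lookup standing in for Coinbase’s enterprise search tool, whose interface is not described in the source; and the corpus entries are made up.

```python
from typing import Callable

# Focus areas named in the text; the question wording is illustrative.
FOCUS_AREAS = ["security", "market impact", "cost", "user experience", "scalability"]

SearchFn = Callable[[str], list[str]]


def generate_key_questions(title: str) -> list[str]:
    """Produce one targeted question per focus area (template-based stand-in)."""
    return [f"What are the {area} implications of '{title}'?" for area in FOCUS_AREAS]


def gather_context(title: str, search: SearchFn, top_k: int = 2) -> dict[str, list[str]]:
    """Map each generated question to at most top_k snippets from search."""
    return {q: search(q)[:top_k] for q in generate_key_questions(title)}


def toy_search(query: str) -> list[str]:
    """Keyword lookup over an in-memory corpus (hypothetical documents)."""
    corpus = {
        "security": ["2024 security review of vendor integrations"],
        "cost": ["FY25 budget memo", "Vendor pricing comparison"],
    }
    return [doc for key, docs in corpus.items() if key in query for doc in docs]
```

Questions that return no snippets are kept in the result with an empty list, which makes gaps in organizational knowledge visible rather than silently dropping them.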

Iterative Development and User-Centered Design

The development of RAPID-D followed a deliberate, iterative process focused on feedback from Coinbase’s leadership. The initial version started with a single agent that analyzed RAPID documents and presented its reasoning, with early feedback gathered manually to understand what decision-makers valued most. This user-centered approach is characteristic of successful LLMOps implementations, where the technology is shaped by the actual needs and workflows of end users rather than being imposed as a purely technical solution.

The current version represents a significant evolution, incorporating the multi-agent debate structure and providing deeper explainability for the AI’s conclusions. The team also implemented an asynchronous architecture to handle complex decisions that require more processing time. This architectural decision ensures that users are kept informed throughout the process and receive well-reasoned outputs without experiencing unnecessary delays or timeouts, which is a critical consideration for production LLM systems handling variable workloads.

Model Selection and Evaluation Approach

Coinbase evaluated RAPID-D’s accuracy through a human review process that compared each of the system’s final recommendations against the real decisions documented by Coinbase’s RAPID Deciders. This yielded benchmark scores across multiple leading models, and Claude 3.7 Sonnet was ultimately chosen for its balance of quality, stability, and reliability. Benchmarking candidate models against real decisions before committing to one reflects a mature LLMOps approach to model selection.

The evaluation methodology—comparing AI recommendations against actual human decisions made by experienced decision-makers—provides a realistic assessment of the system’s practical utility. This is more valuable than abstract benchmarks that might not reflect the system’s performance in its actual use case. However, it’s worth noting that this evaluation approach assumes that the historical human decisions represent “ground truth,” which may not always be the case. A more nuanced evaluation might also consider whether the AI surfaced valid concerns that humans missed, even if the final decision differed.
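The agreement-based metric, and the more nuanced variant suggested above, can be expressed in a few lines. This is a sketch only: the `ReviewedCase` fields and the decision labels are hypothetical, and the source does not describe how Coinbase records review results.

```python
from dataclasses import dataclass


@dataclass
class ReviewedCase:
    model: str
    ai_recommendation: str     # hypothetical label, e.g. "approve" or "reject"
    human_decision: str        # what the Decider actually chose
    surfaced_valid_risk: bool  # a reviewer judged a raised concern to be valid


def agreement_rate(cases: list[ReviewedCase], model: str) -> float:
    """Fraction of one model's recommendations that match the human decision."""
    own = [c for c in cases if c.model == model]
    if not own:
        raise ValueError(f"no reviewed cases for {model}")
    return sum(c.ai_recommendation == c.human_decision for c in own) / len(own)


def useful_disagreements(cases: list[ReviewedCase], model: str) -> list[ReviewedCase]:
    """Cases where the model disagreed yet raised a concern reviewers found valid."""
    return [
        c
        for c in cases
        if c.model == model
        and c.ai_recommendation != c.human_decision
        and c.surfaced_valid_risk
    ]
```

Reporting both numbers side by side would separate "the model disagreed and was wrong" from "the model disagreed and had a point", which a single agreement rate cannot do.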

Real-Time Feedback Integration and Continuous Improvement

A notable aspect of RAPID-D’s LLMOps implementation is its ability to adapt in real time by incorporating feedback directly into its decision process. Comments or corrections, whether provided by the user during an active session or later by any stakeholder in the RAPID document, are captured and analyzed against the assistant’s original recommendation. The result of that analysis is then used to refine subsequent recommendations within the same decision flow.

This feedback mechanism ensures that the assistant’s output reflects the most up-to-date perspectives and context from all participants. It can be read as a form of online adaptation, though the source doesn’t specify whether it involves fine-tuning the underlying model, adjusting prompts, or updating a retrieval index. Regardless of the specific implementation, this feedback loop is crucial for maintaining system accuracy and relevance as organizational context evolves.
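Of the mechanisms the source leaves open, prompt adjustment is the simplest to picture. The sketch below assumes that option purely for illustration; it is not a claim about how RAPID-D works, and `DecisionSession` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionSession:
    """Holds one decision flow and the feedback collected during it."""
    document: str
    feedback: list[str] = field(default_factory=list)

    def add_feedback(self, author: str, comment: str) -> None:
        """Record a comment or correction from a user or stakeholder."""
        self.feedback.append(f"{author}: {comment}")

    def build_prompt(self) -> str:
        """Assemble the next prompt, appending any feedback gathered so far."""
        parts = ["Review the decision document and recommend.", self.document]
        if self.feedback:
            parts.append("Stakeholder feedback on earlier recommendations:")
            parts.extend(f"- {item}" for item in self.feedback)
        return "\n".join(parts)
```

Because feedback is scoped to the session object, it influences only later recommendations in the same decision flow, matching the behaviour the text describes.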

Production Deployment Considerations

The asynchronous architecture mentioned in the case study addresses a critical production deployment challenge: managing variable processing times for complex analyses while maintaining good user experience. For LLM systems that need to perform multiple retrieval operations, generate questions, search knowledge bases, and synthesize multi-agent debates, processing times can vary significantly. An asynchronous design allows the system to handle these variations gracefully, keeping users informed of progress without blocking or timing out.
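A minimal sketch of such an asynchronous flow, using Python’s standard `asyncio`, is shown below. The stage names, the sleep durations, and the choice to run the Analyst and Seeker concurrently are all assumptions for illustration; the source does not describe how Coinbase schedules its agents or reports progress.

```python
import asyncio
from typing import Callable

ProgressFn = Callable[[str], None]


async def run_stage(name: str, seconds: float, notify: ProgressFn) -> str:
    """Simulate one long-running stage and report when it starts and ends."""
    notify(f"{name}: started")
    await asyncio.sleep(seconds)  # stands in for an LLM or search call
    notify(f"{name}: done")
    return f"{name} result"


async def analyze(notify: ProgressFn) -> list[str]:
    """Run the stages without blocking, emitting progress along the way."""
    # Assumption: the first two stages do not depend on each other,
    # so they are awaited together instead of one after the other.
    first = await asyncio.gather(
        run_stage("analyst", 0.02, notify),
        run_stage("seeker", 0.05, notify),
    )
    contrarian = await run_stage("contrarian", 0.01, notify)
    final = await run_stage("synthesizer", 0.01, notify)
    return [*first, contrarian, final]
```

Calling `asyncio.run(analyze(print))` would print a progress line as each stage starts and finishes, which is the kind of signal that keeps a user informed during a slow analysis.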

The emphasis on transparency and auditability throughout the system design reflects mature thinking about enterprise AI deployment. By making the reasoning process visible and showing how each agent contributed to the final recommendation, RAPID-D addresses the concerns about “black box” AI systems that are common barriers to adoption in enterprise settings. Decision-makers can see not just what the AI recommends, but why it recommends it and which alternative perspectives were considered.

Critical Assessment and Balanced Perspective

While Coinbase’s RAPID-D system demonstrates sophisticated LLMOps practices, it’s important to maintain a balanced perspective on the claims and approach. The multi-agent architecture is well-designed for exploring different perspectives on a decision, but the effectiveness of such systems depends heavily on prompt engineering, the quality of the enterprise search results, and the ability of the language model to truly understand organizational context rather than simply retrieving and paraphrasing information.

The claim that the system “mitigates cognitive bias” should be considered carefully. While the contrarian agent design can surface alternative perspectives that might be overlooked, LLMs themselves can exhibit various biases based on their training data. The system may help identify some cognitive biases in human decision-making, but it’s not a complete solution to bias and could potentially introduce new forms of bias from the AI itself.

The evaluation approach, while practical, has limitations. Measuring accuracy as agreement with actual human decisions treats those decisions as correct, which may not always be true. A more comprehensive evaluation might also track whether decisions made with RAPID-D assistance had better outcomes than those made without it, though such longitudinal studies are challenging to conduct rigorously in business settings.

The case study doesn’t provide detailed information about several important LLMOps aspects: how prompts are versioned and managed across the different agents, how the system handles edge cases or low-confidence situations, what guardrails exist to prevent inappropriate recommendations, how often the system requires human intervention, or what the actual adoption rate and user satisfaction metrics are. These details would provide a more complete picture of the production deployment challenges and solutions.

Claude 3.7 Sonnet is noted for its balance of quality, stability, and reliability, but the case study doesn’t discuss what happens if the model provider changes its API, updates the model, or experiences an outage. Robust LLMOps implementations typically need strategies for handling model provider dependencies, including fallback options and monitoring for performance degradation when models are updated.
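One common shape for a fallback option is an ordered list of providers tried in turn. The sketch below illustrates that general pattern only; the case study does not say whether RAPID-D does anything like it, and `complete_with_fallback` and `AllProvidersFailed` are hypothetical names.

```python
from typing import Callable, Sequence

# Assumption: each provider is wrapped as a plain prompt-to-text function.
ModelFn = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider raised an error."""


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, ModelFn]]
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, output) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the output lets the caller log which model actually answered, which matters when comparing quality before and after a provider change.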

Enterprise AI and Organizational Learning

Beyond the technical implementation, RAPID-D represents an interesting approach to organizational learning and knowledge management. By systematically analyzing decision documents, retrieving relevant context from across the organization, and highlighting potential issues, the system essentially codifies certain aspects of institutional knowledge and decision-making expertise. This could be particularly valuable for maintaining decision quality as organizations scale or as experienced decision-makers leave.

However, there’s also a risk that over-reliance on such systems could lead to homogenization of decision-making or reduced development of decision-making skills in junior employees who might otherwise learn by participating in the full RAPID process. The system is positioned as augmentation rather than replacement, which is the appropriate framing, but maintaining that balance in practice requires ongoing attention to organizational culture and practices.

The structured, multi-step pipeline with specialized agents demonstrates a mature understanding of how to decompose complex cognitive tasks for AI systems. Rather than asking a single model to perform all aspects of decision analysis simultaneously, the system breaks the problem into distinct phases (initial analysis, context gathering, adversarial testing, synthesis) that can each be optimized and evaluated independently. This modular approach also makes the system more maintainable and easier to improve over time.

Conclusion

Coinbase’s RAPID-D system showcases several LLMOps best practices including multi-agent architecture for complex reasoning tasks, retrieval-augmented generation for grounding in organizational context, systematic evaluation against real use cases, asynchronous processing for handling variable workloads, transparency and explainability in recommendations, and real-time feedback integration for continuous improvement. The system demonstrates how enterprise organizations can apply LLMs to augment critical business processes while maintaining human oversight and accountability. The iterative, user-centered development approach and the emphasis on transparency align with mature AI deployment practices. However, as with any enterprise AI system, ongoing monitoring, evaluation, and refinement will be essential to ensure it continues to deliver value and doesn’t introduce unintended consequences into organizational decision-making processes.
