Moody’s: Multi-Agent AI System for Financial Intelligence and Risk Analysis

Company

Moody’s

Title

Multi-Agent AI System for Financial Intelligence and Risk Analysis

Industry

Finance

Link

https://www.youtube.com/watch?v=HC8jSzNmNpU

Year

2025

Summary (short)

Moody's Analytics, a century-old financial institution serving over 1,500 customers across 165 countries, transformed their approach to serving high-stakes financial decision-making by evolving from a basic RAG chatbot to a sophisticated multi-agent AI system on AWS. Facing challenges with unstructured financial data (PDFs with complex tables, charts, and regulatory documents), context window limitations, and the need for 100% accuracy in billion-dollar decisions, they architected a serverless multi-agent orchestration system using Amazon Bedrock, specialized task agents, custom workflows supporting up to 400 steps, and intelligent document processing pipelines. The solution processes over 1 million tokens daily in production, achieving 60% faster insights and 30% reduction in task completion times while maintaining the precision required for credit ratings, risk intelligence, and regulatory compliance across credit, climate, economics, and compliance domains.

## Overview and Business Context

Moody's Analytics represents a compelling case study of how a 100-year-old financial institution successfully deployed production-grade generative AI and multi-agent systems to revolutionize their risk intelligence services. The presentation was delivered by Samuel Baruffi (Principal Solutions Architect at AWS) and Dennis Clement (Managing Director of Engineering and Architecture for Moody's Digital Content and Innovation), providing both vendor and customer perspectives on the technical implementation.

Moody's serves over 1,500 customers across 165 countries, including 97% of the Fortune 100, providing risk intelligence across multiple domains: credit ratings, climate risk, economics, and regulatory compliance. Their customer base includes 2,600 commercial banks processing loan originations, 1,900 asset managers making portfolio allocation decisions, and 800+ insurance companies running regulatory stress tests. The stakes are extraordinarily high: customers make billion-dollar decisions based on Moody's analysis, and the company's credit decisions can move markets. This context establishes why 99% accuracy is insufficient and why their AI systems require production-grade reliability and precision.

The fundamental challenge Moody's faced was serving diverse customer needs across a complex data universe with four core pillars: ratings, research and insights, data and information, and decision solutions. They manage decades of research documents, credit opinions, and sector outlooks, and operate Orbis (one of the largest databases of company/entity data, with 600 million entities). The complexity extends to customers wanting to combine Moody's proprietary data with their own unstructured documents, creating a need for seamless integration of multiple knowledge sources.

## Evolution Journey: From RAG to Multi-Agent Systems

Moody's generative AI journey began in December 2023 with the deployment of their "Research Assistant," a RAG-based chatbot application. While users appreciated getting answers grounded in real research, the system quickly hit limitations when handling complex queries requiring credit risk comparisons across multiple companies, financial metrics analysis, and cross-referencing with sector research and news. The single-context-window approach suffered from context switching penalties, shallow expertise across domains, and performance degradation when attempting to be an expert in everything.

A critical turning point came in August 2024 when they introduced PDF upload capability, allowing customers to integrate their own documents into Moody's intelligence systems. This feature exposed the severe challenges of processing unstructured financial data: 10-K filings, annual reports, earnings reports, and regulatory filings containing hundreds of pages with complex tables, charts, footnotes, and inconsistent layouts. The team recognized that approximately 80% of financial services data is unstructured, but only 20% of organizations successfully leverage it.

By late 2025, Moody's deployed a full multi-agent orchestration system with specialized workflows, custom orchestrators, and task-specific agents. The evolution wasn't just about better prompts; it was about better context boundaries. Dennis Clement emphasized that they shifted from "prompt engineering" to "context engineering," recognizing that multi-agent architectures require precise context boundaries, specialized domain expertise, and elimination of cross-domain interference.

## Architectural Principles and Design Decisions

Moody's architecture is built on five fundamental pillars that reflect production-grade LLMOps thinking:

**Serverless-First Architecture**: Given the spiky nature of financial markets (where credit changes can drive 50X traffic increases instantly), Moody's built their agentic systems on serverless foundations for automatic scaling and cost efficiency. This architectural decision enables them to handle massive variations in demand without manual intervention or over-provisioning.

**Tools as Essential Building Blocks**: They standardized on AWS Lambda functions to implement tools: discrete, single-purpose operations that fetch data or perform calculations. Lambda's stateless nature allows multiple workflows and agents to utilize the same tools concurrently while scaling automatically. Currently, they maintain approximately 80 tools serving their multi-agent ecosystem.

**Two-Tier Agent Architecture**: Moody's distinguishes between simple and complex agents. Simple agents consist of system prompts, curated tool sets, and validation steps, all defined in JSON objects that can be orchestrated dynamically. Complex agents are custom-built software combining tools, proprietary datasets, and code, deployed as ECS containers when state management or long-running tasks (beyond Lambda's 15-minute limit) are required.

**Custom Orchestrator as the Brain**: Their custom-built orchestration system (running on ECS) interprets JSON-formatted workflows containing tools, agents, and prompts. The orchestrator handles complexity through intelligent parallelization of independent steps while respecting dependencies, error handling, retry logic for LLM throttling, and cost optimization. What started as a system designed for 20-step workflows now handles customer workflows exceeding 400 steps, with execution times ranging from minutes to over 15 minutes.
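
The orchestrator's "parallelize independent steps while respecting dependencies" behavior can be illustrated with a minimal sketch. This is not Moody's implementation: the workflow schema, the step names, and the `depends_on` field are hypothetical, and retry logic for throttling and cost optimization are left out. The sketch groups steps into waves, where each wave depends only on earlier waves, and runs every wave in a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workflow definition: step names and the "depends_on" field
# are invented for illustration and are not taken from the presentation.
WORKFLOW = {
    "fetch_financials": {"depends_on": []},
    "fetch_news": {"depends_on": []},
    "compare_credit_risk": {"depends_on": ["fetch_financials"]},
    "write_summary": {"depends_on": ["compare_credit_risk", "fetch_news"]},
}


def plan_waves(workflow):
    """Group steps into waves; each step depends only on steps in earlier waves."""
    remaining = {name: set(spec["depends_on"]) for name, spec in workflow.items()}
    done, waves = set(), []
    while remaining:
        ready = sorted(name for name, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in workflow")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


def run_workflow(workflow, run_step, max_workers=8):
    """Run each wave in parallel; run_step(name, upstream_results) does the work."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in plan_waves(workflow):
            futures = {
                name: pool.submit(
                    run_step,
                    name,
                    {dep: results[dep] for dep in workflow[name]["depends_on"]},
                )
                for name in wave
            }
            # Wait for the whole wave before starting the next one.
            for name, future in futures.items():
                results[name] = future.result()
    return results
```

Here `fetch_financials` and `fetch_news` would run concurrently, while `write_summary` waits for both of its inputs. A production orchestrator would likely start each step as soon as its own dependencies finish rather than waiting for a full wave; the wave model is used here only because it is easier to read.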

**Model Flexibility and Selection**: Every agent, step, and tool can specify its preferred LLM, preventing vendor lock-in and enabling optimization. One agent might use a reasoning model for complex analysis while another uses a small language model for simple computations. This granular model selection supports their strategy of testing and validating models in isolated contexts.

## Workflow Designer and Customer Empowerment

Before OpenAI's agent builder and most orchestration platforms existed, Moody's built a visual workflow designer enabling users to orchestrate Moody's expertise into repeatable patterns. The system recognizes that users weren't just asking questions; they were forcing chain-of-thought reasoning through chat interfaces by breaking down complex analytical tasks into sequences.

The workflow designer allows both Moody's teams and customers to visually stitch together specialized intelligence, creating structured outputs like charts, graphs, and tables. This represents true customer empowerment, moving from "ask a question, get an answer" to "orchestrate specialist intelligence." Customers now build workflows ranging from simple 20-step processes to extraordinarily complex 400-step analytical pipelines. The system demonstrates that production LLMOps must accommodate not just what designers anticipate but what users actually need when solving real-world problems.

## The Unstructured Data Challenge and PDF Processing Pipeline

Dennis Clement devoted significant attention to what he called "the archaeological dig problem": extracting insights from complex financial PDFs where information is scattered across hundreds of pages. Table headers appear on one page while data spans subsequent pages, charts and images embed critical information, footnotes aren't consistently positioned, and layouts vary wildly across documents. A single misplaced decimal could have catastrophic consequences for customers making multi-million-dollar decisions.

Moody's candidly shared their failures across four approaches:

**Basic Python Libraries**: Tools extracted text quickly and cheaply but destroyed all context, essentially "throwing a 200-page document into a blender." Verdict: Failed due to loss of structural context.

**Custom Parsing Algorithms**: They attempted to understand document hierarchy through bounding boxes and section grouping. While innovative, it couldn't scale across diverse document types. Verdict: Failed to scale.

**Multi-Modal Foundation Models**: Vision models showed promise, performing "pretty well" at understanding PDFs as humans would, but struggled with complex tables and layouts. More critically, the approach proved prohibitively expensive at scale. Verdict: Failed due to cost and insufficient accuracy for production.

**Million-Token Context Windows**: Large context windows handled entire documents but suffered from degradation as documents grew and proved expensive to scale. Verdict: Failed on cost and performance degradation.

These failures led to a breakthrough insight: not all pages are created equal. A text-heavy narrative page requires different processing than a page dominated by complex tables or charts. Moody's built an intelligent page classification system as an upfront analysis step:

- **Text-dominant pages**: Routed to a Bedrock LLM for OCR conversion to markdown for easier querying
- **Table-dominated pages**: Processed through Amazon Bedrock Data Automation (BDA), which became "absolutely essential" and "a game changer" for complex financial table extraction
- **Charts and images**: Handled by vision models to create queryable metadata stored in vector databases

This multi-modal pipeline approach, with intelligent routing based on content type, finally unlocked scalable PDF processing for production use.
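
The page-level routing described above can be sketched in a few lines. The presentation does not say how pages are classified, so the `PageStats` features, the thresholds, and the route names below are all assumptions made for illustration; a real classifier might itself be a model rather than a set of rules.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page layout features (not described in the source)."""
    text_chars: int          # number of extractable text characters
    table_cells: int         # number of detected table cells
    image_area_ratio: float  # fraction of the page covered by charts or images


# Route names are placeholders for the three processors named in the text.
ROUTES = {
    "text": "llm_markdown_conversion",
    "table": "bedrock_data_automation",
    "chart_or_image": "vision_model_metadata",
}


def classify_page(page: PageStats) -> str:
    """Pick a content type using assumed thresholds, checked in priority order."""
    if page.image_area_ratio >= 0.4:
        return "chart_or_image"
    if page.table_cells >= 20:
        return "table"
    return "text"


def route_document(pages: list[PageStats]) -> list[tuple[int, str]]:
    """Return (1-based page number, processor name) for every page."""
    return [(i + 1, ROUTES[classify_page(p)]) for i, p in enumerate(pages)]
```

Keeping classification separate from the routing table means a new content type only needs a new classifier branch and one new `ROUTES` entry.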

## Agentic Retrieval: Beyond Traditional Search

Moody's recognized that traditional keyword search, semantic search, and even hybrid approaches with re-ranking failed to handle queries where information is scattered across documents. Consider a query seeking a company's business units, revenue by unit, and sector analysis from an annual report. Business units might exist in tables spanning multiple pages, revenue in separate tables, and critical context buried in footnotes. Single-shot vector search with top-K retrieval simply cannot handle this complexity.

Their solution: agentic retrieval that mimics human document navigation. The system receives a user query, creates a plan by decomposing the query into search strategies, executes multiple searches across the document, reflects on whether the retrieved information answers the question, and continues iterating until satisfied. The final output includes individual chunks with proper citations. This "intelligent document navigation" became a tool in their toolkit, available to any workflow or agent needing to pull information from complex documents.

## AWS Infrastructure and Services

Moody's leverages multiple AWS services to support their production multi-agent systems:

**Amazon Bedrock**: The core platform providing access to multiple foundation models from different providers in a serverless fashion. Moody's uses Bedrock's model flexibility to avoid vendor lock-in and select appropriate models per task.

**Bedrock Knowledge Bases**: Provides a fully managed end-to-end RAG workflow, handling document ingestion pipelines from raw PDFs in S3 through parsing, chunking, embedding, and storage in vector databases. Moody's can customize chunking strategies through Lambda functions and choose from multiple embedding models (Amazon Titan, Amazon Nova multi-modal, Cohere) or host open-source models on SageMaker.

**Bedrock Data Automation (BDA)**: Critical for extracting insights from complex financial documents with tables, charts, and mixed layouts. BDA supports multi-modal inputs (audio, video, image, documents) and provides extraction capabilities (summary, text, fields) via API. Moody's reports significantly reduced hallucination and improved accuracy versus traditional LLM approaches, with better price-performance.

**Cross-Region Inference**: To handle capacity needs for high-throughput multi-agent systems, Moody's uses Bedrock's global cross-region inference (automatically routing to available regions over the AWS backbone network) and geographic cross-region inference (routing within specific geographies for data residency compliance). This infrastructure supports processing over 1 million tokens daily.

**AWS Lambda**: Hosts their 80+ tools as single-purpose, fast, stateless functions that scale automatically.

**Amazon ECS**: Runs complex agents and the custom orchestrator, which require state management and long-running execution beyond Lambda limits.

**Vector Databases**: Moody's can choose from multiple vector database options including OpenSearch Serverless, partner solutions (Pinecone, Redis), PostgreSQL-based options (Aurora and RDS with pgvector), and the newly announced Amazon S3 Vectors, presented as offering a 90% price-performance improvement.

## Production Metrics and Business Impact

The system is fully operational in production, with the following reported metrics:

- **80+ tools** in the ecosystem
- **100+ workflows** supporting diverse use cases
- **Many specialized task agents** across different domains
- **Over 1 million tokens processed daily**
- **60% faster insights** for users
- **30% reduction in task completion times**
- **Workflows scaling from 20 to 400+ steps** in complexity

Dennis Clement emphasized repeatedly: "This isn't a demo, this is in production today, satisfying our customers' needs."
The system handles real commercial banks processing $500 million loan decisions and asset managers rebalancing $2 billion portfolios.

## Terminology and Mental Models

Moody's established clear definitions to align their teams and avoid confusion when executives request to "ingest all documents and ask any question with 100% accuracy by tomorrow":

**Tool**: A system performing a specific task and returning context to an LLM. It is a discrete process, like fetching data or performing calculations, that operates in isolation.

**Agent**: An LLM autonomously choosing tools, operating in a loop (humorously described as "a for-loop with better PR"), determining when its task is complete.

**Workflow**: A deterministic orchestration with predefined sequences of steps coordinating tools and agents to produce consistent outputs.

This clarity helps distinguish between building blocks and prevents over-promising or misunderstanding system capabilities.

## Challenges, Trade-offs, and Future Directions

While the presentation highlights successes, several challenges and honest assessments emerge:

**Custom vs. Managed Services**: Moody's acknowledges they were "incredibly fast to market" by custom-building solutions when commercial options didn't exist. However, after attending AWS re:Invent, they recognize opportunities to replace custom code with managed services like Amazon Bedrock AgentCore, reducing technical debt while maintaining capabilities.

**PDF Processing Remains Difficult**: Despite their multi-modal pipeline breakthrough, Dennis admits "we are still tackling it and we're still fighting that fight." Complex financial documents continue to challenge even sophisticated systems.

**Cost Management**: Multiple approaches (vision models, large context windows) showed promise but were not viable in production due to cost at scale. The orchestrator therefore has to optimize for cost alongside performance and reliability.

**Scaling Beyond Initial Design**: The orchestrator was originally capped at 20 steps, on the assumption that this would suffice. Customer workflows reaching 400+ steps demonstrate that production systems must accommodate unanticipated usage patterns.

**Looking Forward**: Moody's is evolving toward exposing their intelligence through Model Context Protocol (MCP) and "smart APIs," allowing customers not just to consume intelligence but to build with it. They're evaluating Amazon Bedrock AgentCore primitives including:

- **AgentCore Runtime**: Serverless compute with session isolation at VM level, supporting any framework
- **AgentCore Gateway**: Managed service for centralizing tool management with remote procedure servers
- **AgentCore Identity**: Handling both inbound authentication (user access) and outbound authentication (agent actions on the user's behalf), integrating with existing identity providers
- **AgentCore Observability**: Collecting every agent step, LLM call, tool invocation, and reasoning chain for troubleshooting and regulatory compliance
- **AgentCore Memory**: Managed solution for short-term and long-term memory, automatically extracting preferences and semantic information

## Critical Assessment and Balanced Perspective

While this case study demonstrates impressive production deployment of multi-agent AI systems, several aspects merit critical consideration:

**Vendor Influence**: As a joint presentation between AWS and Moody's at an AWS conference, the content naturally emphasizes AWS services and successes. The repeated failures with alternative approaches may reflect genuine technical challenges but could also serve to validate AWS solution choices.

**Complexity Trade-offs**: The architecture is undeniably complex: custom orchestrators, 80+ tools, 100+ workflows, multiple agent types, and sophisticated document processing pipelines. While this delivers powerful capabilities, it also represents significant engineering investment and ongoing maintenance burden.
Organizations should carefully evaluate whether simpler approaches might suffice for their specific needs before committing to this level of complexity.

**Production vs. Accuracy Claims**: While Moody's emphasizes their need for extremely high accuracy in a regulated industry, the presentation doesn't provide specific accuracy metrics, error rates, or comparative benchmarks. The claim of processing "over 1 million tokens daily" is notable but doesn't directly speak to accuracy, hallucination rates, or customer satisfaction metrics.

**Unstructured Data Reality**: Dennis's candid admission that PDF processing "is difficult and we are still tackling it" is refreshingly honest but also indicates this remains an unsolved problem even for well-resourced teams. Organizations should expect significant ongoing effort in this area rather than viewing it as a solved challenge.

**Cost Transparency**: While cost is mentioned as a constraint that eliminated certain approaches, the presentation doesn't provide concrete cost figures, ROI calculations, or total cost of ownership for running this system at scale.

**Agent Hype vs. Reality**: Dennis's characterization of agents as "a for-loop with better PR" provides valuable perspective against inflated expectations. This honesty helps ground the discussion in engineering reality rather than marketing hyperbole.

That said, the case study demonstrates genuine production deployment solving real business problems for high-stakes financial decision-making. The metrics around speed improvements (60% faster insights) and efficiency gains (30% reduction in task completion times) suggest meaningful business impact. The evolution from chatbot to multi-agent system reflects thoughtful iteration based on user needs rather than chasing technology trends. The willingness to share failures alongside successes provides valuable learning for the broader LLMOps community.

## Key Takeaways for LLMOps Practitioners

This case study offers several valuable lessons for organizations building production LLM systems:

- **Context engineering over prompt engineering**: The shift from optimizing prompts to optimizing context boundaries through specialized agents represents a fundamental architectural insight applicable beyond financial services.
- **Heterogeneous document processing**: The realization that different content types require specialized processing rather than one-size-fits-all solutions is critical for handling real-world unstructured data.
- **Flexible model selection**: Allowing each agent and tool to specify its optimal model prevents lock-in and enables continuous optimization as models evolve.
- **Serverless for variable workloads**: Financial services exhibit extreme demand variability; serverless architecture proved essential for handling 50X traffic spikes without over-provisioning.
- **Workflow empowerment**: Moving beyond chatbots to visual workflow designers that empower users to orchestrate AI capabilities addresses real user needs that simple question-answering cannot satisfy.
- **Production requires orchestration**: Moving beyond simple RAG to multi-agent systems demands sophisticated orchestration handling parallelization, error recovery, retry logic, and cost optimization, capabilities rarely needed in demos but critical in production.
- **Agentic retrieval for complex queries**: Simple vector search with re-ranking fails when information is scattered across large documents; agentic retrieval that plans, executes, and reflects on searches better mimics human document navigation.

The Moody's case study ultimately demonstrates that deploying production-grade multi-agent AI systems in regulated, high-stakes environments is achievable but requires sophisticated engineering, careful architectural decisions, iterative refinement based on real usage, and ongoing investment in solving hard problems like unstructured data processing.
Organizations should approach such implementations with realistic expectations about complexity, cost, and the iterative nature of achieving production-quality results.
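
As a closing illustration of the plan, execute, and reflect loop behind agentic retrieval, here is a minimal sketch. It is an assumption-laden outline, not Moody's tool: the function name `agentic_retrieve`, its parameters, and the `max_rounds` cap are hypothetical, and the three callables stand in for LLM-driven planning, document search, and reflection.

```python
from typing import Callable


def agentic_retrieve(
    query: str,
    plan: Callable[[str, list[str]], list[str]],
    search: Callable[[str], list[str]],
    is_sufficient: Callable[[str, list[str]], bool],
    max_rounds: int = 5,
) -> list[str]:
    """Plan sub-queries, run them, reflect, and repeat until evidence suffices.

    plan(query, evidence)          -> sub-queries to try next (planning step)
    search(sub_query)              -> matching chunks (execution step)
    is_sufficient(query, evidence) -> True when the question is answerable
    """
    evidence: list[str] = []
    for _ in range(max_rounds):  # hard cap so the loop always terminates
        for sub_query in plan(query, evidence):
            for chunk in search(sub_query):
                if chunk not in evidence:  # keep each chunk once, in order found
                    evidence.append(chunk)
        if is_sufficient(query, evidence):  # reflection step
            break
    return evidence
```

In practice `plan` and `is_sufficient` would be LLM calls and `search` a query against an indexed document, with each chunk carrying its citation. The round cap matters because an agent that never judges its evidence sufficient would otherwise loop indefinitely and accumulate cost.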
