Finance
Amazon Finance
Company
Amazon Finance
Title
AI Assistant for Financial Data Discovery and Business Intelligence
Industry
Finance
Year
2025
Summary (short)
Amazon Finance developed an AI-powered assistant to address analysts' challenges with data discovery across vast, disparate financial datasets and systems. The solution combines Amazon Bedrock (using Anthropic's Claude 3 Sonnet) with Amazon Kendra Enterprise Edition to create a Retrieval Augmented Generation (RAG) system that enables natural language queries for finding financial data and documentation. The implementation achieved a 30% reduction in search time, 80% improvement in search result accuracy, and demonstrated 83% precision and 88% faithfulness in knowledge search tasks, while reducing information discovery time from 45-60 minutes to 5-10 minutes.
Amazon Finance successfully deployed an AI-powered assistant in production to solve critical data discovery and business intelligence challenges faced by financial analysts across the organization. This case study demonstrates a comprehensive LLMOps implementation that combines multiple AWS services to create an enterprise-grade solution for natural language interaction with financial data and documentation. ## Business Problem and Context Amazon Finance analysts were struggling with mounting complexity in financial planning and analysis processes when working with vast datasets spanning multiple systems, data lakes, and business units. The primary challenges included time-intensive manual browsing of data catalogs, difficulty reconciling data from disparate sources, and the inability to leverage historical data and previous business decisions that resided in various documents and legacy systems. Traditional keyword-based searches failed to capture contextual relationships in financial data, and rigid query structures limited dynamic data exploration. The lack of institutional knowledge preservation resulted in valuable insights and decision rationales becoming siloed or lost over time, leading to redundant analysis and inconsistent planning assumptions across teams. ## Technical Architecture and LLMOps Implementation The production solution implements a sophisticated Retrieval Augmented Generation (RAG) architecture that demonstrates several key LLMOps principles and practices. At the core of the system is Amazon Bedrock, which provides the LLM serving infrastructure using Anthropic's Claude 3 Sonnet model. The choice of Claude 3 Sonnet was made specifically for its exceptional language generation capabilities and ability to understand and reason about complex financial topics, which is critical for production deployment in a high-stakes domain like finance. The retrieval component leverages Amazon Kendra Enterprise Edition Index rather than Amazon OpenSearch Service or Amazon Q Business. This architectural decision reflects important LLMOps considerations around accuracy, maintainability, and operational overhead. Amazon Kendra provides out-of-the-box natural language understanding, automatic document processing for over 40 file formats, pre-built enterprise connectors, and intelligent query handling including synonym recognition and refinement suggestions. The service automatically combines keyword, semantic, and vector search approaches, whereas alternatives would require manual implementation and ongoing maintenance of these features. The system architecture follows a multi-tier approach typical of production LLM deployments. User queries are processed through a Streamlit frontend application, which sends queries to the Amazon Kendra retriever for relevant document retrieval. Amazon Kendra returns relevant paragraphs and document references to the RAG solution, which then uses Anthropic's Claude through Amazon Bedrock along with carefully crafted prompt templates to generate contextual responses. The responses are returned to the Streamlit UI along with feedback mechanisms and session history management. ## Prompt Engineering and Template Design The case study highlights the importance of prompt engineering in production LLM systems. The team implemented structured prompt templates that format user queries, integrate retrieved knowledge, and provide specific instructions and constraints for response generation. The example prompt template demonstrates best practices for production systems by explicitly instructing the model to acknowledge when it doesn't know an answer rather than hallucinating information, which is particularly crucial in financial applications where accuracy is paramount. The prompt template structure follows the pattern of providing context from retrieved documents, followed by the user question, and explicit instructions about how to handle uncertain information. This approach helps ensure that the LLM's responses are grounded in the retrieved knowledge base rather than relying solely on the model's training data, which is essential for maintaining accuracy and reliability in production financial applications. ## Deployment Architecture and Scalability The frontend deployment architecture demonstrates production-ready LLMOps practices with emphasis on scalability, security, and performance. The system uses Amazon CloudFront for global content delivery with automatic geographic routing to minimize latency. Authentication is handled through AWS Lambda functions that verify user credentials before allowing access to the application, ensuring enterprise security standards are maintained. The backend is deployed using AWS Fargate for containerized execution without infrastructure management overhead, combined with Amazon Elastic Container Service (ECS) configured with automatic scaling based on Application Load Balancer requests per target. This serverless approach allows the system to scale dynamically based on demand while minimizing operational overhead, which is crucial for production LLM applications that may experience variable usage patterns. ## Evaluation Framework and Production Monitoring The implementation includes a comprehensive evaluation framework that demonstrates mature LLMOps practices around testing and monitoring. The team implemented both quantitative and qualitative assessment methodologies to ensure the system meets the high standards required for financial applications. The quantitative assessment focused on precision and recall testing using a diverse test set of over 50 business queries representing typical analyst use cases, with human-labeled answers serving as ground truth. The evaluation framework distinguished between two main use cases: data discovery and knowledge search. Initial results showed data discovery achieving 65% precision and 60% recall, while knowledge search demonstrated 83% precision and 74% recall. These metrics provided a baseline for ongoing system improvement and represented significant improvements over previous manual processes that had only 35% success rates and required multiple iterations. The qualitative evaluation centered on faithfulness metrics, using an innovative "LLM-as-a-judge" methodology to evaluate how well the AI assistant's responses aligned with source documentation and avoided hallucinations. This approach is particularly relevant for production LLM systems where output quality and reliability must be continuously monitored. The faithfulness scores of 70% for data discovery and 88% for knowledge search provided concrete metrics for system reliability that could be tracked over time. ## User Feedback Integration and Continuous Improvement The production system includes built-in feedback mechanisms that enable continuous improvement, a critical aspect of LLMOps. User feedback on responses is stored in Amazon S3, creating a data pipeline for analyzing system performance and identifying areas for improvement. This feedback loop allows the team to understand where the system succeeds and fails in real-world usage, enabling iterative improvements to both the retrieval system and the generation components. The user satisfaction metrics (92% preference over traditional search methods) and efficiency improvements (85% reduction in information discovery time) demonstrate the business impact of the LLM deployment. These metrics serve as key performance indicators for the production system and help justify continued investment in the LLMOps infrastructure. ## Operational Considerations and Challenges The case study reveals several important operational considerations for production LLM systems. The team identified that the lack of rich metadata about data sources was a primary factor limiting system performance, particularly in data discovery scenarios. This insight led to organizational changes around metadata collection practices, demonstrating how LLM deployments can drive broader data governance improvements. The system's performance varied significantly between use cases, with knowledge search substantially outperforming data discovery tasks. This variation highlights the importance of understanding how different types of queries interact with RAG systems and the need for potentially different optimization strategies for different use cases within the same production system. ## Security and Compliance The implementation demonstrates enterprise-grade security practices essential for production LLM systems handling sensitive financial data. The system maintains enterprise security standards through Amazon Kendra's built-in data protection and compliance features, authentication mechanisms through AWS Lambda functions, and secure document storage in Amazon S3 buckets with appropriate access controls. The choice of AWS-managed services for the core LLM infrastructure (Amazon Bedrock) rather than self-hosted models reflects important production considerations around security, compliance, and operational overhead. Using managed services allows the team to focus on application-level concerns while relying on AWS for infrastructure security, model updates, and compliance certifications. This case study represents a comprehensive example of production LLMOps implementation that addresses the full lifecycle from problem identification through deployment, monitoring, and continuous improvement, while maintaining the security and reliability standards required for enterprise financial applications.

Start deploying reproducible AI workflows today

Enterprise-grade MLOps platform trusted by thousands of companies in production.