## Overview
FloQast is an accounting software company founded in 2013 that serves over 2,800 organizations across various industries and regions. The company specializes in streamlining accounting operations through automation, including automated reconciliations and close process management tools. This case study details how FloQast built an AI-powered accounting transformation solution using Anthropic's Claude 3.5 Sonnet on Amazon Bedrock to address complex, custom aspects of financial processes that typically require manual intervention.
The core problem FloQast aims to solve is what they call the "final 20%" of accounting work—the intricate, bespoke aspects of accounting that are highly specific to each organization. As businesses scale, the complexity of accounting increases exponentially with growth and diversification. Maintaining accuracy and compliance across hundreds or thousands of simultaneous transactions becomes a monumental challenge. FloQast's solution uses advanced machine learning and natural language commands to enable accounting teams to automate reconciliation with high accuracy and minimal technical setup.
## Production LLM Architecture and Components
### AI Transaction Matching Product
The first major product is AI Transaction Matching, which automates the matching and reconciliation of transactions across multiple data sources. The workflow operates as follows:
Transaction data is gathered from bank statements and enterprise resource planning (ERP) systems. An accountant selects specific transactions in both systems and triggers the "Generate AI Rule" function. Based on the selected transactions, the LLM generates natural language text describing the matching rule. The accountant can either accept or edit this generated text, then saves and applies it to generate a coded rule format that finds additional matches automatically.
Key capabilities include AI-driven matching across multiple data sources, flexible rule creation using natural language, exception handling for unmatched transactions, comprehensive audit trails for compliance, high-volume processing suitable for businesses of all sizes, and multi-source integration with various financial systems.
### AI Annotations Product
The second major product is AI Annotations, which automates document annotation and review for compliance and audit processes. This has a more complex architecture involving multiple AWS services working in concert.
The architecture includes user authentication and authorization as initial steps, followed by document upload to secure Amazon S3 buckets. Amazon Textract handles document processing, extracting data with encryption both in transit and at rest. The raw extracted data is stored in encrypted S3 buckets, then a data sanitization workflow runs using AWS Step Functions with AWS Lambda functions. Sanitized data is written to encrypted MongoDB, and job status is polled and updated in MongoDB.
When users initiate the annotation process, the application logic consumes data from MongoDB and provides it to Anthropic's Claude 3.5 Sonnet on Amazon Bedrock. The LLM runs audit rules against the extracted data and generates annotations for each audit rule, including pass/fail details. Importantly, annotation results are filtered using Amazon Bedrock Guardrails to enhance content safety and privacy in the generative AI application.
## LLMOps Technology Choices and Rationale
FloQast selected Amazon Bedrock for several strategic reasons related to production LLM operations. The platform offers unmatched versatility, feature sets, and access to scalable AI models from top-tier providers like Anthropic. A key advantage noted is that Amazon Bedrock is serverless, eliminating the need to manage infrastructure while securely integrating and deploying generative AI capabilities. This serverless approach handles spiky traffic patterns and enables features like cross-Region inference for scalability and reliability across AWS Regions.
The choice of Anthropic's Claude 3.5 Sonnet specifically was based on evaluation results showing it provided the best performance for FloQast's use cases. The model's advanced reasoning and contextual understanding was deemed necessary for handling complex financial workflows.
### RAG Over Fine-Tuning
A significant architectural decision was to use Retrieval Augmented Generation (RAG) with few-shot classification on data collected on the user's behalf, rather than fine-tuning the LLM. FloQast explicitly states they don't fine-tune the model as a consumer. This design mechanism was chosen because it produces a higher level of accuracy for this use case, offers a better security model that is understood by FloQast's customers, and provides ease of use as a developer. This approach removes the overhead of fine-tuning while still enabling customization per customer through their specific data.
### Amazon Bedrock Agents for Orchestration
Amazon Bedrock Agents is described as a "game changer" for FloQast, providing an intelligent orchestration layer for automating accounting workflows. The agents enable several key capabilities:
**Instruction Handling and Task Automation**: Amazon Bedrock Agents enables FloQast to submit natural language instructions that the AI interprets and executes autonomously. This is central to the natural language rule creation feature in Transaction Matching.
**Session and Memory Management**: Session attributes and promptSessionAttributes are passed between sessions related to a single workflow. FloQast notes that most user requests can be singular to a session, suggesting relatively stateless interactions for many use cases.
**Code Generation with Business Understanding**: Amazon Bedrock Agents offers secure code interpretation capabilities and flexible configuration options. Agents can be tailored to the correct persona and business context while operating within a protected test environment. Accountants can submit natural language instructions and input data, which is processed in a controlled manner with security best practices. The generated code is tested within an isolated secure environment with appropriate technical oversight and guardrails.
**Data Integration and Output Handling**: Information is passed from upstream integrated financial systems through Amazon Bedrock Agents, allowing FloQast to automate data retrieval and transformation tasks.
**Multi-Step Task Orchestration**: Amazon Bedrock agents handle multi-step tasks by orchestrating complex workflows. After retrieving data from a financial system, the data is passed to the agent, which runs necessary calculations, generates output code, and presents results for user approval—all in one automated process. This orchestration is especially useful in accounting where multiple steps must be completed in the correct sequence to maintain compliance and accuracy.
## Safety and Guardrails
The case study mentions the use of Amazon Bedrock Guardrails to filter annotation results, enhancing content safety and privacy in the generative AI application. This is positioned as a key step in the AI Annotations workflow, occurring after the LLM generates its outputs. While the specific guardrail configurations are not detailed, this represents an important production safety mechanism.
## Reported Results
FloQast claims the following improvements from their AI-powered solution:
- 38% reduction in reconciliation time
- 23% decrease in audit process duration and discrepancies
- 44% improvement in workload management
These metrics suggest meaningful productivity gains, though as with any vendor-provided statistics, they should be considered representative rather than universal. The specific conditions under which these improvements were measured are not detailed in the case study.
## Critical Assessment
This case study is published on the AWS blog and co-authored by both AWS and FloQast personnel, so it naturally emphasizes the positive aspects of the AWS/Amazon Bedrock platform. The claimed performance improvements are significant but lack context about baseline measurements, sample sizes, or customer diversity in these metrics.
The architectural decisions appear sound from an LLMOps perspective. Using RAG with few-shot classification rather than fine-tuning is a pragmatic choice that reduces operational complexity while enabling per-customer customization through their own data. The serverless approach through Amazon Bedrock eliminates infrastructure management overhead, which is valuable for a SaaS company focused on accounting rather than ML infrastructure.
The multi-step orchestration through Amazon Bedrock Agents addresses a real challenge in production LLM systems—coordinating complex workflows that require multiple LLM calls and integrations with external systems. The emphasis on secure code execution environments and proper guardrails demonstrates awareness of production safety requirements.
However, the case study lacks detail on several LLMOps concerns: monitoring and observability strategies, cost management at scale, latency characteristics, fallback mechanisms for model failures, testing and evaluation methodologies, and how they handle model version updates or potential model behavior changes over time. These would be valuable additions for a complete picture of their production LLM operations.