Company
Ramp
Title
Building Trustworthy LLM-Powered Agents for Automated Expense Management
Industry
Finance
Year
2025
Summary (short)
Ramp developed a suite of LLM-backed agents to automate expense management processes, focusing on building user trust through transparent reasoning, escape hatches for uncertainty, and collaborative context management. The team addressed the challenge of deploying LLMs in a finance environment where accuracy and trust are critical by implementing clear explanations for decisions, allowing users to control agent autonomy levels, and creating feedback loops for continuous improvement. Their policy agent now handles over 65% of expense approvals automatically while maintaining user confidence through transparent decision-making and the ability to defer to human judgment when uncertain.
## Overview

This case study from Ramp demonstrates a comprehensive approach to deploying LLM-powered agents in a production finance environment, specifically for expense management automation. The company developed a suite of agents that handle tasks like expense approvals, merchant identification, and receipt parsing, with the policy agent automatically handling over 65% of expense approvals. The case study is particularly valuable for its focus on building user trust in LLM systems operating in high-stakes financial contexts where accuracy and transparency are paramount.

## Problem Selection and Use Case Design

Ramp's approach to LLM deployment begins with careful problem selection, identifying scenarios where LLMs can provide maximum value. Their framework evaluates problems across three dimensions: ambiguity (where simple heuristics fail), high volume (where human processing would be prohibitively time-consuming), and asymmetric upside (where the benefits of automation significantly outweigh the cost of occasional errors). This strategic approach to use case selection reflects a clear-eyed view of LLM capabilities and limitations in production environments.

The expense approval process exemplifies these criteria: it involves subjective judgment calls that vary by company policy, covers hundreds or thousands of transactions, and has manageable failure modes in which incorrect decisions can be caught and corrected without catastrophic consequences. This deliberate use case selection demonstrates sophisticated LLMOps thinking about where to deploy these systems for maximum business impact.

## Transparency and Explainability Architecture

A cornerstone of Ramp's LLMOps strategy is its emphasis on transparent reasoning and explainability. Rather than simply providing binary decisions, the system generates a detailed explanation for every decision, linking directly to the relevant sections of the company's expense policy. This approach serves multiple stakeholders: users can verify and understand decisions, developers gain model observability for system improvements, and the organization builds institutional knowledge about decision patterns.

The technical implementation involves prompting the LLM to provide step-by-step reasoning for each decision, then structuring this output to reference specific policy sections. This creates a form of retrieval-augmented generation in which the model's reasoning is grounded in verifiable company documentation. The system goes beyond simple citations by creating interactive links that let users immediately open the policy sections referenced in the LLM's reasoning.

## Handling Uncertainty and Model Limitations

One of the most sophisticated aspects of Ramp's implementation is its approach to uncertainty handling. Rather than forcing the LLM to make a definitive decision in every case, the team explicitly designed "escape hatches" that allow the model to express uncertainty and explain its reasoning for deferring to human judgment. This reflects a mature understanding of LLM capabilities and the importance of humility in AI systems.

The team specifically rejected confidence scoring, recognizing that LLMs tend to produce misleadingly consistent confidence scores (typically 70-80%) regardless of actual uncertainty. Instead, they implemented a three-tier decision framework: Approve (clear policy match), Reject (clear policy violation), and Needs Review (uncertainty or edge cases).
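To make this concrete, here is a minimal sketch of what such a structured decision output might look like. The Pydantic models, field names, and section-ID convention below are illustrative assumptions, not details from the case study; the point is that the model is constrained to a closed set of labels plus a written rationale and explicit policy citations, rather than a numeric confidence score.

```python
from enum import Enum
from pydantic import BaseModel, Field

class Decision(str, Enum):
    APPROVE = "approve"            # clear policy match
    REJECT = "reject"              # clear policy violation
    NEEDS_REVIEW = "needs_review"  # escape hatch: defer to a human reviewer

class PolicyCitation(BaseModel):
    section_id: str  # hypothetical ID, e.g. "travel.meals", rendered as a link into the policy
    excerpt: str     # the policy text the reasoning relies on

class ExpenseDecision(BaseModel):
    decision: Decision
    reasoning: str = Field(description="Step-by-step explanation shown to the user")
    citations: list[PolicyCitation] = Field(
        default_factory=list,
        description="Policy sections that ground the reasoning",
    )
```

A schema like this also gives the application a natural failure mode: any model output that fails validation can simply be routed to human review.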
This categorical approach provides more actionable guidance to users while avoiding the false precision of numerical confidence scores.

## Context Management and Collaborative Improvement

Ramp implemented a context management system that treats policy documents as living, collaborative resources rather than static inputs. The approach involves three key components: bringing expense policies directly into the platform (rather than relying on external PDFs), using those policies as direct context for LLM decisions, and providing tools for users to modify policies when they disagree with LLM outputs. This creates a powerful feedback loop in which user corrections to LLM decisions become policy clarifications that improve future performance.

The system includes a full policy editor that lets users update expense policies directly within the platform, keeping context current and reducing the ambiguity that can confuse both AI and human decision-makers. This collaborative approach to context management shows how an LLM system can be improved continuously through user feedback.

## User Agency and Autonomy Controls

A critical aspect of Ramp's strategy is its "autonomy slider," which lets users define exactly where and when agents can act independently. Rather than imposing a one-size-fits-all automation level, Ramp leverages its existing workflow builder so users can tailor agent behavior to their comfort level and organizational requirements.

The system supports both permissive settings (where agents can act on most decisions) and restrictive settings (requiring human review for transactions above certain dollar amounts or from specific vendors). The flexibility extends to hard stops that exclude the LLM entirely from certain categories of expenses, recognizing that some decisions will always require human judgment.

## Progressive Trust Building

The deployment strategy follows a "crawl, walk, run" approach to building user trust. Initially, the system operated in suggestion mode, providing recommendations that humans could accept or reject. This let users observe the agent's decision patterns, understand its reasoning, and build confidence in its capabilities before granting it autonomous decision-making authority.

This progressive approach serves multiple purposes: it allows real-world testing of the system's accuracy, helps users understand the agent's capabilities and limitations, and creates a natural trust-building process that matches organizational comfort levels. The transition from suggestions to autonomous actions happens gradually, with users retaining control over when and how to expand the agent's authority.
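The sketch below shows one way the autonomy slider, suggestion mode, and hard stops described above might be expressed as configuration, with deterministic guardrails evaluated around the LLM call. The dataclasses, thresholds, and routing labels are assumptions for illustration, not Ramp's actual workflow-builder schema.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Expense:
    amount: float
    category: str
    vendor: str

@dataclass
class AutonomyPolicy:
    """Per-company guardrails a team might configure in a workflow builder."""
    suggest_only: bool = True        # "crawl" phase: recommendations only
    max_auto_amount: float = 500.0   # require human review above this amount
    review_vendors: set[str] = field(default_factory=set)
    hard_stop_categories: set[str] = field(default_factory=set)  # never involve the LLM

def route(expense: Expense, policy: AutonomyPolicy,
          call_agent: Callable[[Expense], str]) -> str:
    # Hard stops come first, so the LLM is never invoked for these expenses.
    if expense.category in policy.hard_stop_categories:
        return "human_review"
    decision = call_agent(expense)  # returns "approve" / "reject" / "needs_review"
    # Deterministic rules layered on top of the LLM's decision.
    if (decision == "needs_review"
            or expense.amount > policy.max_auto_amount
            or expense.vendor in policy.review_vendors):
        return "human_review"
    # In suggestion mode the decision is surfaced to a human instead of executed.
    return f"suggest:{decision}" if policy.suggest_only else decision
```

Checking the hard-stop categories before the agent is called mirrors the case study's point that some expense categories should exclude LLM involvement entirely, and flipping `suggest_only` off is the transition from "walk" to "run."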
## Evaluation and Continuous Improvement

Ramp treats evals as the equivalent of unit tests for LLM systems, recognizing them as essential for evolving the system responsibly. Their approach emphasizes starting with simple evaluations and gradually expanding to more comprehensive coverage as the system matures. They prioritize edge cases where LLMs are prone to errors and systematically convert user-flagged failures into test cases for future evaluation.

A particularly sophisticated aspect of their evaluation approach is the recognition that user feedback can be biased or inconsistent. They discovered that finance teams tend to be more lenient than strict policy adherence would require, which would cause model drift if user actions were always treated as ground truth. To address this, they created carefully curated "golden datasets," reviewed by their own team, that establish objective decision standards based solely on the information available to the system.
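One way to operationalize golden datasets, sketched below under assumed details: a small harness that replays curated cases through the decision function like unit tests and gates changes on a pass rate. The JSONL format, field names, and threshold are hypothetical.

```python
import json
from typing import Callable

def run_golden_evals(path: str, decide: Callable[[dict, str], str],
                     min_pass_rate: float = 0.95) -> bool:
    """Replay team-reviewed golden cases through the agent under test."""
    with open(path) as f:
        cases = [json.loads(line) for line in f]  # one curated case per line
    failures = []
    for case in cases:
        # Labels come from curated team review, not from raw user accept/reject
        # actions, which skew lenient relative to strict policy adherence.
        got = decide(case["expense"], case["policy_text"])
        if got != case["expected_decision"]:
            failures.append((case["id"], case["expected_decision"], got))
    for case_id, want, got in failures:
        print(f"FAIL {case_id}: expected {want!r}, got {got!r}")
    return (1 - len(failures) / len(cases)) >= min_pass_rate
```

User-flagged failures would feed this file only after review, preserving the objective standard while still converting real-world misses into regression tests.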
## Technical Architecture and Implementation

While the case study doesn't provide extensive technical detail about the underlying LLM architecture, it reveals several important implementation decisions. The system appears to combine structured prompting for decision-making and reasoning generation with retrieval that can access and cite relevant policy sections. Integration with existing workflow management systems suggests a microservices architecture that lets the LLM agents plug into established business processes.

The system's ability to handle multiple types of financial decisions (expense approvals, merchant identification, receipt parsing) suggests either a multi-agent architecture or a flexible single-agent system that adapts to different task types. The emphasis on deterministic rules layered on top of LLM decisions indicates a hybrid approach that combines traditional business logic with AI-powered decision-making.

## Lessons and Broader Implications

This case study offers several insights for LLMOps practitioners. First, it demonstrates the critical importance of trust-building mechanisms in production LLM systems, particularly in high-stakes environments like finance. The emphasis on transparency, explainability, and user control offers a template for responsible AI deployment that balances automation benefits with user agency.

Second, the approach to uncertainty handling and the rejection of false precision such as confidence scores shows mature thinking about LLM capabilities and limitations. The three-tier decision framework provides a practical alternative to confidence scoring that gives users more actionable guidance.

Third, the collaborative context management approach demonstrates how to create feedback loops that improve system performance over time while maintaining user engagement and accuracy. This goes beyond simple fine-tuning to create a dynamic system that evolves with organizational needs and user feedback.

Finally, the progressive trust-building strategy and autonomy controls offer a framework for deploying LLM agents that respects user agency while maximizing automation benefits. This approach should be valuable for organizations that want to introduce AI systems without overwhelming users or provoking resistance to automation.

Taken together, the case study represents a mature approach to LLMOps that balances technical sophistication with practical usability, offering valuable lessons for organizations seeking to deploy trustworthy AI systems in production.