## Overview
This case study is drawn from a panel discussion at a Google Cloud event featuring executives from four major financial institutions: Discover Financial Services, Macquarie Bank (Australia), Scotia Bank (Canada), and Intesa Sanpaolo (Italy). The discussion covers their respective journeys in cloud transformation, data platform development, and AI/generative AI implementation. While each bank shares insights, the most detailed LLMOps case study comes from Discover Financial Services, which describes a production deployment of generative AI to assist customer service agents.
## Discover Financial Services: Agent Assistance with Generative AI
### The Problem
Discover identified a significant challenge in their customer service operations. As digital banking has evolved, customers increasingly self-serve for simple tasks like checking balances or payment due dates through mobile apps and web interfaces. This has fundamentally changed the nature of calls that reach human agents—only the difficult, complex questions now come through to the contact center.
This shift has made the job of customer service agents significantly more difficult. When handling complex inquiries, agents must navigate through extensive procedure documentation to find the correct steps and information. The traditional search process was measured in minutes, often requiring agents to put customers on hold with phrases like "can I put you on hold for a minute"—a poor customer experience that conflicts with Discover's brand promise of award-winning customer service.
The core issue was that agents didn't want to find documents; they wanted to find solutions. Traditional search returned multiple multi-page documents that agents had to manually parse through, making it difficult to quickly locate the relevant information.
### The Solution Architecture
Discover partnered with Google Cloud to implement a generative AI solution using Vertex AI and Gemini. The architecture leverages Retrieval Augmented Generation (RAG) to ground the AI's responses in Discover's actual procedure documentation. Key technical aspects include:
- **RAG Implementation**: The system is tuned directly with Discover's internal procedure documents, ensuring that generated summaries are grounded in authoritative sources rather than general knowledge.
- **Direct Document Linking**: When the AI provides a summarized answer, it includes direct links to the source documents, allowing agents to quickly drill down to the original material if needed.
- **In-Document Navigation**: The links don't just point to documents—they navigate agents to the specific relevant section within the document, further reducing time to resolution.
- **Vertex AI Integration**: As existing Google Cloud customers, Discover leveraged their platform investment, with Gemini integrated into Vertex AI for a cohesive development and deployment experience.
### Governance and Risk Framework
Before any implementation work began, Discover established what they call a "Generative AI Council" (or "clo"). This organizational framework was crucial for a regulated financial services company and includes participants from:
- Legal
- Compliance
- Information Security
- Analytics
- Engineering
- Business lines
The council serves multiple purposes. It establishes organizational comfort with AI boundaries, creates intake processes for ideas and experimentation, and builds a risk framework that helps prioritize use cases by risk level. One key policy example: employees cannot use open/public AI tools freely, but the organization can build enterprise capabilities with proper security controls.
The risk scaling approach allows Discover to start with lower-risk use cases and progressively tackle more complex scenarios. This methodical approach is particularly important in regulated industries where customer data and financial decisions are involved.
### Development and Tuning Process
What makes this case study particularly interesting from an LLMOps perspective is the acknowledgment that the traditional engineering work was the easy part:
**Engineering Phase**: The actual coding and platform integration was "measured in weeks"—a very quick process enabled by the pre-built integrations in Vertex AI.
**Model Tuning Phase**: This became the heavy lifting. Tuning the models to accurately summarize information relevant to Discover's specific procedures required extensive experimentation and iteration. The work proceeds skill group by skill group, taking procedures that are similar and tuning the system for that domain before moving to the next.
**Key Personnel**: The most important people in the tuning process are not engineers but "knowledge workers"—expert agents and the technical writers who create the procedure documents themselves. These domain experts run iterative cycles to evaluate and improve the quality of AI-generated answers.
### Human-in-the-Loop Evaluation
The production system incorporates continuous human evaluation mechanisms:
- **Thumbs Up/Down Feedback**: Agents receiving AI-generated answers can provide basic binary feedback on answer quality.
- **Qualitative Feedback**: Beyond simple ratings, agents provide detailed feedback on which types of questions produce high-quality answers versus which need improvement.
- **Automated Evaluation Tools**: The team built automated tools that leverage domain knowledge with "golden answers" (known-good responses) to evaluate AI outputs. This allows for scalable quality assessment while maintaining alignment with human judgment.
This feedback loop creates a continuous improvement cycle where the system gets progressively better at handling the specific types of questions that agents face.
### Results
The primary metric Discover tracked was time to information. The results were significant:
- **70% reduction in search time**
- Original search times were measured in minutes
- New solution delivers answers in seconds
- Agents can now stay in the conversation flow rather than putting customers on hold
Beyond the quantitative results, the solution provides qualitatively better search results—agents not only find information faster, but the RAG-based approach surfaces more relevant results than the historical keyword-based search.
### Phased Initial Use Case
Interestingly, Discover started with an even simpler use case before deploying the agent-facing system. They used generative AI to help technical writers rewrite and improve the procedure documents themselves. This approach maintained full human oversight:
- Technical writers use AI assistance to draft improved procedures
- The revised procedure goes through the same legal review by human attorneys
- It undergoes the same compliance review by human compliance officers
- No generative AI assistance in the review phase—purely human judgment
This layered approach allowed Discover to build organizational confidence with AI in a lower-risk context before moving to the agent-facing production deployment.
## Other Bank Perspectives on AI/LLMOps
### Scotia Bank
Scotia Bank shared an important lesson about data quality and AI adoption. They deployed a generative AI system on top of their knowledge base for contact center agents. Initial answer quality was around 33%, but improved to approximately 96% after the contact center team took ownership of data quality.
The key insight was that the improvement came not from better AI, but from better data. The team:
- Added context to acronyms
- Updated titles and content
- Removed outdated and incorrect content
This work was done organically by the business team, motivated by the visible connection between data quality and AI output quality. The speaker noted this would traditionally have been a "$10 million, 12-month consulting engagement" but was accomplished in months by motivated internal teams.
Scotia Bank also emphasized the productivity gains for data scientists when data is properly organized and accessible in the cloud, reducing the traditional 80% time spent on data wrangling.
### Macquarie Bank
Macquarie Bank (96.4% public cloud) discussed their customer-facing AI implementations:
- **Cash flow prediction**: Using customer transaction history to predict future cash flows
- **Variable rate mortgage review**: Customers can request rate reviews through the mobile app, with AI analyzing recent transaction history to determine if they qualify for better rates
- **Transaction personalization**: The "humble transaction" now supports hashtags, photos, documents, and receipts, enabling natural language search (e.g., #tax for tax-related transactions)
They emphasized keeping "human in the loop" for production AI systems to manage risk appropriately.
### Intesa Sanpaolo
The Italian bank discussed their structured AI program targeting €500 million in bottom line benefits by 2025, with use cases split across cost reduction (25%), risk reduction (25%), and revenue increase (50%). They established four governing principles for all AI use cases:
- Explainability
- Fairness
- Human in the loop
- Quality of data
They are proceeding cautiously with generative AI specifically, ensuring they understand the full implications for skills, processes, and operations before scaling.
## Key LLMOps Themes Across the Panel
Several common themes emerged across all participants that are relevant to LLMOps practitioners:
**Data Foundation is Critical**: Every speaker emphasized that AI success depends on having clean, organized, accessible data. Cloud data platforms enable both the storage and the velocity needed for AI workloads.
**Human in the Loop is Non-Negotiable**: In regulated financial services, all participants stressed the importance of human oversight, whether in model evaluation, output review, or decision approval.
**Cross-Functional Governance**: AI initiatives require buy-in and participation from legal, compliance, security, and business teams—not just technology.
**Start Small and Scale**: The successful approaches all involved starting with lower-risk use cases, proving value, and progressively tackling more complex scenarios.
**Domain Experts Drive Quality**: The most impactful contributors to AI quality are often not engineers but domain experts who understand the business context and can evaluate whether outputs make sense.
**Data Quality Ownership Shifts**: Generative AI has an unexpected benefit of making data quality issues visible and motivating business teams to own and improve their data.