## Overview
This case study documents Sixt's journey in deploying generative AI to transform their customer service operations through what they call "Project AIR" (AI-based Replies). Sixt is a mobility service provider with over €4 billion in annual revenue offering various products including car rental, subscription services (Sixt+), car sharing (Sixt Share), electric vehicle charging (Sixt Charge), and premium taxi services (Sixt Ride). The company operates globally across more than 100 countries, which creates significant complexity in customer service operations due to varying languages, regulations, and product offerings.
The presentation was delivered by representatives from both AWS and Sixt, including Dominic (VP of Data Science at Sixt), who provided firsthand insights into the technical implementation and business outcomes. The case study is particularly valuable because it demonstrates a complete LLMOps journey from ideation to production deployment across multiple channels, with clear metrics on both technical performance and business impact.
## Business Context and Motivation
Sixt's motivation for implementing generative AI in customer service was driven by three primary objectives: enabling profitable growth in an environment where manual scaling of customer service teams is challenging, enhancing customer experience through reduced waiting times and consistent responses across all markets, and unburdening employees from routine tasks to allow them to focus on more complex customer interactions and personal development.
The company faces particular challenges due to the variety of products they offer and the global nature of their operations. Customer inquiries range from sales-related questions about vehicle availability, to documentation requirements for international travel, to administrative requests like invoice generation for B2B customers. All of these inquiries arrive in multiple languages and need to be routed appropriately and answered with consistency regardless of which market the customer is contacting from.
## Technical Architecture and Implementation
The technical solution Sixt implemented involves multiple layers of AI-powered automation built primarily on Amazon Bedrock with Anthropic Claude models. The architecture follows a workflow pattern where incoming customer communications (initially email, later expanded to chat and messaging) flow through several processing stages.
**Classification Layer**: The first critical component is automated classification of customer inquiries. Sixt implemented a sophisticated multi-level classification system that categorizes inquiries by timing relative to the rental period (before, during, or after rental) and then further subcategorizes by specific intent (changing reservation time, documentation questions, billing inquiries, etc.). This classification is crucial both for routing inquiries to the appropriate human agents when needed and for triggering the correct automation workflows.
Initially, Sixt attempted this classification with an out-of-the-box machine learning solution that achieved only approximately 70% accuracy. This necessitated significant manual reclassification efforts, with multiple people dedicated solely to correcting misclassifications. By implementing Amazon Bedrock with Anthropic Claude models, they improved classification accuracy to over 90%, while simultaneously reducing classification costs by 70%. The presentation notes that model changes became much easier to implement with this approach, suggesting that the prompt-based classification approach offers greater flexibility than traditional ML classification models that would require retraining.
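The flexibility of prompt-based classification can be illustrated with a minimal sketch. The taxonomy, the prompt wording, and the `call_model` callable below are all assumptions made for illustration; the presentation did not disclose Sixt's actual categories or prompts. In a real deployment `call_model` would wrap a request to a Claude model on Amazon Bedrock.

```python
import json
from typing import Callable

# Hypothetical taxonomy (rental phase -> intents); not Sixt's real categories.
TAXONOMY = {
    "before_rental": ["change_reservation_time", "documentation_question"],
    "during_rental": ["vehicle_issue", "extend_rental"],
    "after_rental": ["billing_inquiry", "invoice_request"],
}


def build_prompt(message: str) -> str:
    """Render the taxonomy into the prompt, so categories change without retraining."""
    options = "\n".join(
        f"- {phase}: {', '.join(intents)}" for phase, intents in TAXONOMY.items()
    )
    return (
        "Classify the customer message into one phase and one intent.\n"
        f"Allowed values:\n{options}\n"
        'Answer with JSON only: {"phase": "...", "intent": "..."}\n\n'
        f"Message: {message}"
    )


def classify(message: str, call_model: Callable[[str], str]) -> dict:
    """Ask the model for a label and validate it against the taxonomy."""
    raw = call_model(build_prompt(message))
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return {"phase": None, "intent": None, "valid": False}
    if not isinstance(result, dict):
        return {"phase": None, "intent": None, "valid": False}
    phase, intent = result.get("phase"), result.get("intent")
    # Reject labels outside the taxonomy instead of routing on them.
    valid = isinstance(phase, str) and phase in TAXONOMY and intent in TAXONOMY[phase]
    return {"phase": phase, "intent": intent, "valid": valid}
```

Adding or renaming a category in this sketch only requires editing `TAXONOMY`, which is the kind of change that would require retraining in a conventional supervised classifier.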
**Agent Support Layer**: Once classified, inquiries trigger various forms of agent support integrated into Salesforce Service Cloud. The system automatically extracts key information such as reservation numbers, customer sentiment indicators, and topic summarization. Simultaneously, it generates reply proposals that human agents can review and modify before sending. This represents a human-in-the-loop approach where the AI augments rather than completely replaces human judgment, which is particularly important for maintaining quality in customer-facing communications.
**Automation Layer**: For inquiries where Sixt has sufficient confidence in the quality and appropriateness of automated responses, the system can bypass human review entirely and send responses automatically. The presentation demonstrated a chatbot interaction where a customer asked multiple questions: finding the location of a Sixt station at Munich airport (which triggered retrieval of directions and a Google Maps link), documentation requirements for vehicle pickup (which retrieved knowledge base articles and provided links), and a reservation change request (which routed the customer to a self-service portal). This multi-turn interaction demonstrates the system's ability to handle context across conversation turns and route to different backend systems based on intent.
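The choice between automatic sending and human review can be sketched as a simple gate. The intent names, the confidence score, and the threshold here are hypothetical; the presentation did not say how Sixt measures "sufficient confidence" or which intents qualify for full automation.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only.
AUTO_SEND_INTENTS = {"station_directions", "documentation_question"}
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Draft:
    intent: str        # label produced by the classification layer
    reply: str         # generated reply proposal
    confidence: float  # assumed quality score in [0, 1]


def dispatch(draft: Draft) -> str:
    """Return 'auto_send' only for allow-listed intents above the threshold."""
    if draft.intent in AUTO_SEND_INTENTS and draft.confidence >= CONFIDENCE_THRESHOLD:
        return "auto_send"
    # Everything else goes to a human agent as a reply proposal.
    return "agent_review"
```

Defaulting to `agent_review` means a new or misclassified intent falls back to the human-in-the-loop path rather than being sent unreviewed.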
The architecture includes integration with multiple backend systems, particularly the reservation management system for handling booking modifications. The presentation emphasized that seamless integration with existing business systems was one of the critical success factors, and Sixt credited their platform team for facilitating these integrations.
**Knowledge Management**: The solution leverages Amazon Bedrock Knowledge Bases to implement retrieval-augmented generation (RAG) for answering customer questions based on Sixt's domain knowledge. The system retrieves relevant information from knowledge bases containing information about processes, policies, and procedures, then generates contextually appropriate responses. The presentation indicated that different knowledge bases are triggered based on the classification of the inquiry, allowing the system to retrieve the most relevant information for each type of question.
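A toy sketch of classification-driven retrieval follows. It uses in-memory lists and word overlap as a stand-in for Amazon Bedrock Knowledge Bases, which in practice handles indexing and retrieval as a managed service. The knowledge base contents and intent names are invented for illustration.

```python
import re

# Hypothetical knowledge bases keyed by classified intent.
KNOWLEDGE_BASES = {
    "documentation_question": [
        "A valid driving licence and a credit card are required at pickup.",
        "International renters may need an international driving permit.",
    ],
    "billing_inquiry": [
        "Invoices are sent by email after the rental ends.",
    ],
}


def tokens(text: str) -> set:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(intent: str, question: str, top_k: int = 1) -> list:
    """Pick the knowledge base for the intent, then rank passages by word overlap."""
    docs = KNOWLEDGE_BASES.get(intent, [])
    q = tokens(question)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(intent: str, question: str) -> str:
    """Augment the question with retrieved context before calling the model."""
    context = "\n".join(retrieve(intent, question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Restricting retrieval to the knowledge base that matches the classified intent keeps unrelated passages out of the prompt, which is one plausible reason for the per-category setup described above.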
## Deployment Timeline and Phased Rollout
The deployment timeline provides valuable insights into the LLMOps journey. The project began in May 2023 with initial ideation and scoping discussions between Sixt and AWS. By June 2023, they had agreed on a joint prototyping engagement with AWS's prototyping team. Within six weeks, they completed a prototype of the AI reply generation use case, and by September 2023, just four months after the initial concept, the first use case was live in production.
Importantly, this initial production deployment ran on Sixt's previous customer service platform (rendered as "no mind" in the transcript, most likely Novomind), not on the target Salesforce Service Cloud platform. Sixt deliberately deployed on the existing platform first to gain experience with the relatively new generative AI technology while preparing for the platform migration. This pragmatic approach derisked both the AI adoption and the platform migration by not attempting them simultaneously.
From Q4 2023 through Q1 2024, Sixt executed their migration to Salesforce Service Cloud while integrating the AIR solution. When they began expanding to additional automation use cases, they discovered that the classification accuracy was insufficient for their needs, requiring them to take a step back and strengthen the classification layer (the improvement to over 90% accuracy mentioned earlier). This occurred in Q2 2024, after which they could proceed with expanded automation.
By Q1 2025, they launched messaging and chatbot capabilities, rolling these out to all corporate countries in Q2 2025. According to the presentation, these channels run on what the transcript renders as a "cognitive platform" combined with Sixt's own generative AI platform, which suggests a hybrid architecture that combines AWS managed services with custom components.
## Model Selection and Optimization
The presentation provided broader context about model selection philosophy that informed Sixt's choices. The AWS presenters emphasized that the industry has moved away from seeking "the best model" to finding "the model that fits best for that particular use case." This represents an important maturation in how organizations approach LLM deployment.
Amazon Bedrock's broad model selection was highlighted as a key factor, providing access to Amazon Nova models, Anthropic Claude models, and models from Meta, Mistral AI, and others, plus over 100 specialized models through the Bedrock Marketplace. The presentation mentioned several model optimization techniques that are relevant for production LLM systems:
**Model Distillation** was described as teaching a smaller "student" model to learn from a larger "teacher" model, allowing organizations to maintain accuracy and intelligence while achieving better price-performance through smaller, faster models. While not explicitly stated that Sixt used this technique, it was presented as a key optimization strategy for production systems.
**Prompt Caching** was mentioned as a technique to improve performance and reduce costs for applications with repeated complex prompts, caching prompt components to enable faster subsequent responses.
**Intelligent Prompt Routing** was described as automatically routing prompts to the most appropriate model based on complexity, analogous to how different parts of a human brain activate for different types of questions. A simple question like "what's your name" would route to a fast, smaller model, while complex strategic questions would route to more capable models.
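The routing idea can be illustrated with a deliberately simple heuristic. This is not how the managed Bedrock feature works internally, which the presentation did not describe; the complexity score, the keyword list, and the model tier names are all assumptions.

```python
# Hypothetical model tiers; placeholders, not real model identifiers.
SMALL_MODEL = "small-fast-model"
LARGE_MODEL = "large-capable-model"

# Words that hint at multi-step reasoning (illustrative list).
REASONING_HINTS = {"compare", "explain", "strategy", "why", "analyze"}


def estimate_complexity(prompt: str) -> float:
    """Crude score in [0, 1] based on prompt length and reasoning keywords."""
    words = [w.lower().strip("?,.!") for w in prompt.split()]
    length_score = min(len(words) / 50, 1.0)
    keyword_score = 1.0 if any(w in REASONING_HINTS for w in words) else 0.0
    return max(length_score, keyword_score)


def route(prompt: str, threshold: float = 0.5) -> str:
    """Send simple prompts to the small model, complex ones to the large model."""
    return LARGE_MODEL if estimate_complexity(prompt) >= threshold else SMALL_MODEL
```

A production router would typically predict response quality per model rather than rely on keywords, but the cost logic is the same: only pay for the larger model when the prompt appears to need it.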
The presentation emphasized that Amazon Bedrock Model Evaluation enables data-driven model selection through automated and human evaluation approaches, including using LLMs as judges to provide human-like evaluation with quantitative metrics.
## Data Strategy and RAG Implementation
Data was positioned as "the business differentiator" and "the lifeblood of a production-ready AI system." The presentation outlined three primary approaches to customizing LLMs with domain knowledge:
**Retrieval-Augmented Generation (RAG)** provides domain context without changing the underlying model, which Sixt's implementation clearly leverages through Amazon Bedrock Knowledge Bases. The managed RAG workflow handles data retrieval, prompt augmentation, and provides source attribution for responses.
**Fine-tuning and Continued Pre-training** involves actually modifying model parameters and weights using labeled or unlabeled domain data to improve accuracy, though the presentation didn't explicitly state whether Sixt employed these techniques.
**Agentic AI** enables models to not only retrieve information but take actions by calling APIs and functions, which Sixt clearly implemented for reservation modifications and other workflow automation.
The presentation highlighted Amazon Bedrock Data Automation for handling multimodal unstructured data, particularly for intelligent document processing use cases like invoice extraction. While not explicitly stated for Sixt's implementation, this capability would be relevant for processing customer-submitted documents.
## Evaluation and Quality Assurance
Evaluation was emphasized as a critical success factor, particularly for customer-facing generative AI. Dominic from Sixt stated: "especially with GenAI, you don't see it just in one number. You have to think about the evaluation strategy to ensure that the quality is at a sufficient level."
The presentation didn't provide extensive detail on Sixt's specific evaluation methodology, but the emphasis on "robust evaluation" as a lesson learned, combined with the decision to improve classification accuracy from approximately 70% to over 90% before proceeding with expanded automation, indicates that Sixt established clear quality thresholds that needed to be met before deploying more automated responses.
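A quality threshold of this kind can be expressed as a simple release gate over a labeled evaluation set. The function names and the 0.9 default below are assumptions chosen to mirror the accuracy figures quoted above, not Sixt's documented process.

```python
def accuracy(predictions: list, labels: list) -> float:
    """Fraction of predictions that match the human-provided labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)


def ready_for_automation(predictions: list, labels: list, threshold: float = 0.9) -> bool:
    """Gate expanded automation on classification accuracy meeting the threshold."""
    return accuracy(predictions, labels) >= threshold
```

As the quote above suggests, a single number is rarely enough; the same gate could be applied per intent or per language so that a weak category does not hide behind a strong average.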
The human-in-the-loop approach for reply generation, where agents review AI-generated responses before sending, provides an additional quality layer while also generating data that could be used for ongoing evaluation and improvement. Only for cases where Sixt has "sufficient confidence" in quality does the system send fully automated responses.
## Cost Management
Cost optimization was presented as a critical consideration often overlooked during proof-of-concept phases but crucial for production deployment. The presentation noted that for organizations not building custom models, most costs come from inference (running models in production) rather than training.
Amazon Bedrock offers three pricing options for inference: on-demand for small workloads or POCs with no commitment; provisioned throughput for large production workloads requiring guaranteed capacity; and batch inference for non-real-time workloads, offering up to 50% cost savings versus on-demand.
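The effect of the batch discount is easy to estimate with back-of-the-envelope arithmetic. The request volume, token counts, and per-token price below are made-up inputs, not actual Bedrock prices or Sixt figures; only the 50% discount comes from the text above.

```python
def monthly_cost(
    requests: int,
    tokens_per_request: int,
    price_per_1k_tokens: float,
    discount: float = 0.0,
) -> float:
    """Estimated monthly inference cost; discount is a fraction such as 0.5."""
    total_tokens = requests * tokens_per_request
    return total_tokens / 1000 * price_per_1k_tokens * (1 - discount)


# Hypothetical workload: 100,000 requests of 1,000 tokens at 0.002 per 1k tokens.
on_demand = monthly_cost(100_000, 1_000, 0.002)
batch = monthly_cost(100_000, 1_000, 0.002, discount=0.5)
print(f"on-demand: {on_demand:.2f}, batch: {batch:.2f}")
```

The comparison only applies to workloads that tolerate delayed results, such as offline reclassification or evaluation runs; interactive chat and email replies still need on-demand or provisioned capacity.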
The 70% reduction in classification costs Sixt achieved while improving accuracy demonstrates that model selection and architecture decisions significantly impact production economics. The presentation suggested that features like model distillation, prompt caching, and intelligent prompt routing can further optimize price-performance in production.
## Trust, Security, and Compliance
The presentation positioned trust as "non-negotiable" for production GenAI systems, addressing questions that Sixt and other enterprises face from compliance and security departments: data protection including PII, training data usage policies, interaction isolation and security, consistent security controls, and protection from harmful content.
Amazon Bedrock Guardrails was highlighted as providing security controls without coding, with cited figures of blocking up to 85% more harmful content and reducing hallucinated answers by 75%. The presentation cited the European Parliament's selection of AWS and Anthropic Claude models on Bedrock for its Archibot application, which covers 2 million archived documents, as validation of the security and compliance capabilities.
While the presentation didn't detail Sixt's specific security implementation, operating across 100+ countries with varying data protection regulations would necessarily require robust data governance, and the selection of a hyperscaler cloud platform with comprehensive compliance certifications supports those requirements.
## Business Value Framework
The broader presentation provided valuable framing on measuring business value from AI investments, noting that many organizations struggle with measurement because they focus exclusively on financial ROI. The AWS presenters proposed three dimensions of business value:
**Financial Value**: Direct return on investment from enhancing existing business processes. For Sixt, this includes the 70% cost reduction in classification operations and productivity gains from automation.
**Employee Value**: Improvements in employee wellbeing, efficiency, and satisfaction that may not translate directly to euro figures but are essential for talent retention. For Sixt, this includes unburdening agents from routine tasks and upskilling them in generative AI applications.
**Future Business Value**: Strategic positioning and competitive advantage that involves higher risk and longer time horizons for realizing returns. Sixt's deployment of customer-facing chatbots across multiple countries represents this type of strategic investment.
The presenters noted that different priorities across these dimensions lead to different portfolio outcomes and that organizations should consciously decide their AI investment strategy based on factors like risk appetite, available skills and resources, and time horizon for returns.
## Lessons Learned and Success Factors
Sixt identified several key lessons from their deployment:
**Business Process Insight**: Understanding the volume and distribution of different inquiry types was crucial for prioritization. Knowing what percentage of requests related to each use case allowed Sixt to start with the most impactful cases first, such as reservation modifications.
**Stakeholder Buy-in**: Implementing generative AI in customer service represents organizational transformation beyond just a technology project, requiring change management and buy-in from all involved stakeholders.
**System Integration**: Seamless integration with backend systems like the reservation platform was essential for enabling end-to-end automation. Sixt credited their platform team for facilitating these integrations.
**Market Alignment**: Sixt continuously evaluates where to focus development capacity relative to evolving capabilities from AWS and partners, so that it avoids building capabilities that vendors will soon provide.
The broader presentation emphasized additional success factors: model choice matters with hybrid model strategies; data provides business differentiation; managing both training and inference costs is essential; and trust and security are non-negotiable.
## Critical Assessment
While this case study demonstrates successful production deployment of LLMs for customer service, several aspects warrant balanced consideration:
**Vendor-Specific Context**: This presentation was delivered at an AWS event by AWS representatives and a customer using AWS services extensively. While the technical achievements and metrics presented appear credible, the case study naturally emphasizes AWS capabilities without exploring alternative approaches or potential limitations.
**Limited Performance Metrics**: While the 90% classification accuracy and 70% cost reduction are specific and impressive, the presentation provided limited detail on other key metrics such as customer satisfaction scores, actual response time improvements, percentage of inquiries fully automated versus requiring human intervention, or error rates in automated responses.
**Human-in-the-Loop Dependency**: The system still requires human review for many responses, with full automation only occurring when there is "sufficient confidence" in quality. The presentation didn't quantify what percentage of responses are fully automated versus requiring human review, which is crucial for understanding the actual productivity gains.
**Phased Deployment Complexity**: The timeline shows that Sixt deliberately deployed first on their legacy platform before migrating to Salesforce, and had to pause automation expansion to improve classification. While this pragmatic approach demonstrates good judgment, it also suggests that the path to production was more complex than the headline four-month timeline might imply.
**Evaluation Methodology**: While robust evaluation was emphasized as critical, specific details about evaluation metrics, testing procedures, or ongoing monitoring approaches were not provided. For a customer-facing application operating in 100+ countries, comprehensive evaluation frameworks would be essential.
**Scalability and Expansion**: The presentation indicated that voice automation is on the roadmap but not yet implemented. Many customer service interactions occur via phone, so the current solution addresses only a portion of the total customer service workload.
Despite these considerations, Sixt's implementation represents a substantive production deployment of LLMs with measurable business impact. The combination of improved accuracy, reduced costs, and expansion across multiple channels and countries demonstrates a maturing LLMOps practice. The willingness to share lessons learned about needing to step back and improve classification before proceeding, and the emphasis on organizational factors like stakeholder buy-in alongside technical factors, provides valuable insights for other organizations undertaking similar journeys.
The case study is particularly valuable for demonstrating how organizations can take a phased approach to GenAI deployment, starting with augmentation (agent support with suggested responses) before moving to full automation, and deliberately managing the pace of change to allow both technical and organizational learning.