## Overview
Mercedes-Benz undertook a massive digital transformation initiative to migrate their Global Ordering system from mainframe to AWS cloud infrastructure. This system, internally called "GO," represents the lifeline of Mercedes-Benz sales operations, processing every vehicle order and production request globally. The application serves over 8,000 users across 150 countries, handles 5.1 billion messages per year, and manages 450 batch processes. It comprises approximately 5 million lines of Java and COBOL code, another 5 million lines in other languages, and more than 20,000 interfaces throughout the company, and it accounts for approximately 50% of Mercedes-Benz's entire mainframe workload.
The project, which began over 2 years ago, employed a strategic combination of traditional replatforming and AI-powered code transformation. Christian Kleme from Mercedes-Benz IT and Manuel Breitfeld from Capgemini presented this case study, highlighting how they leveraged generative AI and agentic systems to accelerate specific refactoring efforts while maintaining system stability and minimizing disruption to downstream consumers.
## Strategic Approach and Partnership Model
The migration strategy was deliberately phased and multi-faceted. Mercedes-Benz chose Capgemini as their general contractor to orchestrate the complex partnership ecosystem, which included AWS as the cloud platform provider and Rocket Software as the provider of Enterprise Server—a mainframe emulation solution that runs on AWS. This partnership structure was critical for managing the interdependencies across various technical domains and ensuring coordinated delivery.
The team adopted a staged migration approach rather than attempting a complete refactoring of the entire application. They began with stateless services that could run in parallel on both the legacy mainframe and the new cloud platform, allowing for extensive comparative testing and validation before committing to the new infrastructure. This risk mitigation strategy was essential given the mission-critical nature of the application, where any disruption could halt vehicle ordering and production across the entire global organization.
The business case showed significant cost savings compared to maintaining the mainframe infrastructure, and this migration became a cornerstone of Mercedes-Benz's broader mainframe exit program targeting complete departure from the platform within the next several years. The project gained strong top management involvement, which proved crucial for securing resources and maintaining momentum across the multi-year initiative.
## AI-Powered Code Transformation: The GenRevive Implementation
The most innovative aspect of this migration involved using agentic AI for code transformation, specifically through a tool called GenRevive. While the overall strategy called for replatforming (essentially rehosting the application on cloud infrastructure with minimal changes), the team identified specific components where refactoring would deliver additional business value, particularly where Mercedes-Benz was already modernizing related sales operations.
GenRevive implements a multi-agent architecture that mimics a software engineering team. The system assigns different AI agents to specific roles: software engineer, software reviewer, tester, and DevOps engineer. Each agent operates within its designated domain, and a human orchestrator coordinates their activities. This division of labor reflects modern software development practices and allows each agent to specialize in its particular function rather than attempting to handle all aspects of transformation simultaneously.
The human-AI collaboration model was carefully designed. Human experts remained responsible for critical upstream activities: analyzing the application to identify suitable refactoring candidates, providing relevant documentation (including existing test cases and design documents), and creating what the team called "cookbooks": examples showing the AI how to properly transform specific code patterns from COBOL to Java. These cookbooks served as worked examples, giving the AI models concrete patterns to follow and helping establish quality standards for the transformation output.
This approach recognizes a fundamental principle in production LLM systems: AI excels at pattern recognition and repetitive transformation tasks, but human expertise remains essential for strategic decisions, context provision, and quality validation. The team didn't simply throw legacy code at an AI system and hope for good results; they invested significant effort in preparing the AI with proper context, examples, and guidance.
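One plausible way to use such cookbooks is to place them in the model's prompt as worked examples ahead of the code to be transformed. The sketch below assumes that approach; the presentation did not say how the cookbooks were actually fed to the AI. `CookbookEntry`, `build_prompt`, and the sample COBOL/Java pair are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CookbookEntry:
    """A human-authored example of how one COBOL pattern should look in Java."""
    pattern: str
    cobol: str
    java: str


def build_prompt(
    entries: Sequence[CookbookEntry],
    cobol_source: str,
    context_docs: Sequence[str] = (),
) -> str:
    """Assemble instructions, supporting documents, examples, and the input."""
    parts: List[str] = [
        "Translate the COBOL below to Java, following the cookbook examples."
    ]
    for doc in context_docs:
        parts.append(f"## Context\n{doc}")
    for entry in entries:
        parts.append(
            f"## Example: {entry.pattern}\n"
            f"COBOL:\n{entry.cobol}\n"
            f"Java:\n{entry.java}"
        )
    # The prompt ends with an open "Java:" slot for the model to complete.
    parts.append(f"## Input\nCOBOL:\n{cobol_source}\nJava:")
    return "\n\n".join(parts)


# Illustrative entry only; real cookbooks would encode the team's own conventions.
cookbook = [
    CookbookEntry(
        pattern="decimal arithmetic",
        cobol="COMPUTE TOTAL = PRICE * QTY.",
        java="BigDecimal total = price.multiply(qty);",
    )
]
prompt = build_prompt(
    cookbook,
    "COMPUTE NET = GROSS * RATE.",
    context_docs=["Amounts are fixed-point decimals."],
)
```

Whatever the delivery mechanism, the design point is the same: the quality bar is set by human-written examples, so the examples are worth versioning and reviewing like any other source artifact.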
## The Pricing Service Case Study
The centerpiece demonstration of AI-powered transformation was the pricing service, a component consisting of 1.3 million lines of COBOL code. The service was split between a Java portion running on IBM WebSphere and a COBOL portion running on IBM CICS (Customer Information Control System). Both portions accessed the same mainframe database, and the Java layer called into the COBOL components.
The challenge was twofold: the service couldn't handle the increased call volume required by Mercedes-Benz's modernization efforts, and maintaining it on the mainframe was becoming increasingly expensive and constraining. Rather than simply scaling up mainframe infrastructure, the team decided to attempt AI-powered transformation to consolidate everything into a unified Java application running on AWS.
The timeline was remarkably compressed. Beginning in February with the decision to try the GenAI approach, the team achieved their first commit to a GitHub repository in March and had a deployable version by May. This represents an extraordinary acceleration compared to traditional manual code transformation or rewriting efforts, which for 1.3 million lines of code would typically require years of developer effort.
However, the raw transformation was not sufficient for production deployment. The team discovered that code generated by the AI, while functionally correct, required performance tuning—particularly around database access patterns. On the mainframe, where everything runs in memory with extremely fast database access, certain coding patterns work efficiently. Those same patterns proved suboptimal when moved to AWS with separate database services. The team manually optimized these database access patterns to maximize performance in the new environment, demonstrating that even with AI-powered transformation, human expertise in performance engineering remains critical.
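The presentation did not show the specific access patterns that were tuned. A common instance of the problem, used here purely as an assumed illustration, is row-by-row lookup: issuing one query per item is cheap when the database sits next to the program, but each query becomes a network round trip once the database is a separate service. The sketch below uses an in-memory SQLite database and a hypothetical `price` table to contrast the two shapes and count the queries each one issues.

```python
import sqlite3
from typing import Sequence, Tuple


def make_db() -> sqlite3.Connection:
    """Create a small in-memory table standing in for the pricing data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE price (item_id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO price VALUES (?, ?)",
        [(i, i * 10.0) for i in range(1, 6)],
    )
    return conn


def total_row_by_row(conn: sqlite3.Connection, item_ids: Sequence[int]) -> Tuple[float, int]:
    """One query per item: N round trips for N items."""
    total, queries = 0.0, 0
    for item_id in item_ids:
        row = conn.execute(
            "SELECT amount FROM price WHERE item_id = ?", (item_id,)
        ).fetchone()
        queries += 1
        total += row[0]
    return total, queries


def total_batched(conn: sqlite3.Connection, item_ids: Sequence[int]) -> Tuple[float, int]:
    """One set-based query for all items: a single round trip."""
    if not item_ids:
        return 0.0, 0
    placeholders = ",".join("?" for _ in item_ids)
    row = conn.execute(
        f"SELECT SUM(amount) FROM price WHERE item_id IN ({placeholders})",
        list(item_ids),
    ).fetchone()
    return row[0], 1


conn = make_db()
slow = total_row_by_row(conn, [1, 2, 3])
fast = total_batched(conn, [1, 2, 3])
```

Both functions return the same total; only the number of queries differs, and that count is what turns into latency once every query crosses a network.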
When the team reviewed the final code quality, they found it indistinguishable from human-written code in terms of maintainability and structure. This is a crucial finding for production LLM systems: the generated code needed to be maintainable by regular development teams who would support it going forward, not just functionally correct for initial deployment.
## The Global Ordering Facade: Enabling Safe Migration
A critical architectural component enabling this migration was the Global Ordering Facade, implemented using a standard product from Woolsoft. This integration layer serves as a gateway that can intelligently route incoming requests from web clients to either the legacy mainframe backend or the new cloud backend.
From an LLMOps perspective, this facade provided several essential capabilities that made AI-generated code viable in production:
**Parallel Testing and Validation**: During the QA and UAT phases, the facade allowed the team to route identical requests to both the mainframe and cloud systems simultaneously. Since both systems were accessing synchronized data (via Precisely's data streaming tool maintaining 500 megabytes of database records in real-time sync), they should produce identical results. The facade automatically captured and compared responses, dramatically reducing manual testing effort and enabling validation with real production traffic patterns rather than synthetic test cases.
This approach addresses a fundamental challenge in deploying AI-generated code: establishing confidence that the transformation is correct. By running both systems in parallel with real traffic and automatically comparing results, the team could empirically validate the AI's work at scale rather than relying solely on unit tests or limited integration testing.
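The comparison step can be sketched in a few lines. This is an assumed illustration, not the facade product's API: `compare_backends`, `Mismatch`, and the stand-in pricing functions are hypothetical, and the backends are plain callables instead of network services. The idea is simply to send each request to both implementations, ignore fields that are expected to differ, and record every remaining disagreement.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Iterable, List

Request = Dict[str, Any]
Response = Dict[str, Any]
Backend = Callable[[Request], Response]


@dataclass
class Mismatch:
    """A request for which the two backends disagreed."""
    request: Request
    legacy: Response
    candidate: Response


def _comparable(response: Response, ignore: FrozenSet[str]) -> Response:
    # Drop fields that legitimately differ (for example, a server identifier).
    return {k: v for k, v in response.items() if k not in ignore}


def compare_backends(
    requests: Iterable[Request],
    legacy: Backend,
    candidate: Backend,
    ignore: FrozenSet[str] = frozenset(),
) -> List[Mismatch]:
    """Send every request to both backends and collect differing responses."""
    mismatches: List[Mismatch] = []
    for request in requests:
        old, new = legacy(request), candidate(request)
        if _comparable(old, ignore) != _comparable(new, ignore):
            mismatches.append(Mismatch(request, old, new))
    return mismatches


# Stand-in backends for illustration only.
def mainframe_price(request: Request) -> Response:
    return {"price": request["quantity"] * 100, "served_by": "mainframe"}


def cloud_price(request: Request) -> Response:
    return {"price": request["quantity"] * 100, "served_by": "cloud"}


def buggy_cloud_price(request: Request) -> Response:
    response = cloud_price(request)
    if request["quantity"] > 2:
        response["price"] += 1  # deliberate defect to show detection
    return response


traffic = [{"quantity": q} for q in (1, 2, 3)]
clean = compare_backends(traffic, mainframe_price, cloud_price, frozenset({"served_by"}))
found = compare_backends(traffic, mainframe_price, buggy_cloud_price, frozenset({"served_by"}))
```

The `ignore` set matters in practice: without it, fields such as timestamps or host names would flag every response as a mismatch and bury the real defects.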
**Controlled Rollout**: The facade enabled gradual traffic shifting from mainframe to cloud. Rather than a big-bang cutover, the team could incrementally increase the percentage of requests routed to the new system while monitoring performance and correctness. This de-risked the deployment considerably, allowing quick rollback if issues emerged without requiring downstream systems to make any changes.
**Minimal Consumer Impact**: From the perspective of the 20,000+ interfaces that interact with Global Ordering, the migration was nearly transparent. Consuming systems only needed to update the URL they called; all other integration details remained unchanged. This dramatically reduced coordination overhead and prevented the migration from becoming entangled with changes across hundreds of dependent systems.
**Non-Functional Requirements Monitoring**: The facade provided visibility into latency, throughput, and SLA compliance for both systems. During the parallel running period, this allowed direct performance comparison. The data showed that the new Java service running on AWS actually outperformed the mainframe version, with better handling of traffic spikes—a common advantage of cloud infrastructure's elasticity over fixed mainframe capacity.
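The controlled rollout described in this section can be sketched as percentage-based routing. How the facade product actually splits traffic was not described, so the hashing scheme below is an assumption, and `route` and its parameters are hypothetical names. Hashing a stable request key into one of 100 buckets keeps the decision deterministic, and raising the percentage only ever moves additional keys to the cloud backend.

```python
import hashlib


def route(request_key: str, cloud_percent: int) -> str:
    """Return 'cloud' or 'mainframe' for a request, based on a stable hash bucket."""
    if not 0 <= cloud_percent <= 100:
        raise ValueError("cloud_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "cloud" if bucket < cloud_percent else "mainframe"


keys = [f"order-{i}" for i in range(10_000)]
share_at_25 = sum(route(k, 25) == "cloud" for k in keys) / len(keys)
```

Setting the percentage back to 0 acts as the rollback: every key returns to the mainframe path without callers changing anything.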
## AI Tools in the Assessment and Development Phases
Beyond the GenRevive agentic transformation system, the team employed AI across multiple phases of the migration lifecycle:
**Assessment Phase**: Tools like Brad, CAP 360, and CAST Insights (some incorporating AI capabilities) helped analyze the existing application structure, map interfaces, and identify business rules embedded in the code. For a system of this scale and age (Capgemini had been maintaining it for over 25 years), automated discovery was essential for comprehensive understanding. These tools helped create the inventory and documentation that later informed the transformation process.
**Coding Assistance**: All developers on the project had access to AI coding assistants, following the pattern mentioned in AWS keynotes. While the presentation didn't detail specific tools, this reflects the increasingly standard practice of augmenting developer productivity with AI pair programming capabilities. This is particularly valuable when working with generated code, as developers can more quickly understand, modify, and extend the AI-transformed codebase.
**Testing Support**: AI-assisted tools helped identify whether tests succeeded or failed, potentially including automated analysis of failure patterns and suggested fixes. This complements the facade's automated comparison testing by helping developers quickly diagnose and address issues during development and testing phases.
This multi-tool approach recognizes that LLMOps in practice involves orchestrating various AI capabilities across the development lifecycle rather than relying on a single monolithic solution.
## Platform Architecture and Operational Model
The target platform, called Helios internally at Mercedes-Benz, combines Rocket's Enterprise Server (for mainframe emulation) with standard AWS services. The Enterprise Server component is fully managed by AWS, meaning Mercedes-Benz purchases it as a managed service rather than operating the emulation layer themselves. This reduces operational complexity and allows the team to focus on application-level concerns.
Beyond the mainframe emulation layer, the platform includes a Java stack, batch processing services, database services (crucial given the application's data-intensive nature), and messaging infrastructure to support the 5.1 billion messages per year. Standard cloud services for monitoring, cost management, and storage round out the platform.
The Precisely data streaming tool plays a critical role in the architecture by maintaining real-time synchronization of mainframe data to cloud databases. For the pricing service specifically, this meant 500 megabytes of database records continuously synced, enabling the new Java service to access current data without requiring immediate migration of the entire data layer. This phased approach to data migration reduces risk and complexity.
The team is still in the early stages of the "operate and optimize" phase, learning how to run and tune the application in its new cloud environment. The presentation mentioned that they're evaluating which additional components might benefit from AI-powered refactoring similar to the pricing service, suggesting this approach will expand to other parts of the application where refactoring delivers clear value beyond simple replatforming.
## Production Deployment and Results
The pricing service went live in September 2025 with zero incidents—a remarkable achievement for a migration of this scale and complexity, particularly for AI-generated code handling critical production traffic. The parallel running period provided high confidence before cutover, and performance monitoring showed the new system consistently outperformed the legacy mainframe version.
Key benefits realized include:
**Reduced Mainframe Costs**: By moving the pricing service workload off the mainframe, Mercedes-Benz reduced their mainframe resource consumption and associated costs. Given that Global Ordering represents approximately 50% of their mainframe workload, each component successfully migrated contributes meaningfully to the business case.
**Improved Performance**: The new Java service running on AWS demonstrated better response times and superior handling of traffic spikes compared to the mainframe implementation. This enables the downstream modernization efforts that originally drove the need for increased capacity.
**Accelerated Timeline**: The AI-powered transformation achieved in months what would traditionally require years of manual effort. The span from the decision to try the GenAI approach in February to production deployment in September is approximately seven months for a 1.3 million line transformation, roughly an order of magnitude faster than the multi-year effort expected of traditional approaches.
**Maintainable Codebase**: The transformed code quality proved indistinguishable from human-written code, ensuring that ongoing maintenance and enhancement won't be constrained by AI-generated artifacts. This addresses a common concern about AI code generation: that it might produce "write-only" code that's difficult to maintain.
## Critical Assessment and LLMOps Lessons
While the presentation naturally emphasizes successes, several important nuances emerge when examining this case through an LLMOps lens:
**Selective Application of AI**: The team didn't attempt to AI-transform the entire application, which includes roughly 5 million lines of Java and COBOL alone. They strategically identified components where refactoring delivered clear value (the pricing service's performance constraints) and where the technical characteristics (a relatively isolated service with clear interfaces) made transformation tractable. This selective approach reflects mature LLMOps practice: using AI where it provides clear advantage rather than applying it universally.
**Essential Human Expertise**: Success required extensive human involvement at multiple stages. Creating cookbooks to guide the transformation, identifying which components to refactor, performing manual performance optimization, and orchestrating the agentic AI system all demanded deep domain and technical expertise. The AI accelerated and automated aspects of the work but didn't eliminate the need for skilled practitioners.
**Comprehensive Testing Infrastructure**: The facade-based parallel testing approach was arguably as critical to success as the AI transformation itself. Without the ability to automatically validate AI-generated code against the proven mainframe implementation using real production traffic, establishing confidence for deployment would have been far more difficult and time-consuming. Organizations attempting similar AI-powered migrations should invest heavily in validation infrastructure.
**Performance Tuning Required**: The AI-generated code required manual optimization, particularly around database access patterns that differed between mainframe and cloud environments. This highlights a limitation of current code transformation AI: it excels at syntactic and structural transformation but may not automatically optimize for the performance characteristics of the target environment. Teams should plan for a performance tuning phase after initial transformation.
**Vendor Ecosystem Complexity**: Success depended on coordinating multiple vendors (Capgemini, AWS, Rocket Software, Woolsoft, Precisely) each providing critical components. While Capgemini served as general contractor, managing these dependencies added organizational complexity. The value of AI-powered transformation must be weighed against this coordination overhead.
**Risk Mitigation Through Staging**: Starting with stateless services that could run in parallel rather than attempting to migrate the entire application at once proved essential for managing risk. This staged approach allowed learning and refinement before tackling more complex components. Organizations should resist pressure to accelerate timelines by skipping these validation stages, especially when deploying AI-generated code to mission-critical systems.
**Data Synchronization Overhead**: Maintaining real-time sync of 500 megabytes of data between mainframe and cloud adds complexity and cost. While this enabled the phased migration approach, it represents transitional overhead that will only be eliminated when the complete migration finishes. Teams should account for these transitional costs in their business cases.
The case study represents a significant success in applying agentic AI to legacy modernization, but the success factors extend well beyond the AI technology itself. The careful planning, strategic selectivity, comprehensive testing infrastructure, and willingness to invest human expertise in guiding and validating the AI's work were all essential to the outcome. This provides a valuable template for other organizations considering AI-powered approaches to similar challenges, while also highlighting the continued importance of traditional software engineering discipline even when leveraging advanced AI capabilities.