Company
Commonwealth Bank of Australia
Title
Agentic AI for Cloud Migration and Application Modernization at Scale
Industry
Finance
Year
2025
Summary (short)
Commonwealth Bank of Australia (CBA) partnered with AWS ProServe to modernize legacy Windows 2012 applications and migrate them to the cloud at scale. Facing challenges with time-consuming manual processes, missing documentation, and significant technical debt, CBA developed "Lumos," an internal multi-agent AI platform that orchestrates the entire modernization lifecycle—from application analysis and design through code transformation, testing, deployment, and operations. By integrating AI agents with deterministic engines and AWS services (Bedrock, ECS, OpenSearch, etc.), CBA increased their modernization velocity from 10 applications per year to 20-30 applications per quarter, while maintaining security, compliance, and quality standards through human-in-the-loop validation and multi-agent review processes.
## Overview

Commonwealth Bank of Australia (CBA), in collaboration with AWS ProServe, embarked on one of the financial services industry's most ambitious modernization programs to migrate legacy applications from end-of-support Windows 2012 environments to cloud-native architectures. The initiative represents a comprehensive LLMOps implementation where multiple AI agents work together to automate and accelerate the traditionally slow, manual, and expertise-intensive process of enterprise application modernization. The speakers—Dina Alan Triana Saandham (Head of Modernization, AWS ProServe ANZ) and Ash Mullin (GM Cloud Acceleration at CBA and acting CTO of CBA India)—presented this case study at AWS re:Invent 2025, demonstrating how they built a production-grade multi-agent system that increased modernization velocity by 2-3x while maintaining quality, security, and compliance standards.

## Business Problem and Context

CBA faced a significant modernization challenge that reflects broader industry patterns. According to market research from Gartner, McKinsey, and ISG cited in the presentation, approximately 70% of enterprise workloads remain on-premises, much of that code was written over 20 years ago, and average transformation times range from 1-2 years. CBA specifically needed to migrate applications running on Windows Server 2012 (approaching end of support) to cloud environments, but wanted to avoid simple lift-and-shift approaches in favor of true modernization that would deliver cloud-native benefits.

The challenges were multifaceted. When CBA initially assessed their migration pipeline, they found the process extremely time-consuming and labor-intensive. Documentation archaeology became a major bottleneck—applications built 10-15 years ago had fragmented knowledge, with original developers often having left the organization or moved to different roles.
Multiple projects over the years had added features without comprehensive documentation updates, creating significant knowledge gaps. Technical debt was substantial, with outdated packages, missing binaries in artifact systems, and frameworks requiring upgrades before migration could even begin. At their baseline, before implementing AI solutions, CBA could modernize only about 10 applications per year—far too slow for their scale requirements.

The organization had already established a solid foundation with their internal DevOps Hosting Platform (DHP), which provided deployment automation, continuous delivery, evergreen environments, and immutable infrastructure. However, even with this platform engineering capability in place, the upstream work of understanding, analyzing, documenting, and transforming legacy applications remained a critical bottleneck preventing the organization from achieving migration velocity at scale.

## Solution Architecture: The Lumos Platform

CBA developed "Lumos" (a Harry Potter reference—the spell that "shines light"—applied here to legacy codebases), an internal multi-agent AI platform that orchestrates the complete modernization lifecycle. The platform is architecturally designed around extensibility and reusability, establishing patterns that any engineer building new accelerators can follow.

The frontend is built with Next.js and hosted in containers on Amazon ECS on Fargate. This UI layer makes calls to an orchestrator agent running in an agent runtime environment. Behind this orchestrator sits a multi-agent workflow system that interacts with various specialized accelerators built for specific modernization tasks—code analysis, cybersecurity document generation, high-level solution architecture generation, network analysis, and more.

The technical stack leverages multiple AWS services in production. Amazon Bedrock serves as the foundational model layer, providing access to various LLMs.
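As an illustration of that model layer, a minimal Bedrock Converse call might look like the following sketch; the model ID, prompts, and inference parameters are illustrative assumptions, not CBA's actual configuration.

```python
# Hypothetical model choice; Lumos's actual model selection is not disclosed.
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def build_converse_request(system_prompt: str, user_text: str) -> dict:
    """Assemble keyword arguments for bedrock-runtime's converse() call."""
    return {
        "modelId": MODEL_ID,
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
    }

# At runtime (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**build_converse_request(
#       "You are an application-modernization analyst.",
#       "Summarize the migration risks for a Windows 2012 .NET application."))
#   text = response["output"]["message"]["content"][0]["text"]
```

Separating request construction from the API call keeps the prompt-assembly logic unit-testable without AWS credentials.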
The system uses OpenSearch Serverless as the vector store for both short-term and long-term memory persistence. Amazon Bedrock Knowledge Bases implements RAG (Retrieval-Augmented Generation) with knowledge stored in S3 buckets. Containers are stored in Amazon Elastic Container Registry (ECR). The team used Pydantic AI and CrewAI as their agentic orchestration frameworks, though they also referenced LangChain and LlamaIndex as options they considered.

A critical architectural decision was the integration of deterministic engines alongside AI agents. Static code analyzers, sequence diagram generators, class diagram generators, and other deterministic tools provide "facts" that the AI agents then apply intelligence to. This hybrid approach significantly reduces hallucination risk and increases the reliability and auditability of changes. The presentation emphasized that agents augment human experts rather than replace them—the agents handle the heavy lifting while domain experts provide validation and oversight through human-in-the-loop processes.

The platform implements the Model Context Protocol (MCP), connecting to existing MCP servers within CBA to ensure generated solutions align with enterprise-specific requirements. This allows agents to call internal APIs for compliance rules, formatting standards, and architectural patterns the organization has standardized on over the years.

## Multi-Agent System Design and Patterns

The implementation showcases sophisticated multi-agent design patterns in production. The presentation categorized these into several types that Lumos implements:

**Basic Reasoning Agents** operate purely on context without external tools or complex memory. For example, agents can process policy documents or licensing documents to determine upgrade paths and compliance requirements based solely on the provided context.

**Tool-Based Agents** bridge the gap between thinking and doing.
These agents don't just produce text—they decide when to call specific APIs, Lambda functions, or database queries to retrieve information. In Lumos, code modernization agents call compliance APIs maintained by governance bodies and formatting APIs to ensure generated code aligns with enterprise standards, enabling automatic approval by risk and cyber teams.

**Memory-Augmented Agents** implement both short-term and long-term memory patterns. Short-term memory helps agents understand the context of previous comments and peer review feedback to minimize future review cycles when generating new code. Long-term memory enables learning across sessions—for instance, when applications were modernized months ago following APRA (Australian Prudential Regulation Authority) standards, that learning informs current modernization efforts to ensure best practices are consistently applied.

**Multi-Agent Workflow Orchestration** manages complex multi-step tasks. The modernization workflow agent orchestrates between specialized agents: a discovery agent that finds repositories and refactors code, a testing agent that validates the refactored code, and a compliance documentation agent that generates reports on the changes made and the testing outputs.

A particularly sophisticated example is the solution document generation workflow. An orchestrator agent coordinates between a content writer agent and a content reviewer agent. The reviewer scores the writer's output and provides feedback for up to three iterations. In the demo, the first iteration scored only 30%, with specific feedback on the improvements needed. The second iteration was also rejected, and only the third iteration achieved acceptable quality. This multi-agent review process ensures documentation quality without requiring immediate human intervention at every step.

## Analyze and Design Phase

The first major capability in Lumos focuses on application understanding and solution design.
The process begins with an interview with the application owner, captured as a transcript. Engineers upload this transcript to Lumos along with the application's Configuration Item (CI) identifier from their Configuration Management Database (CMDB). The AI agents process this transcript to extract structured information. The system breaks the application down into user stories formatted in Gherkin syntax (Given-When-Then), which can flow directly into engineering backlogs. The agents identify critical gaps early—for example, determining whether the application has a CI/CD pipeline, or noting that it is a legacy application without automation, which would require pipeline creation before migration.

An interesting example from production: a business owner stated they wanted to decommission an application with only 20 users remaining, while the migration engineer was simultaneously asking about connectivity and deployment details. The AI correctly captured both intents from the conflicting conversation, and the resulting requirements documentation revealed the disconnect, allowing CBA's head of engineering to catch that they were about to migrate an application the business wanted to shut down. This demonstrates how the AI's structured extraction can prevent costly mistakes.

For network analysis, Lumos connects directly to CBA's VMware environment via the NSX API, pulling connectivity information automatically. It generates comprehensive documentation of infrastructure, application ownership (pulled from the CMDB), and all integration points with ingress and egress flows, port numbers, and IP addresses. For visual thinkers, it also generates network flow diagrams showing connections between application nodes. All of this information is stored in the system's long-term memory for use by downstream agents.

The code analysis accelerator provides deep repository inspection.
After selecting a repository, engineers can trigger an analysis that examines the primary tech stack, generates a detailed description of what the application does (far more useful than typical CMDB descriptions), breaks functionality and features down into modules, identifies network connectivity patterns, catalogs API endpoints, documents the tech stack (Spring Boot, client-server frameworks, database libraries), and performs a cloud readiness assessment.

The cloud readiness analysis examines dependencies on network storage (which would need migration to S3), integration dependencies on message queues and other systems, critical issues requiring fixes before migration, security and compliance concerns, suggestions for future enhancements (like Java version upgrades), repository structure analysis highlighting files with issues, and configuration file documentation. The system generates both class diagrams (showing component relationships) and sequence diagrams (showing interaction flows), providing comprehensive visual documentation of the application architecture.

## Solution and Cybersecurity Documentation Generation

Leveraging all the information gathered during analysis—transcripts, network data, code analysis, class diagrams, sequence diagrams—Lumos generates comprehensive solution documents and cybersecurity posture assessments. Engineers can anchor these documents to the application CI and provide additional context by uploading documents from knowledge systems like Confluence, current architecture diagrams, network analyzer reports, or any other relevant materials.

The generated solution document includes the application overview, component details, tags, server details, reference materials, target architecture, target environment, assumptions and risks, integration flows, and DNS/service account information. The entire process is fully agentic with human-in-the-loop validation rather than human-driven generation.
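As a hedged sketch of how such gathered artifacts might be packaged into a single generation context for a document-writing agent (all field names are hypothetical, not Lumos's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisBundle:
    """Illustrative container for the artifacts gathered during analysis."""
    application_ci: str                      # Configuration Item from the CMDB
    transcript: str                          # interview transcript text
    network_flows: list[dict] = field(default_factory=list)
    code_analysis: str = ""
    extra_documents: list[str] = field(default_factory=list)  # e.g. Confluence exports

    def to_generation_context(self) -> str:
        """Flatten every artifact into one delimited context string, so the
        agent can cite which source each generated statement came from."""
        sections = [
            f"# Application CI: {self.application_ci}",
            f"## Interview transcript\n{self.transcript}",
            "## Network flows\n" + "\n".join(str(f) for f in self.network_flows),
            f"## Code analysis\n{self.code_analysis}",
        ]
        sections += [f"## Supporting document\n{doc}" for doc in self.extra_documents]
        return "\n\n".join(sections)
```

Delimiting each artifact with its own header is one simple way to keep generated documents traceable back to their sources, which matters for the auditability the bank requires.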
The cybersecurity document takes a similar approach from a security perspective, assessing dimensions like service details, solution overview, scope, information assets, threat modeling tables, node diagrams and tables, flow tables, and security zone diagrams that map to CBA's security zone model. It identifies risks, documents the policy exemptions required for migration, and ensures compliance with organizational non-negotiables. The security documentation is critical for CBA given the regulatory environment and the trust customers place in the bank to operate safely.

Both document types support inline feedback—engineers can comment on specific sections, and agents dynamically update the content based on that feedback. Final documents are exported as Markdown and committed directly into the code repository, ensuring documentation lives alongside the code for easy reference during migration.

## Transform Phase: Code Modernization with Hybrid Approaches

The transformation phase showcases a sophisticated hybrid approach combining deterministic tools with AI agents. The system introduces the concept of an "application" that can encompass multiple repositories (though the demo showed a single-repo example). For standard three-tier applications—web tier, app tier, database tier—engineers can bring multiple repositories together under one modernization effort.

The transformation agent makes multiple attempts to modernize application code. First, it establishes a baseline of the application in its current state. Then it triggers OpenRewrite, a deterministic code refactoring tool, to attempt conversion (for example, from older Java versions to newer ones). In the demo, this first attempt failed with poor-quality results. The orchestrator agent, recognizing the failure, brought in a second tool—Amazon Q Developer (formerly CodeWhisperer)—to work on the code after OpenRewrite's initial pass.
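The tool-fallback loop just described can be sketched as follows; the callables stand in for real OpenRewrite and Q Developer invocations, and the control flow is a simplified assumption rather than Lumos's actual orchestration logic:

```python
from typing import Callable

def transform(code: str,
              tools: list[Callable[[str], str]],
              build_ok: Callable[[str], bool],
              max_attempts: int = 3) -> tuple[str, bool]:
    """Apply each tool in order, rebuilding after every pass; stop on the
    first successful build, or give up after max_attempts full rounds."""
    for _ in range(max_attempts):
        for tool in tools:
            code = tool(code)          # e.g. OpenRewrite first, then an AI tool
            if build_ok(code):
                return code, True
    return code, False
```

Injecting the tools and the build check as callables mirrors the demo's behavior (a failed deterministic pass followed by an AI fallback) while keeping the loop trivially testable.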
Through this iterative process with multiple tools and multiple attempts, the system eventually achieved a successful transformation. The execution log shows the detailed process: instantiate a container with one-time setup, pull the codebase into the container, download build tooling, compile the code, run the available unit tests to establish baseline behavior, then iterate with OpenRewrite and Q Developer until achieving a successful build. This demonstrates the LLMOps principle of combining deterministic and AI-based tools in a production pipeline with clear validation criteria.

Importantly, the system doesn't just check for successful builds—it also scores confidence across multiple dimensions: number of lines of code changed, number of files modified, number of libraries updated, and other metrics. In one production case, a build succeeded but received a low confidence score because the AI agent changed too many libraries unnecessarily. The orchestrator agent rejected this, instructing the transformation agent to try again with fewer changes. This confidence scoring and multi-pass refinement is a critical LLMOps pattern for ensuring production quality.

For SQL Server 2012 database migrations (also approaching end of support), Lumos includes a specialized accelerator. Engineers upload SQL Server Integration Services (SSIS) packages, and the system analyzes package structure using Bedrock. It identifies compatibility issues for SQL Server 2019 upgrades, required updates, deprecated features, security considerations, and remediation steps. The system can even apply fixes automatically. More strategically, CBA is now targeting cloud-native solutions, so the accelerator also performs AWS Glue assessments using the AWS Schema Conversion Tool to generate transformation plans for moving from SSIS to Glue-based workflows.

## Testing and Validation

A major challenge CBA faced was the lack of automated tests in legacy applications.
Applications built 10-15 years ago rarely had unit tests and almost never had automated UI tests. To address this, Lumos includes an AI-powered UI test generator. Engineers provide an internal URL (or a public website for demonstration purposes) and simple instructions like "Load the homepage and navigate to a new page." The system uses AI agents to analyze the website and generate Selenium scripts automatically. If the website being tested corresponds to code that was previously analyzed, the system imports the code breakdown and features to build context about what the application should do, enabling more intelligent test generation.

The demo showed successful test generation, execution, and evidence capture. The AI agent loaded a homepage and navigated to a banking link without explicit instructions mentioning banking—it inferred the appropriate test path from the application context. The system captures screenshots throughout test execution to provide test evidence for auditing and validation purposes. For internal applications, CBA uses both Selenium agents and Amazon Bedrock's computer use capability to simulate user interactions and generate test evidence. This testing automation is essential for validating that modernized applications behave correctly compared to their legacy versions, especially given the lack of existing test suites.

## Deployment and Operations with DHP Integration

Lumos integrates tightly with CBA's existing DevOps Hosting Platform (DHP) for deployment. DHP simplifies infrastructure-as-code by having engineers provide just enough parameters to achieve an outcome rather than writing hundreds of lines of Terraform or CloudFormation templates. The deployment interface allows engineers to select the organization domain, environment (dev, test, prod), and application.
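A minimal sketch of the "just enough parameters" idea, paired with a policy check of the kind DHP's built-in validation enforces; the field names and the specific rule are hypothetical, not DHP's real schema:

```python
from dataclasses import dataclass

@dataclass
class DeploymentSpec:
    """Illustrative minimal deployment request; real DHP parameters differ."""
    domain: str
    environment: str        # "dev", "test", or "prod"
    application: str
    image: str
    zone: str               # e.g. "web", "app", "db" (security zone tier)
    internet_facing: bool = False

def validate(spec: DeploymentSpec) -> list[str]:
    """Return policy violations; an empty list means the spec may proceed."""
    errors = []
    if spec.environment not in {"dev", "test", "prod"}:
        errors.append(f"unknown environment: {spec.environment}")
    if spec.internet_facing and spec.zone != "web":
        errors.append("only the web tier may be internet-facing")
    return errors
```

Validating the spec before any infrastructure is generated is what lets the platform reject, for instance, an internet-facing configuration in a non-web tier before a pull request is ever created.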
The system reads metadata and presents visual configuration for parameters including environment settings, CBA security zone model tiers (web tier, app tier, internal control zone, database tier), Application Load Balancer provisioning, tags, ECS clusters, Lambda functions, and container configurations. For the Lumos MCP server deployment shown in the demo, engineers configure minimal parameters like the image name and deployment targets. Built-in validation enforces cybersecurity rules—for example, preventing internet-facing ECS configurations that would violate security policies. When an engineer pushes the deploy button, the system generates a pull request, triggers GitHub Actions workflows, and provisions infrastructure while deploying the application to the cloud automatically.

DHP provides several capabilities that are critical for production LLM operations: deployment automation ensures consistency across environments; continuous delivery enables rapid iteration; evergreen environments automatically keep systems updated; immutable infrastructure prevents configuration drift; and DHP agents installed on all VMs detect manual changes, automatically destroying and recreating "tainted" machines to maintain pristine, fully immutable environments in production. This immutability is particularly important for LLMOps—when AI agents are generating and deploying code, strong guarantees about environment consistency and the ability to quickly roll back or recreate infrastructure become essential for maintaining reliability and security.

## Evaluation, Confidence Scoring, and Human-in-the-Loop

A critical LLMOps aspect of Lumos is its comprehensive evaluation and confidence scoring framework. The system doesn't just generate artifacts and deploy them—it continuously assesses quality and confidence before allowing progression.
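One way such a gate might look; the metrics match those described in the presentation, but the weights and threshold here are invented for illustration:

```python
def confidence_score(lines_changed: int, files_modified: int,
                     libraries_updated: int, expected_scope: int) -> float:
    """Score in [0, 1]: the further a change's blast radius exceeds the
    expected scope of the modernization, the lower the confidence, even
    when the build succeeds."""
    # Hypothetical weighting: a library swap is far riskier than a line edit.
    blast_radius = lines_changed + 10 * files_modified + 50 * libraries_updated
    return max(0.0, min(1.0, expected_scope / max(blast_radius, 1)))

def gate(score: float, threshold: float = 0.7) -> str:
    """A low score sends the change back for regeneration instead of review."""
    return "proceed-to-human-review" if score >= threshold else "regenerate"
```

This reproduces the behavior described in the production anecdote: a successful build that churned through too many libraries scores low and is sent back for another, narrower attempt.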
For code transformation, confidence scores consider multiple dimensions: lines of code changed, number of files modified, number of libraries updated, and the scope of changes relative to the modernization goal. Low confidence scores trigger agent review and regeneration even if the build technically succeeds. This prevents AI agents from making unnecessarily broad changes that increase risk.

For documentation generation, the multi-agent review process provides built-in evaluation. The content reviewer agent scores outputs numerically and provides specific feedback. Only after iterative improvement and an acceptable score does the document move forward. This creates a self-improving loop where agents learn from reviewer feedback within a single generation session.

Human-in-the-loop validation remains mandatory for critical decisions. After code transformation, pull requests are generated for human review before merging. The system presents confidence scores alongside changes to help reviewers focus on higher-risk modifications. Engineers can provide inline feedback on generated documentation, which agents incorporate dynamically to refine outputs.

The platform also implements continuous compliance agents and evaluation agents specifically to ensure reliability and avoid hallucinations. By combining deterministic engine outputs with AI-generated content, the system ensures that documentation and code changes are 90-100% accurate according to CBA's validation criteria. This hybrid approach of deterministic facts plus AI intelligence is presented as a key strategy for production LLMOps reliability.

## Production Results and Business Impact

The quantitative results demonstrate significant LLMOps value. CBA's modernization velocity increased from a baseline of approximately 10 applications per year to 20-30 applications per quarter—roughly a 2-3x improvement in throughput.
Over the course of their modernization program, CBA assessed over 370 applications for cloud migration using the Lumos platform.

Beyond raw velocity, the quality and consistency of modernization improved. Engineers reported that having comprehensive, AI-generated documentation that actually reflects what applications do (rather than generic CMDB descriptions) dramatically reduces the time spent in the understanding phase. Network diagrams, code analysis, and dependency mapping that previously required weeks of manual effort now generate in minutes, letting engineers focus on higher-value decision-making and validation.

The platform's extensibility means that as CBA builds new accelerators for additional modernization patterns, they follow established architectural patterns, reducing development time for new capabilities. Engineers building accelerators report having a familiar paradigm that scales across different use cases.

From a business perspective, CBA emphasizes that modernization velocity directly supports their goal of building solutions that are "better, safer, and faster" for customers. Running legacy applications with vulnerabilities in production poses risks to customer trust. The Australian community places significant trust in CBA to operate safely, and the ability to modernize at scale while maintaining security and compliance standards is essential to preserving that trust.

## Technical Challenges and Lessons Learned

The presentation candidly discussed several technical challenges encountered in production. Repository analysis repeatedly broke context windows, requiring architectural changes. The team addressed this by breaking large repositories into smaller chunks using AWS Step Functions that trigger Lambda functions to process segments in parallel, then reassembling the results at the end.
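The chunk-and-reassemble pattern can be sketched locally; here a thread pool stands in for Step Functions fanning out to Lambda, and `analyze` is a stub for the per-chunk model call:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(files: list[str], size: int) -> list[list[str]]:
    """Split a repository's file list into context-window-sized segments."""
    return [files[i:i + size] for i in range(0, len(files), size)]

def analyze(segment: list[str]) -> dict:
    # Stub: a real implementation would summarize this segment's code
    # with a model call that fits within the context window.
    return {"files": len(segment)}

def analyze_repository(files: list[str], size: int = 50) -> dict:
    segments = chunk(files, size)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(analyze, segments))
    # Reassemble the partial results into one repository-level summary.
    return {"segments": len(partials),
            "files": sum(p["files"] for p in partials)}
```

The key property is that no single `analyze` call ever sees more than `size` files, so the per-call context stays bounded regardless of repository size.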
Early testing revealed that while the system worked well on CBA's internal applications, external users from Bankwest (a CBA-owned bank) immediately encountered failures when testing it on their repositories. This drove the need for more robust error handling and broader testing across different application patterns.

The tension between cohesive end-to-end solutions and individual accelerators emerged in user feedback. When presenting to application owners, conversations consistently gravitated toward code transformation alone, missing the broader value of the complete modernization workflow. This insight led to the "modernization pathways" concept—opinionated, orchestrated workflows that guide users through the complete journey for specific technology stacks rather than presenting individual tools.

Engineers also learned that simply getting code transformation working isn't enough—the continuous work of keeping documentation updated as applications evolve remains unsolved. Future work focuses on having agents continuously monitor repositories for new commits and automatically update documentation, maintaining accuracy over the application lifecycle without manual intervention.

Cross-repository analysis represents another frontier. Many enterprise applications span multiple repositories with complex dependencies. Building dependency maps between repos and understanding the upstream and downstream impacts of changes in a multi-repo context requires additional agent capabilities that the team is actively developing.

## Future Roadmap and Advanced Agentic Patterns

CBA outlined several next-phase capabilities for Lumos that represent advanced agentic AI patterns. The goal is moving from human-triggered modernization to agent-initiated modernization.
Agents would continuously scan codebases, automatically start modernization processes, and then prompt humans when they need additional information—inverting the current interaction model from human-directed to agent-directed with human augmentation.

Cross-repository analysis and dependency mapping will enable agents to understand whether touching a function in one repository impacts upstream and downstream systems across multiple repos. This capability is essential for safely modernizing complex enterprise applications with distributed architectures.

Self-improving agents represent another frontier—building systems where agents learn from their own successes and failures over time, continuously improving code transformation quality, documentation accuracy, and test coverage without explicit retraining.

Language expansion is also planned. While the current system handles .NET, Java, Node.js, and JavaScript well, expanding to support iOS, Android, and 20+ other languages requires additional agent training and pattern development.

The "modernization pathways" concept represents a significant evolution in the user experience. Rather than presenting individual accelerators, Lumos will offer guided journeys tailored to specific technology stacks and target architectures. For example, a .NET Framework to .NET Core pathway would assemble all the relevant accelerators—business requirement analysis from meeting notes, transformation patterns specific to framework upgrades, Windows-to-Linux containerization, hosting configuration, testing, cyber documentation, network flow validation—into a cohesive wizard that walks engineers through each step with appropriate automation at each stage.
The demo showed an early version of this pathway interface for the migration of CBA's NetBank application to DHP, with different technology stacks (.NET, older .NET Core versions, .NET Framework) each requiring different transformation approaches and containerization strategies (Windows containers vs. Linux containers). This pathway approach promises to make the full power of the platform accessible to engineers who may not be experts in every aspect of modernization.

## LLMOps Architecture Considerations and Best Practices

The case study demonstrates several important LLMOps best practices for production deployments. The emphasis on extensible, reusable architectural patterns ensures that teams can build new AI capabilities without reinventing infrastructure each time. Having a standard pattern where UI components call an orchestrator agent that coordinates specialized agents, with consistent memory storage and tool integration, dramatically reduces the cognitive load for developers building new accelerators.

The hybrid approach of deterministic engines plus AI agents addresses the reliability and hallucination concerns that often prevent AI adoption in regulated industries. By having deterministic tools generate "facts" that AI agents then enhance with intelligence, the system achieves both accuracy and flexibility. This pattern is particularly important in financial services, where auditability and correctness are non-negotiable.

The multi-agent review and scoring systems build quality gates directly into the generation process. Rather than relying solely on human review after generation, having agents review other agents' work with scoring and feedback loops creates a self-improving system that produces higher-quality outputs before human review, making human validation more efficient and focused on true edge cases or judgment calls.

The Model Context Protocol (MCP) integration demonstrates the importance of connecting AI agents to enterprise-specific context.
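A simplified stand-in for that interaction; this is not the MCP protocol or SDK itself, just the shape of an agent resolving enterprise context through registered tools, with names and the compliance rule invented for illustration:

```python
from typing import Callable

class EnterpriseContext:
    """Toy registry of enterprise context tools an agent can call by name,
    standing in for what an MCP server integration provides."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., object]] = {}

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs) -> object:
        if name not in self._tools:
            raise KeyError(f"no such tool: {name}")
        return self._tools[name](**kwargs)

# An agent asking which compliance rules apply to a generated change:
ctx = EnterpriseContext()
ctx.register("compliance_rules",
             lambda domain: ["APRA-CPS-234"] if domain == "security" else [])
rules = ctx.call("compliance_rules", domain="security")
```

The point of the pattern is that the enterprise-specific knowledge lives behind the tool interface, so the same agent code works against whatever compliance, formatting, or architecture services the organization exposes.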
Generic LLMs don't understand CBA's compliance requirements, API standards, or architectural patterns. By integrating MCP servers that provide this enterprise context, generated code and documentation automatically align with organizational standards, reducing review cycles and increasing adoption.

The memory architecture, with both short-term and long-term storage, enables agents to learn within and across sessions. Short-term memory allows agents to incorporate feedback immediately within a single modernization effort, while long-term memory enables organizational learning where best practices discovered months ago automatically inform current work.

The human-in-the-loop patterns preserve accountability and safety. While agents automate the heavy lifting, humans review critical decisions like code merges and deployment approvals. Confidence scoring helps humans focus their review time on higher-risk changes, making the review process more efficient without eliminating the human judgment that remains essential in production systems.

## Conclusion

Commonwealth Bank of Australia's Lumos platform represents a sophisticated, production-grade implementation of multi-agent AI systems for enterprise application modernization. By combining multiple specialized agents with deterministic tooling, memory systems, evaluation frameworks, and human-in-the-loop validation, CBA achieved a 2-3x improvement in modernization velocity while maintaining the security, compliance, and quality standards required in regulated financial services. The system demonstrates mature LLMOps practices including hybrid AI-deterministic architectures, multi-agent orchestration patterns, comprehensive evaluation and confidence scoring, extensible platform design, and integration with existing DevOps tooling.
The case study provides valuable insights for organizations looking to deploy AI agents at scale for complex, multi-step workflows in production environments where reliability and auditability are paramount.
