## Overview
This case study presents two distinct enterprise stories shared at AWS re:Invent 2025, focusing on how large organizations are using AI and LLMs in production with their SAP environments. The primary focus is on Harman International's innovative use of generative AI to solve a critical documentation challenge during their S/4HANA migration, while Axfood's story provides context on the broader evolution from traditional machine learning to generative AI adoption in retail operations.
## Harman International: The Custom Code Documentation Challenge
Harman International is a global audio technology company (owned by Samsung Electronics since 2017) with well-known brands including JBL, Harman Kardon, and AKG. The company operates across automotive solutions, consumer audio, and professional audio segments. Their SAP landscape is substantial, with two SAP ECC 6 EHP 7 instances containing approximately 30,000 and 20,000 custom objects respectively, along with extensive supporting systems including SAP BW, IBP, Ariba, BTP, and various manufacturing execution systems.
### The Business Problem
Harman embarked on an S/4HANA migration with multiple strategic objectives: improving business process efficiency, executing finance transformation (consolidating company codes, harmonizing chart of accounts), reducing technical debt, minimizing database footprint, and streamlining custom code. The company selected a selective data transition approach, migrating only open transactions, master data, and customizing data, while using SAP DMLT services for data migration with a shell approach.
The critical challenge emerged around custom code rationalization. After more than 25 years of SAP ECC operation, Harman had accumulated 30,000+ custom objects with extremely poor or minimal documentation. Understanding these customizations was essential for several reasons: eliminating unused code (initial analysis showed 40% was not in use), analyzing interface dependencies, identifying where S/4HANA simplifications could replace custom code, and limiting testing scope to what was actually business-critical.
Without proper documentation, the business faced significant risks: operational inefficiencies, escalating costs, missed innovation opportunities, integration risks from unclear dependencies, and inability to optimize system performance or reduce future upgrade costs. The documentation was also needed for mapping custom objects to business processes (L1-L4 like order-to-cash, record-to-report) to support unit testing, system acceptance testing (SAT), and user acceptance testing (UAT).
### The Failed Manual Approach
Harman initially attempted manual documentation, starting with 6 consultants. The performance was poor and extremely time-consuming. To meet the initial 6-month target timeline, they ramped up to 12 consultants. Even with doubled resources, the estimated timeline stretched to 15 months. The cost of 12 consultants over 15 months was prohibitively expensive. Additionally, the quality was inconsistent—each consultant documented in different formats, and the collaboration required with functional teams to understand 25-year-old code produced outputs that lacked uniformity and didn't provide the expected value. The manual outputs were highly technical and difficult for business or functional stakeholders to understand without significant additional effort.
### The Generative AI Solution
Facing this impasse, Harman explored AWS AI capabilities, specifically starting with AWS Bedrock. The initial results were not satisfactory—the tool produced only technical details like tables used by programs without sufficient business context. However, through collaboration with the AWS team and iterative prompt engineering, they significantly improved the outputs to meet their requirements.
The final solution utilized AWS Bedrock (and later Amazon Q Developer with the latest Claude models) to automatically scan all 30,000 custom objects. The key to success was developing the right prompts that would generate structured, consistent documentation understandable by multiple audiences.
### Technical Implementation Details
While the transcript doesn't provide extensive technical architecture details, several important aspects emerge about the LLMOps implementation:
**Model Selection and Evolution**: The solution leveraged Anthropic Claude models through AWS Bedrock, and the presenter specifically mentioned that Amazon Q Developer with the latest Claude versions provided even better outputs. This aligns with the broader conference context mentioned by Eric Kammon about rapid LLM maturity in 2025, particularly noting Claude Sonnet's progression through versions 3.5, 3.7, 4.0, and 4.5, with each release bringing major improvements in analyzing SAP business context and ABAP code.
**Prompt Engineering**: The success of the solution hinged critically on prompt engineering. The initial outputs from Bedrock were too technically focused, providing only table references and technical details. Through iterative refinement with AWS support, Harman developed prompts that generated three-tier documentation: a high-level business summary (4-5 lines explaining purpose in business language), a functional description detailing key functionalities for SAP functional team members, and detailed technical step-by-step documentation for ABAP developers to understand each code snippet. This multi-layered approach ensured the documentation was valuable across different stakeholder groups.
**Consistency Through Standardization**: A critical success factor was using the same prompt across all 30,000 custom objects, ensuring uniform, structured outputs. This standardization was impossible to achieve with manual processes involving multiple consultants with varying documentation styles.
**Production Deployment Context**: While not explicitly detailed as a traditional "production deployment" with monitoring dashboards and API endpoints, this represents a significant production use case of LLMs—processing tens of thousands of business-critical code objects to support a multi-year, multi-million dollar enterprise transformation. The outputs directly feed into subsequent migration phases including code remediation, testing, and production cutover planning.
### Results and Impact
The generative AI approach delivered dramatic improvements across multiple dimensions:
- **Speed**: 6-7x faster than manual processing
- **Timeline**: Reduced from 15 months to 2 months (87% reduction)
- **Cost**: Over 70% reduction in total costs
- **Quality**: Highly structured, consistent outputs easily interpretable by business, functional, and technical stakeholders
- **Business Value**: Enabled downstream activities including business process mapping, testing scope definition, identification of replacement opportunities with S/4HANA standard functionalities, and informed decision-making for code remediation strategies
The presenter provided a concrete before-and-after example showing manual documentation that was dense, technical, and difficult to parse versus AI-generated documentation with clear sections for business summary, functional description, and technical details—all from the same ABAP program.
### Future Use Cases and LLMOps Evolution
Varda Reddy outlined several planned expansions of generative AI use in Harman's ongoing S/4HANA journey, demonstrating a maturing LLMOps practice:
**Unit Testing Automation**: After code remediation (using a tool called SmartShift for automatic ABAP remediation plus functional code changes for finance transformation), Harman plans to use Amazon Q's Model Context Protocol (MCP) capabilities to automate unit testing in development environments. This would quickly scan remediated code to verify it's consistent, efficient, and S/4HANA-compatible, reducing highly manual testing efforts.
**New Code Documentation Standards**: All new custom code developed during S/4HANA implementation will be documented using Bedrock or Q Developer with predefined prompts. This will be mandatory for all developers to ensure consistency and completeness, addressing the common problem of developers inadequately documenting their work. This represents an LLMOps best practice of embedding AI into development workflows from the start.
**Agentic Testing with Historical Data**: Perhaps most ambitiously, Harman is exploring using AWS agentic capabilities to scan historical transactions in the ECC environment, capture actual data used in different transactions, and use this real-world data to automate testing in the S/4HANA environment. Manual test data generation is extremely labor-intensive, and automating this through AI agents analyzing production transaction patterns could significantly accelerate testing phases.
These planned use cases demonstrate a thoughtful progression from initial problem-solving (documentation) to proactive integration of LLMs throughout the development and testing lifecycle—a hallmark of mature LLMOps practices.
### Critical Success Factors and Lessons Learned
Varda emphasized a key lesson: "Whenever you are dealing with some task which is highly manual and highly repetitive and consuming a lot of your resources, think AI as the first option." This represents an important mindset shift from trying manual approaches first to evaluating AI automation early in problem identification.
The collaboration with AWS was repeatedly highlighted as critical to success, particularly in improving prompts and outputs beyond initial unsatisfactory results. This underscores an important LLMOps reality: successful LLM implementations often require iteration, experimentation, and partnership with platform providers or experts rather than expecting immediate perfect results.
## Axfood: The Evolution from ML to GenAI
While Harman's story represents a focused, production-critical GenAI deployment, Axfood's presentation provides valuable context on how enterprises are evolving from traditional ML to generative AI adoption.
### Company and SAP Landscape Context
Axfood is a leading Swedish grocery retailer with approximately 800 stores serving millions of customers weekly, representing about 25% of Sweden's food retail market. As a high-volume, low-margin business, standardization and operational efficiency are paramount. Their SAP landscape is extensive: S/4HANA with the retail industry solution add-on, SAP CAR (Customer Activity Repository) for merchandising and promotion planning, SAP EWM and FNR for forecasting, plus SAP Commerce Cloud and SuccessFactors. The S/4HANA database exceeds 6TB running on over 4,000 CPUs across 140 virtual machines. They maintain this complex on-premise environment entirely with internal staff without third-party involvement, with over 15 years of custom ABAP development.
### Traditional Machine Learning in Production
Axfood has been operating AI in production for years, with over 100 machine learning models currently deployed. Their ML platform, called "Mimmer," is built on AWS and centers on Amazon SageMaker and Apache Airflow. The platform supports the full ML lifecycle: data pipeline creation (from manual files, third-party sources, data warehouse, or web scraping), data exploration in SageMaker Studio, AutoML for optimal model selection, model training and evaluation, and deployment of artifacts. Results are returned as ML models via APIs or as result sets back into source systems.
Axfood claims approximately 30% better accuracy in custom-built models compared to standard SAP system capabilities, particularly for handling irregularities and deviations. This advantage stems from incorporating more historical data and broader data sources than SAP systems alone provide. The use cases span multiple business areas: campaign forecasting, seasonal forecasting, sales forecasting, e-commerce forecasting, warehouse optimization (placing items to minimize picker routes), e-commerce personalization (product recommendations, related items, next-best-action prediction, displaying commonly purchased items), assortment planning simulations (price optimization, volume prediction, item substitution effects), customer clustering and personalized offers, and data sharing with suppliers for improved supply chain planning.
A particularly interesting operational detail: Axfood's typical e-commerce shopping basket contains about 50 items, making any UX simplification highly valuable. ML-powered recommendations and "most commonly purchased items" features directly reduce friction in the customer journey.
### Generative AI Experiments and Production Use
Axfood's generative AI journey represents early-stage but meaningful production deployment. They built a design tool using a fine-tuned Stable Diffusion model on AWS Bedrock, specifically trained on their brand design language. This tool was given to business users who experimented with it and actually created package designs for milk cartons that were sold in stores. While this might seem like a limited use case, it served critical strategic purposes: demonstrating tangible AI capabilities to leadership, building awareness outside IT departments, and securing executive buy-in for broader AI initiatives.
Beyond the design tool, Axfood is using generative AI for employee productivity tools such as chatbots and similar applications, though details weren't elaborated. Gustav Hilding also mentioned they're currently using Amazon Q Developer for development across all teams (both SAP and non-SAP) and actively exploring agentic AI, particularly looking at MCP servers to make their data "AI ready."
### Data Architecture and LLMOps Infrastructure
Axfood's data architecture evolution provides important context for their AI/LLMOps capabilities. Their cloud journey began in 2014 with AWS adoption for e-commerce (needing elastic scaling for variable purchase patterns), expanded to an AI platform on AWS, and culminated in moving their entire data platform to AWS in 2022 to resolve scaling issues from on-premise data warehousing.
Their current data ecosystem follows a layered architecture: an application layer (primarily SAP systems) generating data, an integration layer using AWS EventBridge, SAP Event Mesh, and Cloud Platform Integration (CPI) for message handling, a BI stack using DBT and AWS Glue for ETL and data layers, an ML layer with Mimmer for traditional ML and an "AI gateway" for generative AI applications, and an analytics layer with MicroStrategy for data warehouse analysis and SAP Analytics Cloud for direct SAP system access. All data is exported from SAP and non-SAP sources into their data stack on S3, where it becomes available for AI development.
Axfood's future roadmap focuses on better SAP data integration, more AI models with real-time capabilities and new use cases, building data products with proper governance and semantic layers, supporting agentic AI better through MCP servers and related technologies, and rearchitecting legacy systems like SAP BW.
### Strategic Evolution and LLMOps Maturity
Gustav emphasized that treating data as a foundational business improvement asset since 2012 created a long-term focus on building multiple data layers with a single source of truth. Introducing high-value use cases early secured leadership buy-in, which made funding and team creation for ML development significantly easier. This represents an important LLMOps lesson: demonstrating concrete business value early, even if with limited scope, can unlock resources for broader adoption.
The transition from traditional ML to generative AI at Axfood illustrates a common enterprise pattern: extensive investment in classical ML infrastructure and production models, followed by selective exploration of generative AI for use cases where LLMs offer distinct advantages (creative generation, natural language interaction, code assistance). Axfood hasn't abandoned their 100+ traditional ML models—they're complementing them with GenAI where appropriate.
## Broader Context: Enterprise LLMOps Trends in 2025
Eric Kammon's opening remarks provide important industry context. He noted that 2025 represents a "very pivotal year" where adoption shifted from proof-of-concepts and evaluations at the beginning of the year to mainstream production deployments by year-end. This transition was enabled by rapid LLM maturity (particularly Anthropic Claude's evolution), industry adoption of open standards (like Model Context Protocol for agent-to-data-source communication), and AWS service maturity.
The AWS services highlighted for SAP GenAI use cases include Amazon Q Developer (AI coding assistant working well with SAP languages including ABAP, CAP, RAP, with the ABAP Accelerator MCP server for unit test generation and ECC-to-S/4 code conversion), Amazon Q Business and related services (Q Automate, QuickSight) for AI-powered business intelligence integrating multiple data sources including SAP with agentic interaction capabilities, Amazon Quiro (new agentic IDE for building BTP applications, Fiori, and UI5 front-ends), and Amazon Bedrock and Bedrock Agent Core (LLM integration platform and agent runtime environment for scalable, secure agent deployment). The emphasis on MCP servers, agentic capabilities, and integrated developer experiences reflects the maturation of LLMOps tooling beyond basic API calls to LLMs.
## Critical Assessment and Balanced Perspective
While both case studies present compelling success stories, several considerations warrant attention:
**Claims vs. Evidence**: Harman's quantified results (6-7x speed improvement, 70% cost reduction, 15-month to 2-month timeline reduction) appear credible given the nature of the task and the capabilities of modern LLMs for code analysis. The before-and-after documentation example provided tangible evidence. However, the presentation doesn't address potential risks such as accuracy validation processes, handling of edge cases where AI documentation might be incorrect or misleading, or the effort required for human review and correction of AI outputs.
**Axfood's ML Claims**: The claim of "30% better accuracy" for custom ML models versus SAP standard capabilities is presented without detailed context about baseline accuracy levels, specific metrics used, or independent validation. While plausible (custom models with more data often outperform generic solutions), these claims should be viewed as self-reported success metrics.
**Generalizability**: Both companies have substantial resources (Harman with 30,000+ custom objects clearly has significant SAP investment; Axfood runs 6TB+ databases with 4,000 CPUs). The applicability of these approaches to smaller organizations with different resource profiles may vary. The emphasis on AWS-specific services also means organizations using other cloud providers would need to adapt the approaches.
**Maturity Stages**: Harman is still early in their S/4HANA journey—the documentation phase precedes actual migration, code remediation, and testing. The ultimate success of their GenAI strategy won't be fully validated until after go-live. Their planned future use cases (unit testing automation, agentic testing) are still aspirational rather than production-proven.
**Technical Depth**: The presentation, being a conference talk, lacks deep technical details about prompt engineering specifics, model configuration, token costs, latency considerations, error handling, or human-in-the-loop review processes—all critical LLMOps operational concerns. The claim that "Amazon Q Developer with latest Claude" provided better results than initial Bedrock outputs suggests ongoing experimentation and optimization rather than a settled solution.
**Vendor Context**: This is an AWS re:Invent presentation featuring AWS customers discussing AWS services. While the problems and solutions appear genuine, there's inherent selection bias toward successful AWS use cases. The repeated thanks to AWS teams suggest close partnership arrangements that may not be available to all customers.
## LLMOps Lessons and Best Practices
Despite these considerations, several valuable LLMOps insights emerge:
**Prompt Engineering as Critical Success Factor**: The transformation of Bedrock outputs from inadequate technical summaries to multi-layered, stakeholder-appropriate documentation through prompt refinement underscores that prompt engineering remains a core LLMOps competency. Organizations should expect initial outputs to require iteration and be prepared to invest in prompt optimization.
**Standardization Enables Scale**: Using consistent prompts across 30,000 objects enabled both speed and quality that manual processes couldn't match. This principle applies broadly—standardized LLM interaction patterns enable automation at scale while maintaining quality.
**Multi-Audience Output Design**: Harman's three-tier documentation (business summary, functional description, technical details) represents thoughtful UX design for LLM outputs. Effective LLMOps considers who will consume AI-generated content and structures outputs accordingly.
**AI-First Mindset for Repetitive Tasks**: The lesson to evaluate AI automation early for manual, repetitive tasks represents an important cultural shift in enterprise LLMOps adoption—moving from "try traditional approaches first" to "evaluate AI first, fall back to manual if necessary."
**Progressive Use Case Expansion**: Both companies demonstrate thoughtful progression from initial use cases to expanded applications. Harman is moving from documentation to testing to development standards; Axfood progressed from traditional ML to GenAI experiments to broader productivity tools. This staged approach allows learning and capability building while delivering incremental value.
**Model Evolution Management**: The acknowledgment that Claude versions improved significantly throughout 2025, with each release bringing "major step changes," highlights an LLMOps operational reality: foundation model capabilities are rapidly evolving, and organizations must have processes to evaluate and adopt model improvements while maintaining production stability.
**Infrastructure Integration**: Axfood's layered data architecture with clear integration points, standardized data pipelines, and separation of concerns between traditional ML and GenAI demonstrates mature MLOps/LLMOps infrastructure design. The emphasis on MCP servers for agentic AI readiness shows forward-looking architectural thinking.
## Conclusion
These case studies illustrate the practical reality of enterprise LLMOps in 2025: moving beyond pilots to production deployments solving real business problems with measurable impact. Harman's documentation automation addresses a concrete pain point in a high-stakes migration project with clear ROI, while Axfood demonstrates the evolution from traditional ML operations to incorporating generative AI where it adds distinct value. Both stories emphasize iteration, collaboration, standardization, and strategic thinking about where LLMs provide advantages over traditional approaches—core tenets of successful LLMOps practices in enterprise environments.