ZenML

Evolution of Industrial AI: From Traditional ML to Multi-Agent Systems

Hitachi 2024

Hitachi's journey in implementing AI across industrial applications showcases the evolution from traditional machine learning to advanced generative AI solutions. The case study highlights how they transformed from focused applications in maintenance, repair, and operations to a more comprehensive approach integrating LLMs, focusing particularly on reliability, small data scenarios, and domain expertise. Key implementations include repair recommendation systems for fleet management and fault tree extraction from manuals, demonstrating the practical challenges and solutions in industrial AI deployment.

Industry: Tech

Overview

This case study is derived from a conference presentation by Dr. Chetan Gupta, General Manager for AI Research at Hitachi, delivered at an industrial AI conference. The presentation provides a comprehensive view of Hitachi’s journey in deploying AI systems across diverse industrial domains including energy (transmission and distribution networks), rail, semiconductors, and digital services. As a major industrial conglomerate, Hitachi operates in sectors where reliability is paramount—as the speaker notes, “if Hitachi makes a mistake, someone dies.” This context fundamentally shapes their approach to LLMOps and generative AI deployment.

The presentation spans three years of evolution in Hitachi’s industrial AI strategy, with the speaker having presented variations of this roadmap since 2022. The content is particularly valuable because it addresses the real-world challenges of deploying LLMs in industrial settings rather than the typical consumer-focused applications that dominate AI discussions.

The Industrial AI Context and Challenges

Before diving into generative AI applications, the presentation establishes critical context about what makes industrial AI different from typical Silicon Valley applications. These factors directly impact how LLMOps must be approached in industrial settings:

Reliability Requirements: Unlike consumer applications where occasional failures are tolerable, industrial AI must achieve the reliability of physical components. The speaker uses the example of car motors—a luxury car has 40-50 motors that users never notice because they always work as intended. This sets the bar for AI reliability in industrial contexts.

Small Data Constraints: Industrial environments often lack the trillions of data points available for training consumer-focused models. Solutions must work with limited, domain-specific datasets while maintaining high accuracy.

Heterogeneous Data Types: Industrial settings involve sensor data, event data, maintenance records, operational data, acoustic data, vision data, and multimodal documents—far more diverse than the text-centric data that drove initial LLM development.

System of Systems Complexity: Industrial products like automobiles involve multiple tiers of suppliers, with components ranging from tier-four suppliers up through final assembly. AI systems must model these complex interdependencies.

Deep Domain Knowledge: Successful industrial AI requires incorporating domain expertise that may not be captured in training data. This is particularly challenging for LLMs trained on general-purpose corpora.

The Penske Truck Repair Case Study: Traditional ML Augmented with Generative AI

The most concrete production deployment discussed is the truck repair recommendation system at Penske, described as the country’s largest fleet company. This system has been operational for 5-7 years, deployed across 700 shops, and has served more than half a million trucks.

Original Traditional ML Approach: The initial solution used ensemble classification techniques to predict the correct repair based on truck data. When a truck arrives at a shop and a technician connects a diagnostic dongle to download fault codes, the ML model recommends the appropriate repair. This replaced the manual process of technicians consulting OEM-provided diagnostic trees.

The training data consisted of historical repairs: input features included driver complaints and fault codes, with the target being the correct repair action that technicians had historically performed. This was modeled as a traditional classification problem, though the speaker notes the final model was “fairly complex” as an ensemble of multiple classification techniques.
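The classification setup described above can be sketched as follows. This is a hypothetical illustration, not Hitachi's actual pipeline: the fault codes, repair labels, and choice of ensemble members are invented for the example, but the shape of the problem matches the description (fault codes as input features, the historically performed repair as the classification target, an ensemble of multiple classifiers).

```python
# Hypothetical sketch of the repair-recommendation classifier: fault codes
# read from the diagnostic dongle are the features, the repair technicians
# historically performed is the label. Data and model choices are illustrative.
from sklearn.ensemble import (
    RandomForestClassifier, GradientBoostingClassifier, VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MultiLabelBinarizer

# Each historical work order: the set of fault codes observed on the truck.
historical_fault_codes = [
    {"P0201", "P0300"}, {"P0300"}, {"P0420"},
    {"P0201"}, {"P0420", "P0300"}, {"P0201", "P0300"},
]
historical_repairs = [
    "replace_injector", "replace_coil", "replace_catalyst",
    "replace_injector", "replace_catalyst", "replace_injector",
]

# One-hot encode the variable-length fault-code sets into a feature matrix.
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(historical_fault_codes)

# Ensemble of several classifiers, echoing the "fairly complex" ensemble
# of multiple classification techniques the speaker describes.
ensemble = VotingClassifier([
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
])
ensemble.fit(X, historical_repairs)

# A truck arrives at the shop; the dongle reports two fault codes.
new_truck = mlb.transform([{"P0201", "P0300"}])
print(ensemble.predict(new_truck)[0])
```

The hard-voting ensemble simply takes the majority prediction of its members, one plausible way to combine multiple classification techniques into a single recommendation.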

Generative AI Augmentation: The traditional ML approach had a limitation: when encountering new truck makes/models or when onboarding new vendors, there wasn’t enough historical data for accurate predictions. Generative AI is now being used to augment the system by incorporating service manuals.

The key insight here is that generative AI isn’t replacing the existing solution but augmenting it—handling cases where the historical data approach falls short. This hybrid architecture represents a common pattern for industrial LLMOps: using generative AI to extend proven traditional ML systems rather than building from scratch.
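The hybrid pattern can be made concrete with a routing sketch. Everything here is an assumption for illustration: the router class, the coverage threshold, and the make/model keying are invented, but the logic mirrors the description above, serving the proven ML model where historical data is plentiful and falling back to a manual-grounded generative path for new makes, models, or vendors.

```python
# Minimal sketch of the hybrid routing pattern: use the traditional ML model
# when a truck's make/model is well covered by historical repairs, otherwise
# fall back to an LLM pipeline grounded in service manuals. Names, threshold,
# and stand-in callables are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class RepairRouter:
    ml_model: Callable[[dict], str]       # proven classifier over historical repairs
    rag_pipeline: Callable[[dict], str]   # generative path grounded in manuals
    history_counts: dict                  # repairs on record per (make, model)
    min_history: int = 50                 # coverage threshold (assumed)

    def recommend(self, truck: dict) -> tuple[str, str]:
        key = (truck["make"], truck["model"])
        if self.history_counts.get(key, 0) >= self.min_history:
            return "ml", self.ml_model(truck)
        return "rag", self.rag_pipeline(truck)

router = RepairRouter(
    ml_model=lambda t: "replace_injector",
    rag_pipeline=lambda t: "see manual section on injector service",
    history_counts={("freightliner", "cascadia"): 1200},
)
print(router.recommend({"make": "freightliner", "model": "cascadia"}))  # ml path
print(router.recommend({"make": "newvendor", "model": "x1"}))           # rag fallback
```

The design choice worth noting is that the generative path extends, rather than replaces, the existing classifier: the routing condition keeps the proven system in charge wherever its training data is sufficient.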

Graph RAG for Fault Tree Extraction and Querying

A particularly interesting technical contribution is the use of generative AI for extracting fault trees from maintenance manuals and querying them. The speaker demonstrated a system that parses the manuals into a graph of components, subcomponents, faults, and repair actions, and answers natural-language queries against that graph.

The system provides grounded answers, meaning responses are linked to specific locations in the source manual where users can verify the information. This grounding is critical for industrial applications where domain experts need confidence in AI recommendations before operationalizing them.

The speaker emphasizes that industrial systems require thinking about “systems of systems”—the extracted fault trees become complex with components, subcomponents, and their associated faults and actions. The graph structure naturally captures these hierarchical relationships.
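A toy version of such an extracted fault tree makes the grounding idea concrete. The tree structure, component names, and manual references below are invented for the example; the point is that every node carries a pointer back to its source location in the manual, so each recommended action can be verified against the document it came from.

```python
# Illustrative fault tree extracted from a manual into a graph: components
# and subcomponents as nodes, leaf faults carrying repair actions, and every
# node keeping a source pointer for grounding. All content is invented.
fault_tree = {
    "engine":           {"children": ["fuel_system", "ignition"], "source": "manual p.12"},
    "fuel_system":      {"children": ["clogged_injector"],        "source": "manual p.34"},
    "clogged_injector": {"children": [], "action": "clean or replace injector",
                         "source": "manual p.35"},
    "ignition":         {"children": ["worn_coil"],               "source": "manual p.51"},
    "worn_coil":        {"children": [], "action": "replace ignition coil",
                         "source": "manual p.52"},
}

def grounded_actions(component: str) -> list[tuple[str, str, str]]:
    """Walk the subtree under `component`, returning (fault, action, source)."""
    results = []
    for child in fault_tree[component]["children"]:
        node = fault_tree[child]
        if "action" in node:
            results.append((child, node["action"], node["source"]))
        results.extend(grounded_actions(child))
    return results

for fault, action, source in grounded_actions("engine"):
    print(f"{fault}: {action} [{source}]")
```

In a production graph-RAG system the graph would live in a graph store and the traversal would be driven by an LLM interpreting the user's question, but the grounding mechanism, answers linked to source locations, is the same.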

Domain-Specific Small Language Models

The presentation includes important discussion of the trade-offs around language model size for industrial applications. When generative AI hype peaked and management asked about Hitachi’s large language model strategy, the research team pushed back: “We don’t know how to build very large language models in the first place and we don’t even know how useful they would be.”

The conclusion was that domain-specific small language models are more appropriate for industrial applications. However, the speaker notes that “domain-specific” itself is ambiguous—for the power industry, should there be one model for the entire industry, or separate models for power generation, transmission, and distribution, with further subdivision within those categories?

This represents a key LLMOps decision point for industrial applications: choosing the appropriate level of domain specialization and model size based on the specific use case requirements.

OT Software Code Generation

An emerging use case discussed is using generative AI for OT (Operational Technology) software code completion, specifically for industrial applications like CNC machine programming, a domain the speaker notes poses several unique challenges.

This represents an area where standard LLM deployment approaches are insufficient, and industrial-specific LLMOps practices are needed.

Process Transformation as the Unit of AI Value

A recurring theme is that AI delivers value through step-by-step process transformation, not wholesale replacement. The speaker quotes Bill Gates: “the effect of generative AI is overestimated in the short term and underestimated in the long term.” This is because each process step must be transformed individually, and gains accumulate gradually.

For LLMOps practitioners, this has important implications: deployment success should be measured in terms of specific process improvements rather than attempting to transform entire workflows at once. The Penske example illustrates this—AI replaced one step (the diagnostic tree lookup) in a multi-step repair process, while the rest of the workflow remained unchanged.

The Industrial AI Roadmap: 1.0, 2.0, and 3.0

The presentation outlines an evolution of industrial AI capabilities:

Industrial AI 1.0 (circa 2020): Focused on production, installation, and maintenance using traditional machine learning. Canonical problems included predictive maintenance, quality optimization, and supply chain management.

Industrial AI 2.0 (current): Generative AI enables coverage of the full industrial value chain, including upstream processes (materials discovery, design, RFP response) and downstream processes (customer support). The key is marrying generative AI with traditional industrial AI based on the data types involved—workforce communication (high text/NLP), process data (mixed), and asset data (primarily sensor/event data with some manuals).

Industrial AI 3.0 (future vision): Agentic architectures with both virtual and physical agents. The speaker describes scenarios where field workers interact with virtual agents for guidance and with robots for physical assistance. This is particularly relevant in Japan where the working-age population is declining—workers may need to handle multiple domains, augmented by AI agents.

Research Priorities for Reliable Industrial AI

For AI to be valuable in industrial settings, the speaker identifies the need for systems that are reliable and cost-effective while performing valuable tasks: specifically, mission-critical, high-complexity tasks delivered at low cost. This framing drives the team's current research priorities.

The speaker recommends three papers for understanding the future direction: DeepMind’s learned global weather forecasting (time series at scale), AlphaGeometry (neurosymbolic computation combining LLMs with symbolic reasoning for verifiable results), and Bruno Buchberger’s work on automated programming and symbolic computation.

Temporal and Causal Reasoning Limitations

In the Q&A, the speaker addresses a critical limitation of current LLM-based approaches for industrial data: handling temporal information and causality. When applying transformer-based architectures to time series data, they treat readings as sequences but ignore the actual temporal relationships, losing valuable information.

Hitachi is working on Functional Neural Networks (FNNs) that represent time series in terms of basis functions, similar to how CNNs transformed image processing. The speaker argues that foundational models for time series will need to incorporate temporal nature, not just sequential nature, and notes that causality remains “a very open problem.”
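The basis-function idea can be illustrated with a minimal sketch. This shows only the representation step, not the FNN architecture itself, and the Fourier basis and signal are assumptions chosen for the example: a time series is summarized as coefficients over smooth basis functions, a form that retains its temporal structure rather than treating readings as an unordered sequence of tokens.

```python
# Toy illustration of the basis-function representation behind functional
# neural networks: fit a time series with a Fourier basis by least squares,
# so the series is described by a small coefficient vector. The choice of
# basis and the synthetic signal are illustrative assumptions.
import numpy as np

t = np.linspace(0.0, 1.0, 200)  # sensor timestamps over one interval
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 5 * t)

# Design matrix: constant term plus sine/cosine pairs up to frequency K.
K = 8
basis = np.column_stack(
    [np.ones_like(t)]
    + [f(2 * np.pi * k * t) for k in range(1, K + 1) for f in (np.sin, np.cos)]
)

# The 200-sample series is now summarized by 2K + 1 coefficients; a model
# consuming these coefficients sees temporal, not merely sequential, structure.
coeffs, *_ = np.linalg.lstsq(basis, signal, rcond=None)
reconstruction = basis @ coeffs
print("max reconstruction error:", float(np.max(np.abs(signal - reconstruction))))
```

Because the synthetic signal lies exactly in the span of the basis, the reconstruction error here is essentially zero; real sensor data would instead be approximated, trading fidelity for a compact, temporally meaningful representation.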

Key Takeaways for LLMOps Practitioners

The presentation offers several insights for deploying LLMs in industrial settings: hybrid architectures that augment proven traditional ML with generative AI capabilities; graph-based knowledge representation for complex industrial knowledge; grounding and explainability as requirements for domain expert adoption; domain-specific fine-tuning for specialized applications like OT code generation; and the recognition that process transformation happens incrementally rather than all at once. The speaker’s team includes not just ML practitioners but optimization specialists, simulation experts, and mathematicians—reflecting the breadth of techniques needed for industrial AI success.
