ZenML

Scaling Knowledge Management with LLM-powered Chatbot in Manufacturing

OSRAM 2025

OSRAM, a century-old lighting technology company, struggled to preserve institutional knowledge amid workforce transitions and to access technical documentation scattered across its manufacturing operations. It partnered with Adastra to implement an AI-powered chatbot using Amazon Bedrock and Claude, combining RAG with a hybrid search approach. The solution achieved over 85% accuracy in its initial deployment, with the team expecting to exceed 90%, and helps workers in different departments access critical operational information more efficiently.

Industry

Tech

Overview

OSRAM, the well-known lighting company, presented a case study at an AWS event alongside their implementation partner Adastra. OSRAM operates a manufacturing plant in Germany that was originally built in 1961—a classic “brownfield” industrial facility that represents typical legacy German manufacturing. The company has undergone a dramatic transformation as the lighting industry shifted from traditional bulbs to LED technology, with production volumes dropping from 4,000 tons to just 4 tons annually. This shift fundamentally changed their business model from mass production to project-based work with shorter product lifecycles.

The presentation featured multiple speakers from OSRAM (plant manager Ingo and R&D representative Brent) as well as a technical consultant from Adastra (Johannes), providing perspectives from both the business problem side and the solution implementation side.

Business Problem and Motivation

The core challenge OSRAM faced was knowledge preservation and accessibility in a rapidly changing manufacturing environment. Several factors drove the need for a GenAI solution:

The aging workforce presented a critical risk: experienced workers who leave the company take valuable institutional knowledge with them. In a high-precision manufacturing environment where efficiency depends heavily on accumulated expertise, this knowledge drain was unsustainable. The presenters candidly acknowledged that OSRAM “cannot afford” to lose this knowledge.

Data and documentation were scattered across the organization in various legacy systems and file formats. The plant’s long history meant dealing with old tools and file formats, including information trapped in images and documents that were difficult to access electronically. Finding the right information took significant time, impacting operational efficiency.

The workforce diversity added complexity to the solution requirements. The plant employs operators, maintenance personnel, engineers, and scientists—each with different information needs, varying educational backgrounds, and different levels of technical sophistication. Any solution needed to accommodate this diversity.

The company had already embarked on a digitalization journey as part of its broader strategy combining lean management principles (poka-yoke, color coding) with digital transformation. This chatbot initiative was positioned as part of that larger Industry 4.0 effort, for which OSRAM had already received external recognition (an Industry 4.0 award).

Technical Architecture and Implementation

The solution was built on AWS infrastructure using Amazon Bedrock as the foundation for LLM capabilities. The implementation partner Adastra brought their expertise as an AWS Partner of the Year (Data Analytics and Innovation in EMEA) to design and deploy the system.

Data Ingestion and Pre-processing

The first step in their five-step approach involved collecting documents from various sources and storing them in an Amazon S3 bucket. This included PDFs, PowerPoint presentations, and other document formats containing procedural information, error resolution steps, and operational guidance. Pre-processing was necessary to extract and structure information from these diverse sources.
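
As a rough illustration of the pre-processing step, the sketch below splits already-extracted document text into overlapping chunks that could then be embedded and indexed. The presentation did not describe the actual chunking strategy, so the function name `chunk_text`, the word-based splitting, and the size and overlap values are all assumptions made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```

Overlapping chunks are a common choice because a procedure step that straddles a chunk boundary still appears whole in at least one chunk.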

Hybrid Search Strategy with OpenSearch

A particularly interesting technical decision was the implementation of a hybrid search approach using Amazon OpenSearch. Rather than relying solely on vector similarity search (the typical RAG approach), they combined semantic vector search with traditional keyword search. This hybrid approach was specifically chosen to handle the abundance of technical terminology in manufacturing documentation. Technical terms can be difficult to capture accurately with embeddings alone, especially in specialized industrial contexts where precise terminology matters. The keyword search component helps ensure that specific technical terms are found accurately even when semantic similarity might miss them.
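
The presentation did not say how keyword and vector results were merged. One widely used technique for combining two ranked lists is reciprocal rank fusion, sketched below in plain Python. This is an illustration of the general idea only, not the team's method or the OpenSearch API; the document IDs and the constant `k = 60` are assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs; each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists from the two retrievers.
keyword_hits = ["manual_12", "error_codes", "shift_log"]
vector_hits = ["maintenance_guide", "manual_12", "error_codes"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists, such as `manual_12` here, rise to the top, which is the behavior a hybrid approach is meant to deliver for exact technical terms that also match semantically.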

Foundation Model Selection

The team selected Claude from Anthropic as their foundation model, accessed through Amazon Bedrock. The presentation noted that Bedrock’s flexibility allows them to switch models as needed—they mentioned Amazon Nova as a potential future option. This model-agnostic architecture is a sensible LLMOps practice that avoids vendor lock-in at the model layer.
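
A minimal sketch of what a model-agnostic layer might look like follows. It assumes the application calls models through a single interface so that the active backend can be changed by configuration. The `ModelRegistry` class and the stub backends are hypothetical; they stand in for real model calls and do not reflect the actual OSRAM code or the Bedrock API.

```python
from dataclasses import dataclass
from typing import Callable

# A model backend is any function that maps a prompt to a completion.
ModelFn = Callable[[str], str]

@dataclass
class ModelRegistry:
    backends: dict[str, ModelFn]
    active: str

    def generate(self, prompt: str) -> str:
        """Send the prompt to whichever backend is currently active."""
        return self.backends[self.active](prompt)

    def switch(self, name: str) -> None:
        """Change the active backend without touching calling code."""
        if name not in self.backends:
            raise KeyError(f"unknown model backend: {name}")
        self.active = name

# Stub backends stand in for real model calls in this sketch.
registry = ModelRegistry(
    backends={
        "claude": lambda prompt: f"[claude] {prompt}",
        "nova": lambda prompt: f"[nova] {prompt}",
    },
    active="claude",
)
```

Because callers only ever use `registry.generate`, swapping the model becomes a configuration change rather than a code change.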

RAG Implementation

The retrieval-augmented generation approach stores document embeddings in the OpenSearch vector database, retrieves relevant context based on user queries, and provides that context to Claude for generating answers. The system includes conversation history to provide contextual continuity across multi-turn interactions.
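
The flow described above can be sketched as a prompt-assembly step that places retrieved passages and recent conversation turns in front of the user's question. The prompt wording, the `build_prompt` name, and the `max_turns` limit are assumptions for illustration; the presentation did not show the actual prompt.

```python
def build_prompt(question: str, passages: list[str],
                 history: list[tuple[str, str]], max_turns: int = 3) -> str:
    """Assemble a grounded prompt from retrieved passages and recent turns."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    # Keep only the most recent turns so the prompt stays bounded in size.
    recent = history[-max_turns:] if max_turns > 0 else []
    dialogue = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in recent)
    return (
        "Answer using only the numbered context passages.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{dialogue}\n\n"
        f"Question: {question}"
    )
```

The resulting string would then be sent to the foundation model; numbering the passages makes it easier to ask the model to cite which passage supports its answer.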

Hallucination Mitigation

The presentation explicitly addressed the critical issue of hallucinations in an operational manufacturing context. If a chatbot provides incorrect instructions, operators could damage expensive machinery or create safety hazards. Their approach involves the system recognizing when information cannot be directly found in the source documents. Rather than making up answers, the chatbot acknowledges the limitation and attempts to provide relevant guidance from previous conversation context or related documented procedures. This is a crucial safety measure for industrial applications where incorrect information has real consequences.
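
How the system detects missing information was not explained. One simple way to approximate this behavior is to decline when no retrieved passage clears a relevance threshold, as in the sketch below. The threshold value, the fallback wording, and the function name are assumptions. To keep the example self-contained it returns the best passage directly, whereas a real system would pass the supported passages to the model.

```python
FALLBACK = "I could not find this in the documentation."

def answer_or_decline(hits: list[tuple[str, float]], min_score: float = 0.5) -> str:
    """Return the best-supported passage, or a fallback when support is weak.

    `hits` are (passage, relevance score) pairs; higher scores mean more relevant.
    """
    supported = [(passage, score) for passage, score in hits if score >= min_score]
    if not supported:
        return FALLBACK  # nothing in the sources is relevant enough to answer from
    best_passage, _ = max(supported, key=lambda hit: hit[1])
    return best_passage
```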

User Interface

The user-facing interface was built using Streamlit, providing a simple chat interface accessible to shop floor workers. The emphasis was on simplicity—workers shouldn’t need to navigate complex tools or switch between multiple systems. The chatbot serves as a single point of access to consolidated organizational knowledge.

Feedback Mechanism

The system incorporates a thumbs up/thumbs down feedback mechanism that lets users rate the quality of responses, and the team uses these ratings to improve the solution iteratively. Collecting feedback this way is a standard but essential LLMOps practice for continuous improvement of production AI systems.
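
As an illustration of how such ratings could be turned into something actionable, the sketch below computes the share of thumbs-up votes per topic. The presentation did not describe how feedback is stored or analyzed, so the `(topic, thumbs_up)` record shape and the topic names are hypothetical.

```python
from collections import Counter

def approval_rates(feedback: list[tuple[str, bool]]) -> dict[str, float]:
    """Compute the share of thumbs-up votes per topic.

    `feedback` holds (topic, thumbs_up) pairs.
    """
    ups = Counter()
    totals = Counter()
    for topic, thumbs_up in feedback:
        totals[topic] += 1
        if thumbs_up:
            ups[topic] += 1
    return {topic: ups[topic] / totals[topic] for topic in totals}

# Hypothetical feedback records.
rates = approval_rates([
    ("maintenance", True),
    ("maintenance", False),
    ("safety", True),
])
```

Topics with low approval rates point to documents or retrieval settings that may need attention first.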

Deployment and MLOps/LLMOps Practices

The presentation emphasized that this was not just a proof of concept but a production-ready solution with proper operational foundations:

CI/CD Pipeline

A robust CI/CD pipeline was implemented from the start, recognizing that scalability was a core requirement. The team acknowledged that “it’s not just about a proof of concept” and stressed getting value quickly while maintaining the ability to scale.

Infrastructure as Code

Terraform was used for infrastructure as code, enabling consistent and reproducible deployments. This was specifically mentioned as essential for future scaling to other departments and other OSRAM plants.

Testing

The presentation mentioned having tests in place as part of their deployment pipeline, though specific testing strategies were not detailed.

Performance Metrics and Current Status

At the time of the presentation, the solution had been deployed with an accuracy of over 85%. The team expressed confidence in reaching “well above 90%” in the coming weeks through continued iteration based on user feedback. While these accuracy figures sound promising, it’s worth noting that the specific methodology for measuring accuracy wasn’t detailed—this could refer to retrieval accuracy, answer quality, or some other metric.
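
Since the methodology is unknown, the following is only one possible reading: accuracy as the fraction of answers judged correct on a set of evaluated questions. The sample of 20 questions is invented to show the arithmetic and is not OSRAM data.

```python
def accuracy(judgements: list[bool]) -> float:
    """Fraction of evaluated answers judged correct."""
    if not judgements:
        raise ValueError("no judgements to score")
    return sum(judgements) / len(judgements)

# Illustrative only: 17 answers judged correct out of 20 evaluated questions.
score = accuracy([True] * 17 + [False] * 3)
```

Under this reading, 17 correct answers out of 20 gives 0.85. A retrieval-level metric, such as whether the right document appears among the top results, could yield a very different number for the same system.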

Change Management and User Adoption

An often-overlooked aspect of LLMOps that OSRAM addressed was user adoption and training. They implemented what they called “artificial intelligence consultation hours”—sessions held directly on the shop floor with workers to gather feedback and ensure the solution meets actual user needs. This iterative feedback approach from real users is credited with helping improve the solution.

The presentation also emphasized the importance of training users to understand what the solution can and cannot do, ensuring that guardrails are respected and value is created where appropriate. This represents a mature understanding that deploying an LLM solution is as much about organizational change as it is about technology.

OSRAM highlighted their workforce’s existing “future skills” and adaptability, with a stated cultural credo of “never too old to learn”—suggesting that user adoption was facilitated by an existing culture of continuous improvement.

Future Roadmap

The stated next steps include:

Critical Assessment

This case study presents a solid example of applying RAG-based LLM technology to a real industrial knowledge management problem. Several aspects deserve recognition:

The hybrid search approach combining vector search with keyword search is a practical solution to a real limitation of pure semantic search, especially in technical domains. The emphasis on CI/CD and infrastructure as code from day one reflects mature engineering practices. The explicit attention to hallucination risks and the implementation of guardrails shows appropriate concern for safety in an industrial context.

However, some caution is warranted. The presentation was delivered by OSRAM and its implementation partner Adastra at an AWS event, creating an obvious incentive to present the project favorably. The accuracy metrics (85%+, targeting 90%+) are stated without methodology, making them difficult to evaluate. The solution appears to still be relatively early in its deployment, so long-term operational success remains to be proven. The presentation also doesn’t address costs, maintenance requirements, or challenges encountered during implementation.

Overall, this represents a credible example of LLM technology being deployed in a production manufacturing environment with appropriate attention to operational concerns, though the full success of the initiative will only be evident over time as it scales across the organization.
