Samsung is implementing a comprehensive LLMOps system for autonomous semiconductor fabrication, using multi-modal LLMs and reinforcement learning to transform manufacturing processes. The system combines sensor data analysis, knowledge graphs, and LLMs to automate equipment control, defect detection, and process optimization. Early results show significant improvements in areas like RF matching efficiency and anomaly detection, though challenges remain in real-time processing and time series prediction accuracy.
This case study presents Samsung’s ambitious effort to achieve a “fully autonomous fab” (fabrication facility) for semiconductor manufacturing. The speaker, a Samsung engineer, clarifies that this vision is not about replacing engineers but rather empowering them—drawing an analogy to autonomous driving where humans remain important but are freed from manual, repetitive tasks. The core framework for autonomous fab operations consists of three components: sensing (observing equipment behavior), analysis (understanding what’s happening), and control (taking action based on insights).
The semiconductor industry presents unique challenges for AI and LLM deployment. Modern chip fabrication, particularly at advanced nodes like 5nm, involves close to 1,000 process steps. Achieving the 80%+ yield required for profitability at these nodes (5nm, 4nm, 3nm) has become increasingly difficult compared to older technologies like 28nm or 14nm. This complexity creates an ideal use case for generative AI and LLM-based solutions.
Samsung’s semiconductor manufacturing environment generates massive amounts of sensor data. The primary data type discussed is “trace data”—time-series readings captured every second from equipment sensors. For example, during an etching process, sensors monitor reflected power and forward power as plasma is initiated, maintained, and terminated. Even a simple two-step etch process generates complex signals that are difficult to monitor manually.
The conventional approach involves capturing snapshots of specific process steps and computing summary statistics (average, standard deviation, min, max, initial slope, stability). More sophisticated methods involve windowing—observing only specific time periods rather than full steps. However, all these approaches historically required extensive manual effort and deep domain expertise to configure properly.
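The conventional approach can be sketched in a few lines. This is an illustrative reconstruction, not Samsung's code; the trace values and window boundaries are hypothetical:

```python
from statistics import mean, stdev

def window_summary(trace, start, end):
    """Summary statistics over one window of a 1 Hz trace.

    `trace` is a list of per-second sensor readings; `start`/`end`
    delimit the window (for example, one process step) in seconds.
    """
    w = trace[start:end]
    return {
        "avg": mean(w),
        "std": stdev(w),
        "min": min(w),
        "max": max(w),
        "initial_slope": w[1] - w[0],     # first one-second delta
        "stability": stdev(w) / mean(w),  # coefficient of variation
    }

# Hypothetical reflected-power trace for a two-step etch
trace = [5.0, 4.0, 3.0, 2.5, 2.4, 2.4, 8.0, 6.0, 5.5, 5.4]
step1 = window_summary(trace, 0, 6)   # windowing: step 1, seconds 0-5
step2 = window_summary(trace, 6, 10)  # windowing: step 2, seconds 6-9
```

Configuring the right windows and statistics per step is exactly the manual, expertise-heavy work the text describes.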
Beyond trace data, the production environment includes other modalities discussed throughout this case study: optical emission spectra, wafer map images, and the text and image records that feed the engineering assistant.
Samsung has already deployed LLM-based systems in production environments. One notable example is an engineering assistant that helps technicians diagnose equipment issues. When an engineer encounters an error, they can prompt the system with questions like “I got error ABC, what should I do?” The system responds with historical context about similar errors and provides actionable suggestions based on past resolutions.
This system is described as truly multimodal, utilizing images, text, and graphs. The speaker emphasizes this is “not the future—we already have this kind of service in place.” The system leverages RAG (Retrieval-Augmented Generation) architecture to ground LLM responses in actual historical data and documentation.
However, the speaker candidly acknowledges ongoing challenges with hallucination and incorrect answers, noting that they are “suffering from the hallucination” but continue working to “make the best out of it utilizing RAG.”
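The RAG pattern behind the assistant can be sketched as follows. This is a minimal illustration, not Samsung's implementation: the keyword-overlap retriever stands in for production vector search, and the log entries are invented:

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank historical reports by keyword overlap with the query.
    A production assistant would use vector search; this stands in for it."""
    q = tokens(query)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    """Ground the LLM's answer in retrieved history (the RAG pattern)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using ONLY the maintenance history below.\n"
        f"History:\n{context}\n\nQuestion: {query}"
    )

# Hypothetical maintenance log entries
history = [
    "error ABC on chamber 3: resolved by recalibrating the RF matcher",
    "error XYZ: replaced gas line filter",
    "error ABC recurred after maintenance: membrane seal replaced",
]
prompt = build_prompt("I got error ABC, what should I do?", history)
```

Constraining the model to answer from retrieved history is what limits, though does not eliminate, the hallucination problem the speaker describes.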
A significant production challenge in semiconductor manufacturing involves sensor naming inconsistencies across equipment. Different chambers may have the same type of sensor but with different naming conventions. From an equipment engineer’s perspective, they care about whether “Chamber A, Sensor A” behaves correctly compared to “Chamber B, Sensor A” (named differently). From a wafer perspective, you need to understand that these are essentially the same sensor type performing similar functions.
Previously, this required strict naming discipline enforced through training and penalties. Even with 99% accuracy, given hundreds of chambers and thousands of steps, errors accumulated significantly. Samsung now uses knowledge graphs combined with LLMs to automatically align and reconcile sensor naming, handling the inevitable human errors gracefully.
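A heavily simplified version of the reconciliation step, using string normalization and fuzzy matching in place of the knowledge-graph-plus-LLM pipeline; the canonical names and alias table are invented for illustration:

```python
from difflib import get_close_matches

# Hypothetical canonical vocabulary and abbreviation table
CANONICAL = ["reflected_power", "forward_power", "chamber_pressure"]
ALIASES = {"refl": "reflected", "fwd": "forward", "pwr": "power",
           "press": "pressure", "chamb": "chamber"}

def normalize(name):
    """Collapse case, separators, and common abbreviations."""
    parts = name.lower().replace("-", "_").replace(" ", "_").split("_")
    return "_".join(ALIASES.get(p, p) for p in parts)

def reconcile(raw_name):
    """Map a chamber-local sensor name onto the canonical vocabulary,
    tolerating typos via fuzzy string matching."""
    candidates = get_close_matches(normalize(raw_name), CANONICAL, n=1, cutoff=0.6)
    return candidates[0] if candidates else None
```

Names like `Refl-Pwr` on one chamber and `Forward Power` on another then resolve to distinct canonical sensors, and a typo such as `chamb presure` still lands on `chamber_pressure` rather than accumulating as an error.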
A concrete example demonstrated the power of this approach: when equipment signals suddenly changed after maintenance, engineers couldn’t identify any obvious differences through conventional analysis. Using knowledge graph community detection, Samsung identified that the network relationships between sensors had changed subtly. This led to discovering that a membrane outside the reaction chamber had slightly altered values—something no one had previously considered examining.
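The structural-change idea can be illustrated with connected components of a thresholded correlation graph, a simple stand-in for the community detection Samsung used; sensor names and readings below are hypothetical:

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def components(sensors, threshold=0.9):
    """Connected components of the thresholded correlation graph,
    a simplified stand-in for community detection."""
    names = list(sensors)
    adj = {n: set() for n in names}
    for a, b in combinations(names, 2):
        if abs(pearson(sensors[a], sensors[b])) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for n in names:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            cur = stack.pop()
            if cur not in comp:
                comp.add(cur)
                stack.extend(adj[cur] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Hypothetical readings: after maintenance the membrane sensor suddenly
# tracks the RF sensors, so the graph's community structure changes
before = {"rf_a": [1, 2, 3, 4], "rf_b": [2, 4, 6, 8], "membrane": [1, 3, 2, 4]}
after  = {"rf_a": [1, 2, 3, 4], "rf_b": [2, 4, 6, 8], "membrane": [2, 4, 6, 8.5]}
```

Comparing the component structure before and after maintenance surfaces the kind of subtle relationship shift that per-sensor monitoring missed.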
Samsung is actively evaluating time series foundation models for predicting equipment behavior. They’ve tested multiple models, including PatchTST (from IBM), Time-LLM, and TimeGPT, applying them to their trace data.
The evaluation revealed significant challenges for production deployment. The speaker was explicit that these models are not yet production-ready for control applications, citing two persistent obstacles: real-time processing constraints and insufficient prediction accuracy.
Most critically, for process control Samsung needs accurate predictions at transition points and within specific windows. R-squared values peak around 0.8 overall but typically fall below 0.5 in these critical regions, which is insufficient for reliable process control.
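That evaluation gap can be made concrete with a windowed R-squared check. The traces below are invented: a forecaster that lags through a sharp transition scores well globally yet poorly in the window that matters for control:

```python
def r_squared(actual, predicted):
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical trace: flat plateau, sharp transition at t=5..6, new plateau
actual    = [1.0] * 5 + [4.0, 7.0] + [8.0] * 5
# A forecaster that lags one step behind through the transition
predicted = [1.0] * 5 + [1.0, 4.0] + [8.0] * 5

overall    = r_squared(actual, predicted)            # dominated by the plateaus
transition = r_squared(actual[4:8], predicted[4:8])  # scored only at the transition
```

Here the global score is above 0.8 while the transition-window score drops to 0.4, mirroring the pattern the speaker reported.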
Samsung is collaborating with IBM, AI-tomatik, and other companies to improve these models, potentially providing semiconductor-specific training data to help foundation model developers tune their systems for microsecond-level industrial time series data.
OES generates three-dimensional data: wavelength (x-axis), time (y-axis), and intensity (z-axis). This shows plasma composition changes during processing—different chemical peaks (N2, O2, etc.) indicate what’s happening inside the chamber. Conventional endpoint detection (EPD) requires pre-determining which specific wavelength peaks to monitor.
With generative AI, Samsung can process the entire 3D OES data and make decisions dynamically, rather than relying on predetermined peaks. This is significant because optimal indicator peaks may vary during processing, and traditional approaches cannot adapt mid-process.
The system also helps with peak identification. While humans can visually spot peaks, automated algorithms require constant updating and aren’t always accurate. In one case, AI detected very small peaks next to expected peaks that turned out to indicate a gas leak—a subtle contamination that would have been ignored manually but was caught through AI analysis, preventing further losses.
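A minimal sketch of that peak screening: real OES analysis works over the full wavelength-by-time intensity surface, while this illustration scans a single spectrum slice with invented values, flagging small peaks that sit away from the expected emission lines:

```python
def find_peaks(intensity, min_height=0.0):
    """Indices of local maxima at or above min_height (simplified)."""
    return [i for i in range(1, len(intensity) - 1)
            if intensity[i - 1] < intensity[i] > intensity[i + 1]
            and intensity[i] >= min_height]

def unexpected_peaks(wavelengths, intensity, expected_nm, tol=2.0, min_height=0.05):
    """Peaks far from every expected emission line: candidates for
    contamination such as a gas leak."""
    return [wavelengths[i]
            for i in find_peaks(intensity, min_height)
            if all(abs(wavelengths[i] - e) > tol for e in expected_nm)]

# Hypothetical spectrum slice: a strong N2 line at 337 nm plus a
# small unexplained peak at 341 nm
wavelengths = [335, 336, 337, 338, 339, 340, 341, 342, 343]
intensity   = [0.1, 0.4, 1.0, 0.4, 0.1, 0.1, 0.15, 0.1, 0.1]
leak_candidates = unexpected_peaks(wavelengths, intensity, expected_nm=[337.0])
```

The small 341 nm peak survives the screen even though it would be invisible next to the expected peak in manual review.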
For wafer inspection, the system processes wafer map images showing suspected defects (marked as red dots on the wafer surface). Traditionally, determining defect types required physically zooming into individual defects—a time-consuming process that could only cover a sample of potentially thousands of defects per wafer.
Samsung’s multimodal approach enables classification based on the wafer-level pattern distribution, potentially saving significant time and resources. For yield analysis, engineers can prompt the system with questions about error patterns, asking whether similar issues occurred previously and what the root causes were.
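One way classification from the wafer-level distribution can work is a radial defect profile; this is a toy sketch with hypothetical coordinates, zone counts, and thresholds, not Samsung's classifier:

```python
from math import hypot

def radial_profile(defects, radius=150.0, zones=3):
    """Fraction of defects per concentric zone (zone 0 = wafer center)."""
    counts = [0] * zones
    for x, y in defects:
        zone = min(int(hypot(x, y) / radius * zones), zones - 1)
        counts[zone] += 1
    return [c / len(defects) for c in counts]

def classify(defects):
    """Label the wafer-level pattern from its spatial distribution alone,
    without zooming into any individual defect."""
    profile = radial_profile(defects)
    if profile[-1] > 0.5:
        return "edge-ring"
    if profile[0] > 0.5:
        return "center-cluster"
    return "scattered"

# Hypothetical defect coordinates in mm from the wafer center
edge_wafer = [(140, 10), (-135, 20), (0, 145), (100, 100), (30, 5)]
```

A pattern-level label like "edge-ring" already narrows the root-cause search before any defect is physically inspected.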
Samsung has implemented reinforcement learning for equipment control, specifically for RF (radio frequency) matching in plasma processes. Traditional matching systems only use information from outside the reaction chamber, but Samsung’s RL approach incorporates sensor data from inside the chamber.
Results showed dramatic improvement: conventional matching required many iterations to achieve optimal impedance matching, while RL-based matching achieved results in just 5-6 iterations. The speaker expressed enthusiasm about applying this approach to broader process control beyond just RF matching.
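The RL formulation can be illustrated with tabular Q-learning on a toy one-knob matching problem. Everything here (the environment, reward, and hyperparameters) is invented for illustration; the real system learns from in-chamber sensor data over far richer plasma dynamics:

```python
import random

# Toy matching environment: one tuning-capacitor position; "reflected
# power" is minimized at MATCH_POS. All values are invented.
POSITIONS, MATCH_POS = 21, 13
ACTIONS = (-1, +1)  # step the capacitor down or up

def step(pos, action):
    nxt = max(0, min(POSITIONS - 1, pos + action))
    return nxt, -abs(nxt - MATCH_POS)  # reward derived from an in-chamber signal

def train(episodes=1000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning over (position, action) pairs."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(POSITIONS)]
    for _ in range(episodes):
        pos = rng.randrange(POSITIONS)
        for _ in range(40):
            a = rng.randrange(2) if rng.random() < eps else q[pos].index(max(q[pos]))
            nxt, r = step(pos, ACTIONS[a])
            q[pos][a] += alpha * (r + gamma * max(q[nxt]) - q[pos][a])
            pos = nxt
    return q

def iterations_to_match(q, pos, limit=20):
    """Follow the greedy policy; count steps until impedance is matched."""
    for i in range(limit):
        if pos == MATCH_POS:
            return i
        pos, _ = step(pos, ACTIONS[q[pos].index(max(q[pos]))])
    return limit

q = train()
```

After training, the greedy policy heads straight for the match point instead of searching, which is the qualitative behavior behind the iteration-count improvement described above.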
Given the challenge of limited training data (few wafers but many input variables), Samsung employs Physics-Informed Neural Networks (PINNs). These incorporate known physical relationships to reduce the effective dimensionality of the learning problem and improve signal-to-noise ratios. This approach helps predict critical parameters such as critical dimension (CD), profile characteristics, and defect rates.
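The PINN idea, adding physics-based loss terms to compensate for scarce wafer data, reduces in its simplest form to a penalized regression. The model, boundary condition, and measurements below are hypothetical; a real PINN would be a neural network with richer physics residuals:

```python
def fit(data, lam=10.0, lr=0.001, steps=5000):
    """Fit depth = a*t + b by gradient descent on a combined loss:
    data MSE plus a physics penalty enforcing the known boundary
    condition depth(0) = 0 (so b should vanish). The physics term
    supplies extra training signal when measured wafers are scarce."""
    a, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # analytic gradients of L = mean((a*t + b - d)^2) + lam * b^2
        ga = sum(2 * (a * t + b - d) * t for t, d in data) / n
        gb = sum(2 * (a * t + b - d) for t, d in data) / n + 2 * lam * b
        a -= lr * ga
        b -= lr * gb
    return a, b

# Hypothetical: only three measured wafers (etch time in s -> depth in nm),
# noisy around a true rate of 2 nm/s
data = [(10, 20.5), (20, 39.5), (30, 60.2)]
a, b = fit(data)
```

The penalty pins down the intercept that three noisy points alone could not, which is the dimensionality-reduction effect the paragraph describes.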
A significant operational challenge is data format inconsistency across equipment vendors and fabs. The speaker emphasized that without industry-wide standards, each fab (Samsung, TSMC, Intel) develops proprietary data formats, preventing model transferability and collaboration.
Samsung is leading a SEMI industry task force on “Equipment Data Publication” to define standardized data formats. They’re also building private cloud infrastructure where authorized partners can access data for analytics and model development while maintaining data security.
The speaker mentioned collaboration with multiple organizations, including IBM and AI-tomatik on time series foundation models and the SEMI task force partners on equipment data standardization.
The discussion revealed that the semiconductor industry’s microsecond-level time series data represents a novel challenge for existing foundation models, which were typically trained on different temporal scales. This gap between available pre-trained models and industrial requirements highlights the need for domain-specific model development and fine-tuning.
This case study illustrates several important LLMOps lessons: ground LLM outputs with RAG to contain hallucination; pair knowledge graphs with LLMs to reconcile messy production data; treat general-purpose foundation models as promising but not yet production-ready for control; and invest in data standardization so models can transfer across equipment and fabs.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
DoorDash faced challenges in scaling personalization and maintaining product catalogs as they expanded beyond restaurants into new verticals like grocery, retail, and convenience stores, dealing with millions of SKUs and cold-start scenarios for new customers and products. They implemented a layered approach combining traditional machine learning with fine-tuned LLMs, RAG systems, and LLM agents to automate product knowledge graph construction, enable contextual personalization, and provide recommendations even without historical user interaction data. The solution resulted in faster, more cost-effective catalog processing, improved personalization for cold-start scenarios, and the foundation for future agentic shopping experiences that can adapt to real-time contexts like emergency situations.
Ericsson's System Comprehension Lab is exploring the integration of symbolic reasoning capabilities into telecom-oriented large language models to address critical limitations in current LLM architectures for telecommunications infrastructure management. The problem centers on LLMs' inability to provide deterministic, explainable reasoning required for telecom network optimization, security, and anomaly detection—domains where hallucinations, lack of logical consistency, and black-box behavior are unacceptable. The proposed solution involves hybrid neural-symbolic AI architectures that combine the pattern recognition strengths of transformer-based LLMs with rule-based reasoning engines, connected through techniques like symbolic chain-of-thought prompting, program-aided reasoning, and external solver integration. This approach aims to enable AI-native wireless systems for 6G infrastructure that can perform cross-layer optimization, real-time decision-making, and intent-driven network management while maintaining the explainability and logical rigor demanded by production telecom environments.