## Overview
This case study summarizes a panel discussion from AWS re:Invent featuring three companies deploying large language models and AI systems in production across multiple industries. The panelists are Arjun Raj from Somite AI (computational biology), Kasey from Upstage (enterprise document AI with sovereign considerations), and Roman Hasenback from Rambler AI (vision language models for industrial applications). The discussion, moderated by an AMD representative, focuses on practical LLMOps considerations including hardware choice, deployment strategies, model sizing, cost optimization, and the transition from proof-of-concept to production.
## Somite AI - Computational Biology Use Case
Arjun Raj, head of computational biology at Somite and professor at the University of Pennsylvania, describes an LLMOps challenge at the intersection of biology and AI. Somite's mission centers on understanding how to control cells by decoding the biological "language" that directs cellular behavior. The analogy used is powerful: biology has identified the tokens (individual signals that can be given to cells) but hasn't figured out how to string together sentences (complex instructions to direct cellular differentiation, such as turning stem cells into muscle cells for therapeutic purposes).
The company generates massive amounts of novel biological data in laboratory settings, specifically designed to build predictive models. These models aim to solve the fundamental problem of cellular control using modern machine learning techniques powered by AMD GPUs. The complexity barrier that biology hit around the year 2000 is particularly relevant here—the field moved from simple linear causal chains to incredibly complex networks where outcomes depend on hundreds or thousands of variables. Traditional human analysis couldn't extract predictive power from this complexity, but machine learning models can.
From an LLMOps perspective, Somite's work represents a domain where large memory capacity and high bandwidth are critical infrastructure requirements. The AMD MI300 series GPUs with 288GB of HBM (high bandwidth memory) enable researchers to work with large biological datasets in single GPU configurations, facilitating faster iteration and discovery. The team's diversity—spanning software engineers, computational biologists, bioinformaticians processing raw sequencing data, and experimental scientists generating data—creates infrastructure requirements centered on flexibility and seamless operation. Resources are stretched thin across multiple competencies, so systems must "just work out of the box" without requiring extensive infrastructure management expertise.
Arjun emphasizes that AMD functions as a partner rather than a vendor, handling infrastructure details so the team can focus on their core scientific competencies rather than becoming GPU infrastructure experts. This partnership model is crucial for research-intensive organizations where technical talent is specialized and infrastructure friction directly impacts research velocity.
## Upstage - Enterprise Document AI with Sovereign Considerations
Kasey from Upstage presents a fascinating enterprise LLMOps case study focused on what he calls "unsexy" but mission-critical AI work—document extraction and workflow automation for highly regulated industries including financial services, healthcare, and public sector organizations. Upstage started five years ago in Korea and conducted extensive customer research, meeting with CEOs of major corporations like Samsung, LG, and Hyundai to identify AI use cases with high willingness to pay.
The unanimous answer was document extraction, specifically next-generation OCR (optical character recognition). Traditional OCR reads documents left-to-right without understanding context or layout, failing when documents have layout variations, handwriting, crumpling, or tilting. This lack of robustness undermines automation value when accuracy falls below 95%. Upstage developed proprietary next-generation OCR combined with their proprietary LLM called Solar to enable template-free extraction. Users can request extraction of specific fields (name, ID, address) from 500-page documents, and the system can locate, identify, validate, and extract information dynamically in a fast and cost-efficient manner.
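To make the template-free workflow concrete, here is a minimal sketch of what such a request could look like from the caller's side. The endpoint, payload shape, and field names below are assumptions for illustration only, not Upstage's actual API.

```python
import requests

# Hypothetical endpoint and payload shape for template-free field extraction;
# the real Upstage API may differ -- this only illustrates the described workflow.
API_URL = "https://api.example.com/v1/extract"   # placeholder, not a real endpoint
API_KEY = "YOUR_API_KEY"

with open("loan_application.pdf", "rb") as f:
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"document": f},                      # e.g. a multi-hundred-page PDF
        data={"fields": "name,id_number,address"},  # fields requested dynamically, no template
        timeout=120,
    )

response.raise_for_status()
for field, value in response.json().get("fields", {}).items():
    # In the system described on the panel, each field would also carry a location
    # and validation status; here we simply print the extracted values.
    print(f"{field}: {value}")
```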
From an LLMOps perspective, Upstage's journey reflects the broader enterprise AI maturation curve observed across the industry. Two years ago when generative AI created initial excitement, enterprises rushed to POCs using massive frontier models from OpenAI or Anthropic, typically deployed on major cloud providers with high-compute GPUs. These models couldn't be deployed on-premises or in private cloud environments, forcing reliance on third-party APIs. While many POCs showed promising results, an MIT report from summer 2025 indicated that over 90% of enterprise POCs failed to reach production. The primary barrier was ROI—the technology worked, but costs were prohibitive.
This led to a fundamental shift in the LLMOps conversation within enterprise contexts. Upstage demonstrates that for tedious, specific workflows enterprises need to automate, massive frontier models are unnecessary. Smaller, domain-specific models can achieve equivalent accuracy and performance at a fraction of the cost. Upstage's OCR model has fewer than 1 billion parameters, while their Solar LLM has 17 billion parameters. These fit comfortably on a single GPU, achieving the same accuracy as much larger models while reducing hardware costs by 90%. This economic equation transforms POCs into productizable solutions.
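A rough back-of-envelope calculation shows why models in this size range fit on a single accelerator. The bytes-per-parameter figures below are the usual FP16/INT8 assumptions for weight storage only, ignoring KV cache and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for model weights alone (excludes KV cache, activations, overhead)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# Upstage's figures as reported in the panel: <1B-parameter OCR model, 17B-parameter Solar LLM.
print(weight_memory_gb(1.0))                          # ~2 GB at FP16
print(weight_memory_gb(17.0))                         # ~34 GB at FP16, within a single modern GPU
print(weight_memory_gb(17.0, bytes_per_param=1.0))    # ~17 GB if served in INT8
```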
The infrastructure evolution discussion reveals how rapidly the enterprise LLMOps landscape changed. Within a single year, conversations shifted from POC exploration to production deployment with clear ROI focus. Enterprises now follow a pattern: conduct POCs to understand use cases and test capabilities, identify smaller models that can be deployed efficiently on reduced GPU infrastructure (or even CPUs for certain workloads), deploy in production, and then scale to multiple use cases. This approach achieves greater efficiency in both hardware and software costs.
Upstage positions their Solar LLM as "sovereign AI for Korea," emphasizing data sovereignty as core to their identity. When approaching AI as a strategic national asset requiring control and management, dependency on a single infrastructure provider becomes a risk. Upstage explicitly pursues multi-cloud and multi-GPU vendor strategies for system stability, treating AI infrastructure with the same redundancy considerations as cloud infrastructure. They ensure AMD GPUs are in the mix alongside other GPU providers to hedge risks at both macro and micro levels.
Critically, Upstage's engineering team found the transition to AMD hardware seamless, which validated their multi-vendor strategy without introducing operational complexity. The training of their proprietary Solar LLM on AMD GPUs demonstrates that enterprises can maintain infrastructure sovereignty and flexibility while still achieving competitive model performance. The open ecosystem and open-source nature of AMD's software stack provides additional value for sovereign AI strategies—every line of code can be audited, built, and owned, enabling air-gapped deployments with full transparency about what's in the stack.
## Rambler AI - Vision Language Models for Industrial Applications
Roman Hasenback, CEO and co-founder of Rambler AI, presents a third distinct LLMOps use case centered on vision language models for physical AI applications. Rambler built an end-to-end platform to train and deploy vision AI agents, drawing on Roman's background at the intersection of the real world and technology, including previous work that became the foundation for Apple's ARKit and Vision Pro.
The core problem Rambler addresses is enabling AI to develop granular understanding of the real world to support humans in daily life and industrial work environments like manufacturing plants. Their system collects data from the real world showing how tasks are performed, fine-tunes and trains vision language models on understanding proper task execution, and deploys these models to verify whether tasks were performed according to standard operating procedures or if deviations occurred.
The fundamental technology is video understanding at a granular level. Rambler built an end-to-end system for this capability while remaining hardware-agnostic, though they leverage AMD GPUs for training and deployment. The LLMOps journey for vision language models has been remarkably compressed—Rambler started in 2021 when vision language models didn't exist. The research and development velocity in this space has been unprecedented, and current capabilities would have seemed implausible just a few years ago.
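As an illustration of the inference side of such a system, the sketch below asks an open vision language model whether a sampled frame conforms to a written procedure. It uses a public LLaVA checkpoint from Hugging Face as a stand-in; Rambler's actual models, prompts, and file names are not public, so everything specific here is assumed.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Stand-in open checkpoint; a production system would use a domain fine-tuned model.
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# One sampled frame from a workstation camera (hypothetical file name).
frame = Image.open("assembly_step_04.jpg")
prompt = (
    "USER: <image>\n"
    "According to the standard operating procedure, the operator must wear gloves "
    "and return the torque wrench to the rack. Is this step being followed? "
    "Answer yes or no, then explain briefly. ASSISTANT:"
)

inputs = processor(images=frame, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=80)
print(processor.decode(output[0], skip_special_tokens=True))
```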
A relatively small model with 7 billion parameters running on edge devices can now extract granular details about how humans and robots interact with the world. This enables applications throughout the workday and in daily life, with consumer devices like the Meta Ray-Ban 3 representing early adoption. The increasing deployment of cameras in various environments creates opportunities but also significant workload challenges: processing video frames requires substantial computational power. While CPUs can't handle these workloads alone, powerful GPUs enable real-world physical AI applications.
From an LLMOps perspective, Rambler emphasizes that foundation models were trained on internet data not representative of real-world encounters. Domain-specific fine-tuning and training on proprietary data remains essential for accurate video stream analysis. This creates a critical bottleneck: the availability of high-quality, open datasets. Roman advocates strongly for the community to contribute to open, high-quality datasets with proper annotations and labels, particularly outside traditional research contexts. Industrial sector data contains proprietary information, creating reluctance to share, but this is stalling progress on model generalization capabilities.
Rambler is working on open-source datasets to counteract this trend, though as a smaller company they can only contribute so much. The broader community needs to participate in building data pools that can unlock next-generation model capabilities. This reflects a key LLMOps challenge: model performance ultimately depends on training data quality and diversity, particularly for physical AI applications where real-world task understanding is critical.
## Hardware Infrastructure and Multi-Vendor Strategy
A recurring theme across all three case studies is the strategic importance of hardware choice and flexibility in production LLMOps environments. AMD's MI300 series GPUs feature prominently across all three deployments, with specific emphasis on capacity and bandwidth advantages. The MI300 series offers 288GB of HBM, a memory-capacity lead the panel characterized as one to two years ahead of the competition. For workloads involving large biological datasets, extensive document processing, or continuous video stream analysis, memory capacity and bandwidth become as important as raw computational throughput.
The panel discussion emphasizes AMD's commitment to zero-friction transitions from incumbent ecosystems—no code changes should be required to evaluate and deploy on AMD hardware. This mirrors the CPU experience where AMD established seamless compatibility. The open ecosystem and open-source software stack provides additional advantages: every line of code can be audited and built by customers, enabling sovereign AI strategies with full transparency and control. Air-gapped deployments are possible without concerns about proprietary binary blobs.
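The "no code changes" claim is easiest to see in framework-level code: PyTorch's ROCm builds expose AMD GPUs through the same `torch.cuda` interface, so a typical device-selection snippet like the minimal sketch below runs unchanged on either vendor's hardware.

```python
import torch

# On both CUDA (NVIDIA) and ROCm (AMD) builds of PyTorch, the GPU is exposed
# through the same torch.cuda API, so this snippet is identical on either stack.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Running on:", torch.cuda.get_device_name(0) if device == "cuda" else "CPU")

model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)
with torch.no_grad():
    y = model(x)
print(y.shape)
```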
The hardware evolution extends beyond data center GPUs to edge devices. The AMD RYZEN Max 395 laptop with 128GB of RAM can run models like GPT-OSS 120B fully locally, bringing intelligence to the edge while the infrastructure itself still starts from data center deployments on MI300. This continuum from cloud to edge represents an important LLMOps capability as models get compressed and optimized for deployment in resource-constrained environments.
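A quick back-of-envelope calculation (assuming roughly 4-bit quantized weights, an approximation for illustration rather than a figure from the panel) shows why a 120B-parameter model can fit in a 128GB machine.

```python
def quantized_weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Rough weight-only footprint; ignores KV cache, runtime buffers, and OS overhead."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# ~120B parameters at roughly 4 bits per weight (illustrative assumption)
weights = quantized_weight_gb(120, 4)   # ~60 GB
print(f"weights ~ {weights:.0f} GB, leaving headroom in a 128 GB unified-memory laptop")
```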
## The POC-to-Production Challenge
A critical insight emerging from the Upstage discussion is the systematic failure of enterprise AI POCs to reach production, with MIT research indicating over 90% failure rates. This represents a fundamental LLMOps challenge that goes beyond technical capability to economic viability. The pattern that emerged: enterprises would conduct successful POCs using massive frontier models on high-cost GPU infrastructure, demonstrate impressive capabilities, then discover that production deployment costs made ROI impossible.
The solution involved a fundamental reconceptualization of model requirements. For specific enterprise workflows—document extraction, workflow automation, domain-specific analysis—frontier models are overkill. Smaller models with 1-20 billion parameters, properly trained on domain-specific data, can achieve equivalent accuracy at 10% of the infrastructure cost. This insight transformed the enterprise LLMOps landscape within a single year, shifting focus from "what's technically possible" to "what's economically sustainable at scale."
This pattern has broader implications for LLMOps practice. The industry narrative often emphasizes larger models with more parameters, but production reality frequently demands the opposite: the smallest model that achieves required performance metrics at sustainable cost. This drives focus toward model optimization, quantization, distillation, and domain-specific fine-tuning—techniques that maintain capability while dramatically reducing deployment costs.
## Partnership Model vs. Vendor Relationship
Multiple panelists emphasized the distinction between hardware vendors and infrastructure partners. For organizations with diverse technical teams and stretched resources, infrastructure must "just work" without requiring deep specialization in GPU management, cluster orchestration, or low-level optimization. When issues arise, responsive technical support that handles details enables teams to focus on their core competencies—biological research, enterprise application development, or computer vision model training.
AMD's venture arm also plays a role in this partnership model, providing early-stage support for companies like Upstage before they scaled. This combination of technical support, leadership engagement, and venture investment creates deeper relationships than transactional hardware purchases. For startups and research organizations, this support is particularly valuable during scaling phases when infrastructure challenges can become bottlenecks to progress.
## Emerging Areas and Future Directions
The panel discussion touched on several emerging areas representing future LLMOps challenges:
**Physical AI and World Models**: Real-time world models that deeply understand physics and real-world dynamics represent an ambitious goal. Current vision language models require extensive domain-specific fine-tuning because foundation training data (internet images and videos) doesn't represent real-world diversity. True world models would generalize across contexts with minimal fine-tuning. This requires massive improvements in training data quality and diversity, particularly for physical interactions, manufacturing processes, and domain-specific environments.
**Data Collection for Robotics**: Training robust physical AI models requires systematic collection of human interaction data showing how people handle edge cases, adapt to challenging situations, and execute tasks with nuanced understanding. Current approaches include instrumented data collection from human demonstrations and reinforcement learning systems that explore task spaces independently. Companies focusing on systematic data collection infrastructure could unlock major advances in robotics and physical AI capabilities.
**AI Scientists and Autonomous Research**: The vision of AI systems conducting autonomous scientific research represents an ambitious application of LLMOps at scale. This would involve robots physically conducting experiments (addressing the reinforcement learning challenge of grounding learning in real-world outcomes), processing vast scientific literature (extracting knowledge from "ancient photocopies of faxes of PDFs"), and generating and testing hypotheses. While robot manipulation remains a limitation, the potential for AI to accelerate scientific discovery by processing and integrating knowledge at unprecedented scale is compelling.
**Edge Deployment and Model Compression**: The progression from data center training on large GPU clusters to edge deployment on laptops or mobile devices represents a critical LLMOps capability. Techniques like quantization, pruning, and distillation enable models trained on hundreds of GPUs to run efficiently on single devices, bringing intelligence to where it's needed without constant connectivity to cloud infrastructure. This edge deployment capability becomes particularly important for sovereign AI strategies and scenarios requiring data privacy or low-latency responses.
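As a concrete, if simplified, example of one such technique, the sketch below applies PyTorch's post-training dynamic quantization to a toy model. Production pipelines typically use more involved methods (4-bit weight quantization, distillation, pruning), but the memory effect is the same in kind.

```python
import os
import torch
import torch.nn as nn

# A toy MLP standing in for a much larger model; the technique is the same.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).eval()

# Post-training dynamic quantization: nn.Linear weights are stored as INT8 and
# dequantized on the fly, shrinking the memory footprint for inference.
quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def checkpoint_size_mb(m: nn.Module, path: str = "/tmp/model_ckpt.pt") -> float:
    torch.save(m.state_dict(), path)
    return os.path.getsize(path) / 1e6

print(f"fp32: {checkpoint_size_mb(model):.1f} MB -> int8: {checkpoint_size_mb(quantized):.1f} MB")
```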
## Open Source and Ecosystem Considerations
The AMD ecosystem's emphasis on open source carries specific implications for production LLMOps. Transparency in software stacks enables security auditing, compliance verification, and deep customization not possible with proprietary systems. For regulated industries (healthcare, finance, government), the ability to audit every component of the AI infrastructure stack provides confidence that security and compliance requirements are met.
The open ecosystem also accelerates innovation by enabling community contributions and allowing researchers to understand and modify underlying systems rather than treating them as black boxes. This transparency extends from low-level GPU libraries through training frameworks to deployment tools, creating flexibility in how organizations build their LLMOps infrastructure.
High-quality open datasets and open-source foundation models provide crucial baselines that organizations can build upon with proprietary modifications for specific use cases. Rather than starting from scratch, teams can fine-tune open models on their domain-specific data, dramatically reducing time-to-production and development costs. This approach requires that open datasets and models be genuinely high-quality rather than just available—a challenge the community continues to address.
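A minimal sketch of that build-on-open-models pattern, assuming a Hugging Face stack with `peft`: the base checkpoint name is a placeholder, and the actual choice of model, adapter configuration, and training loop would depend on the use case and license.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Placeholder open checkpoint; substitute whichever open model fits the task and license.
base_id = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")

# LoRA attaches small trainable adapters to attention projections, so only a fraction
# of a percent of the parameters are trained on the proprietary domain data.
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)
model = get_peft_model(base, config)
model.print_trainable_parameters()
# ...train with the usual Trainer or a custom loop on domain-specific examples...
```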
## Cost Optimization and Economic Sustainability
The economics of LLMOps emerged as a central theme across the panel. The Upstage experience demonstrates how the industry learned that technical feasibility doesn't guarantee economic viability. Infrastructure costs for large model deployment can exceed the value created, particularly for enterprise use cases where margins are constrained.
The shift toward smaller, optimized models running on fewer GPUs (or on CPUs for certain workloads) represents a maturing understanding of production economics. A 17-billion parameter model on a single GPU costing 10% of a frontier model deployment creates fundamentally different ROI calculations. This economic pressure drives technical innovation in model compression, efficient architectures, and deployment optimization—areas that become increasingly central to LLMOps practice as AI moves from research to production at scale.
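The arithmetic below uses placeholder prices and value figures (not numbers from the panel) purely to show how the ROI sign can flip when a workflow moves from a multi-GPU frontier deployment to a single-GPU domain-specific model.

```python
# Illustrative-only numbers: compare monthly serving cost of an 8-GPU frontier-model
# deployment against a single-GPU domain-specific model for the same workflow.
gpu_hour_price = 4.00            # assumed cloud price per GPU-hour (placeholder)
hours_per_month = 730

frontier_cost = 8 * gpu_hour_price * hours_per_month      # ~$23,360 / month
small_model_cost = 1 * gpu_hour_price * hours_per_month   # ~$2,920 / month

value_created = 10_000           # assumed monthly value of the automated workflow (placeholder)
print(f"frontier ROI:    {value_created - frontier_cost:+,.0f} $/month")
print(f"small-model ROI: {value_created - small_model_cost:+,.0f} $/month")
```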
## Conclusion
This multi-company panel provides rich insight into production LLMOps across diverse use cases: computational biology, enterprise document automation, and industrial vision AI. Common themes emerge around hardware flexibility and choice, the importance of open ecosystems, the economics of model sizing and optimization, and the transition from POC to production. The emphasis on partnership models rather than vendor relationships, the strategic importance of sovereign AI considerations for regulated industries, and the emerging challenges in physical AI and robotics data collection paint a picture of LLMOps as a rapidly maturing discipline where technical capability must be balanced with economic sustainability and operational pragmatism. The discussion reveals how organizations deploying AI in production navigate hardware choices, model optimization, infrastructure management, and scaling challenges while maintaining focus on delivering value in their specific domains.