Cambrium uses LLMs and AI to design and generate novel proteins for sustainable materials, starting with vegan human collagen for cosmetics. The company developed a protein programming language that turns protein design into a mathematical optimization problem, enabling efficient search through massive protein sequence spaces, and is now layering LLM-era generative techniques on top. This combination of traditional protein engineering with modern ML allowed them to bring a biotech product to market in under two years.
Cambrium is a three-year-old biotech startup based in Berlin that uses AI to design proteins which become sustainable materials. Founded in September 2020, the company has achieved what is reportedly the fastest biotech product commercialization in EU history—bringing their first product, NovaColl (a vegan human collagen for cosmetics), to market in under two years. Pierre, the Head of Engineering at Cambrium (and employee number two), shares insights into how the company applies LLMs and generative AI in a production biotechnology context, offering a fascinating case study of LLMOps in a domain very different from typical software applications.
Cambrium’s engineering team operates under constraints quite different from typical ML/AI companies. Unlike teams that must optimize for low-latency serving or high QPS (queries per second), Cambrium primarily serves internal scientists—researchers they work alongside and “drink beer with.” This fundamentally changes the operational requirements. Their services don’t need to handle millions of customers or maintain 24/7 uptime with millisecond response times.
However, they face a unique constraint that dominates their engineering priorities: the extraordinarily high cost of biological experiments. Every data point is precious because DNA handling, laboratory equipment, consumables, and the experiments themselves are extremely expensive. This creates an unusual LLMOps priority: data integrity and pipeline reliability are paramount. As Pierre explains, if a data pipeline fails and data “disappears in the cloud limbo,” it’s actually quite dramatic because recreating that experimental data is prohibitively costly.
The cost of prediction errors in their ML models is relatively low compared to domains like healthcare diagnostics—if a protein doesn’t perform as expected, the material simply doesn’t work as intended, resulting in financial loss but no human safety issues. This risk profile allows for more experimental approaches to model development while still demanding extreme care with data handling.
Cambrium has deployed an LLM-based assistant built on a LangChain stack with retrieval-augmented generation (RAG) capabilities. This internal tool serves as a sophisticated search engine that can query across multiple internal data sources.
The assistant handles requests ranging from simple tasks, like generating LinkedIn posts about recent products, to more complex queries, such as identifying the latest high-performing protein sequences produced in the lab. When asked about the company’s latest product, it correctly retrieves that the product is NovaColl and when it was released.
Pierre describes this as a “glorified search engine” but emphasizes its practical utility for researchers who need quick access to information scattered across multiple internal systems. Notably, Pierre expresses frustration with the market hype around similar RAG products, suggesting that while the LangChain stack is well-engineered, the business value of such applications is sometimes overstated by companies raising significant funding for essentially the same technical approach.
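As a rough sketch of what such an assistant can look like on a LangChain stack, the snippet below indexes internal notes and answers questions over them. The file path, model names, chunking, and retrieval settings are assumptions for illustration, not Cambrium’s actual configuration:

```python
# Minimal RAG sketch (illustrative only; Cambrium's actual stack is not public).
# Assumes langchain, langchain-openai, langchain-community, and faiss-cpu are installed.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Load internal notes exported from lab systems (hypothetical file path).
docs = TextLoader("lab_notes/novacoll_runs.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Index the chunks so the assistant can retrieve relevant context at query time.
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Wire retrieval into an LLM question-answering chain.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),
    retriever=index.as_retriever(search_kwargs={"k": 4}),
)

print(qa.invoke({"query": "What is our latest product and when was it released?"}))
```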
Before the LLM boom, Cambrium developed what they call a “protein programming language”: a formal, compiled language for specifying protein designs. This represents an early approach to bridging language-like abstractions with computational protein design. The language lets scientists express design constraints and target properties in a machine-readable form that downstream tooling can optimize against.
This language transforms protein design into a mathematical optimization problem, enabling the team to sift through approximately 10^18 possible protein sequences to identify top candidates for laboratory testing. The NovaColl product was developed using this approach, and Pierre credits this computational capability with enabling them to “one-shot” regulatory feasibility checks and scalability validations.
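The talk does not expose the language’s internals, but the core idea—compiling design intent into a scoring function and searching sequence space for high scorers—can be sketched roughly. Everything below (the motif, the hydrophobicity target, the weights, the sampling strategy) is an invented stand-in, not Cambrium’s actual method:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWY")

def score(seq: str) -> float:
    """Toy objective combining hypothetical design constraints:
    a required motif, a target hydrophobic fraction, and a target length."""
    motif_bonus = 1.0 if "GPP" in seq else 0.0          # e.g. a collagen-like Gly-Pro-Pro repeat
    hydro_frac = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    hydro_term = -abs(hydro_frac - 0.35)                # prefer ~35% hydrophobic residues
    length_term = -abs(len(seq) - 60) / 60              # prefer ~60 residues
    return motif_bonus + hydro_term + length_term

def random_seq(n: int) -> str:
    return "".join(random.choice(AMINO_ACIDS) for _ in range(n))

# Naive search: sample candidates and keep the best scorers.
# A real system searching a ~10^18 space needs far smarter solvers than this.
candidates = [random_seq(random.randint(50, 70)) for _ in range(10_000)]
top = sorted(candidates, key=score, reverse=True)[:5]
for seq in top:
    print(f"{score(seq):+.3f}  {seq}")
```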
For their next products (details confidential), Cambrium is using more advanced LLM-era techniques including diffusion models for protein generation. Pierre describes being able to “see the protein coming out of like a blur of atoms and folding together”—using diffusion-style generative approaches where proteins gradually take shape with properties defined by mathematical loss functions.
This represents a shift from the constraint-based programming language approach to true generative AI for proteins. The underlying principle is that proteins, like natural language, are sequences of discrete units (amino acids vs. letters) with complex “grammar” and “syntax” that determine whether a sequence has meaningful function. Random amino acid sequences typically produce non-functional “spaghetti ball” structures, but understanding the structural and functional grammar allows for generating sequences with specific properties.
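As a rough illustration of the diffusion idea, the toy loop below runs standard DDPM-style ancestral sampling over a continuous “sequence embedding” with a placeholder noise predictor. A real protein diffusion model (and whatever Cambrium runs internally) replaces that placeholder with a trained network and decodes the result back into amino acids:

```python
import numpy as np

# Toy reverse-diffusion sampler over a (length, embedding_dim) array.
# The noise predictor is a stand-in; a trained network would encode the
# learned "grammar" of functional sequences and structures.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

rng = np.random.default_rng(0)

def predict_noise(x, t):
    """Placeholder denoiser that gently nudges samples toward zero."""
    return x * 0.1

x = rng.standard_normal((60, 16))          # start from pure noise ("blur of atoms")
for t in reversed(range(T)):               # standard DDPM ancestral sampling loop
    eps = predict_noise(x, t)
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x - coef * eps) / np.sqrt(alphas[t])
    noise = rng.standard_normal(x.shape) if t > 0 else 0.0
    x = mean + np.sqrt(betas[t]) * noise

print("refined embedding shape:", x.shape)  # downstream: decode to an amino-acid sequence
```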
Cambrium deals with two primary data source types, each presenting different challenges:
Robot/Machine Data: Lab instruments produce “ugly CSVs” that need parsing—a relatively straightforward engineering challenge.
Scientist Experimental Notes: This is more challenging. Biology curricula have been slow to adopt data-driven practices and digitalization, so scientists often don’t naturally think in data-first terms. Training them to approach experimentation with a data-driven mindset is essential but comes with a non-trivial learning curve.
The company addresses this through training and, as discussed below, by hiring scientists with data-driven mindsets from the start.
Given the high cost of experimental data, pipeline reliability is critical. Data loss is treated as a serious incident because unlike web applications where you might just lose some user analytics, losing experimental data means potentially having to repeat expensive laboratory work. This creates a strong emphasis on robust data engineering practices even though the team doesn’t need the scale-focused infrastructure of consumer-facing applications.
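A minimal sketch of that posture, applied to the “ugly CSVs” coming off lab instruments: fail loudly rather than silently drop rows. The column names, file layout, and schema below are hypothetical:

```python
import pandas as pd

REQUIRED_COLUMNS = {"sample_id", "timestamp", "yield_mg_per_l"}  # hypothetical schema

def ingest_instrument_csv(path: str) -> pd.DataFrame:
    """Parse an instrument export and refuse to continue if anything is missing.
    Losing a row here could mean repeating days of expensive lab work."""
    # skiprows / encoding are placeholders for whatever junk a given instrument emits
    df = pd.read_csv(path, skiprows=3, encoding="latin-1")
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"{path}: missing expected columns {sorted(missing)}")
    if df["sample_id"].isna().any() or df["yield_mg_per_l"].isna().any():
        raise ValueError(f"{path}: null measurements detected; refusing to ingest silently")

    return df
```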
Pierre emphasizes that Cambrium’s position as a startup building “something from nothing” gave them an advantage—they could hire the right people with data-driven mindsets from the start. Larger, established biotech companies face the enormous challenge of digitalizing decades of paper notebooks and retraining hundreds of scientists with established habits. This is a significant competitive advantage for smaller, newer companies in the biotech space.
A key insight shared is the deep similarity between protein sequences and natural language, which is what makes LLM-style approaches applicable to protein design: both are sequences of discrete tokens governed by an implicit grammar that separates functional combinations from meaningless ones.
This analogy isn’t just theoretical—it’s driving practical applications at Cambrium where they’re applying generative AI techniques originally developed for language to create novel proteins with designed properties.
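The analogy is easy to demonstrate with publicly available protein language models such as ESM-2, which score amino-acid sequences much as a text LM scores sentences; under such a model, a random “spaghetti ball” sequence tends to look less natural than an evolved, repeat-structured one. The snippet below is illustrative and unrelated to Cambrium’s internal tooling:

```python
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

# Small public protein language model; not affiliated with Cambrium.
name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(name)
model = EsmForMaskedLM.from_pretrained(name).eval()

def pseudo_log_likelihood(seq: str) -> float:
    """Mask each residue in turn and score the model's ability to recover it,
    analogous to per-token perplexity for a text language model."""
    ids = tokenizer(seq, return_tensors="pt")["input_ids"]
    total, n = 0.0, ids.shape[1] - 2           # exclude BOS/EOS special tokens
    for i in range(1, ids.shape[1] - 1):
        masked = ids.clone()
        masked[0, i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total += log_probs[ids[0, i]].item()
    return total / n

collagen_like = "GPPGPAGPPGPAGPPGPAGPPGPA"   # repeat-structured sequence
arbitrary = "WQHKDYCMRVTLNEAIGFSPWQHK"        # arbitrary residues
print(pseudo_log_likelihood(collagen_like), pseudo_log_likelihood(arbitrary))
```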
The broader strategic context shapes Cambrium’s LLMOps approach. They started with cosmetics because a high-value specialty ingredient like vegan collagen offers a commercially viable entry point, in contrast to the lower-margin commodity materials they ultimately want to address.
Their long-term vision includes working toward sustainable materials that could eventually replace plastics and other commodity chemicals. The computational capabilities—including LLM-based approaches—are essential to de-risking the expensive experimental pathway from high-value specialty chemicals to lower-margin commodity materials.
This case study offers several valuable perspectives for LLMOps practitioners.
The emphasis on data pipeline reliability, scientist workflow integration, and experimental de-risking through AI represents a distinctive LLMOps pattern for research-intensive organizations where the cost of data generation dominates other operational concerns.
OpenAI's Forward Deployed Engineering (FDE) team, led by Colin Jarvis, embeds with enterprise customers to solve high-value problems using LLMs and deliver production-grade AI applications. The team focuses on problems worth tens of millions to billions in value, working with companies across industries including finance (Morgan Stanley), manufacturing (semiconductors, automotive), telecommunications (T-Mobile, Klarna), and others. By deeply understanding customer domains, building evaluation frameworks, implementing guardrails, and iterating with users over months, the FDE team achieves 20-50% efficiency improvements and high adoption rates (98% at Morgan Stanley). The approach emphasizes solving hard, novel problems from zero-to-one, extracting learnings into reusable products and frameworks (like Swarm and Agent Kit), then scaling solutions across the market while maintaining strategic focus on product development over services revenue.
This panel discussion at AWS re:Invent features three companies deploying AI models in production across different industries: Somite AI using machine learning for computational biology and cellular control, Upstage developing sovereign AI with proprietary LLMs and OCR for document extraction in enterprises, and Rambler AI building vision language models for industrial task verification. All three leverage AMD GPU infrastructure (MI300 series) for training and inference, emphasizing the importance of hardware choice, open ecosystems, seamless deployment, and cost-effective scaling. The discussion highlights how smaller, domain-specific models can achieve enterprise ROI where massive frontier models failed, and explores emerging areas like physical AI, world models, and data collection for robotics.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.