## Overview
BenchSci is a life sciences AI company founded in 2015 that has developed ASCEND, an AI platform designed to serve as a research assistant for scientists working in drug discovery and pharmaceutical R&D. The company has raised over $200 million in funding and works with over 75% of the top 20 pharmaceutical companies globally, with approximately 50,000 scientists using their platform. This case study, presented at a Google Cloud event, details how BenchSci leverages LLMs in production to help accelerate the notoriously slow and expensive preclinical drug discovery process.
The problem BenchSci addresses is significant: bringing a drug to market currently takes 8-14 years and costs over $2 billion, and about 90% of candidates fail in clinical trials. Only 5-6% of drug discovery projects reach clinical trials, and even then, most fail because results from animal studies don't translate to humans. The fundamental challenge is that disease biology is extraordinarily complex, with approximately four trillion relationships in the human body, and while scientists generate massive amounts of information, converting that into actionable knowledge has proven difficult.
## Technical Architecture and LLMOps Approach
### Data Foundation
BenchSci's approach begins with building a comprehensive data foundation through partnerships with major scientific publishers. They have aggregated hundreds of data sources including open access research, paywalled content, and internal pharmaceutical company data to collect the entire landscape and history of biomedical research. This data foundation represents nearly a decade of work and is critical to the platform's effectiveness.
The company has created two core assets that drive their AI capabilities:
- **Evidence Map**: A comprehensive repository of all evidence and insights ever generated in scientific research, structured and connected to show relationships between biological entities.
- **Ontology Knowledge Base**: A specialized system that handles the significant challenge of scientific nomenclature, where entities can have 20-30 different names and complex relationships exist between genes, diseases, pathways, and biological processes.
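To make the two assets concrete, the following is a minimal sketch of how a synonym index and an evidence store could fit together. It is not BenchSci's implementation: the `SYNONYMS` table, the `EvidenceMap` class, the relation label, and the source identifier are all hypothetical, and a real ontology would be far larger and would also model hierarchies and typed relationships.

```python
from collections import defaultdict

# Hypothetical synonym table: each canonical identifier lists a few aliases.
# A production ontology would hold 20-30 names per entity, as noted above.
SYNONYMS = {
    "TP53": ["p53", "tumor protein p53", "LFS1"],
    "ERBB2": ["HER2", "neu", "CD340"],
}


def build_alias_index(synonyms):
    """Map every case-folded name (canonical or alias) to its canonical id."""
    index = {}
    for canonical, aliases in synonyms.items():
        for name in [canonical, *aliases]:
            index[name.casefold()] = canonical
    return index


ALIAS_INDEX = build_alias_index(SYNONYMS)


def normalize(name):
    """Return the canonical identifier for a name, or None if unknown."""
    return ALIAS_INDEX.get(name.strip().casefold())


class EvidenceMap:
    """Toy evidence store keyed by canonical entity (assumed design)."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add_evidence(self, subject, relation, obj, source):
        s, o = normalize(subject), normalize(obj)
        if s is None or o is None:
            raise KeyError(f"unknown entity in ({subject!r}, {obj!r})")
        # Store the source with every edge so a claim stays traceable.
        self._edges[s].append((relation, o, source))

    def evidence_for(self, name):
        """Look up evidence under any alias of the entity."""
        return list(self._edges.get(normalize(name), []))
```

Because every lookup passes through `normalize`, evidence recorded under one alias is found under any other, which is the property the ontology layer is described as providing.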
### Domain-Specific LLMs
Rather than relying solely on general-purpose foundation models, BenchSci invested in building domain-specific LLMs tailored to biomedical research. They developed specialized models for both vision (understanding scientific figures, diagrams, and images) and text analysis that mimic how scientists would understand information but at scale. Additionally, they built smaller models to generate derivative insights in the same manner scientists would approach such analysis.
This approach reflects a critical LLMOps consideration: in highly specialized domains, generic foundation models often lack the precision and domain expertise required for production use cases. By training or fine-tuning models specifically for biomedical contexts, BenchSci achieves higher accuracy in understanding scientific literature and generating relevant insights.
### Scientific Veracity and Human-in-the-Loop
A notable aspect of BenchSci's approach is their emphasis on scientific veracity. The company maintains approximately a 1:1 ratio between engineers/ML scientists and PhD-level biologists, with over 100 scientists on staff. This reflects the reality that in production AI systems for specialized domains, domain expertise must be deeply integrated into the development process.
Scientists at BenchSci work side-by-side with engineers throughout the development process to ensure machine learning model outputs are scientifically and biologically accurate. This human-in-the-loop approach is essential because their end users—scientists at major pharmaceutical companies—are trained to be skeptical and will not adopt technology that doesn't demonstrate sound scientific reasoning behind its outputs.
### RAG Architecture and Generative AI Integration
BenchSci employs a RAG (Retrieval-Augmented Generation) architecture that creates a "symbiotic relationship" between their structured data (cleaned and connected over nearly a decade) and generative AI capabilities. The key generative AI features they use are summarization and conversational interfaces, augmented with explainability and measures to limit hallucination.
The emphasis on explainability is particularly important for their user base. Scientists require transparency into the evidence and reasoning behind any AI-generated output. Without this, adoption would be limited. This represents a key production consideration for LLM deployments in high-stakes domains: the ability to trace outputs back to source evidence is not optional but essential.
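The traceability requirement can be sketched as a retrieval step that carries source identifiers through to the final answer. The presentation does not describe BenchSci's retrieval method, so this example assumes a deliberately simple keyword-overlap retriever; `Evidence`, `retrieve`, `build_prompt`, `answer_with_sources`, and the injected `generate` callable are all hypothetical names, and `generate` stands in for whatever model call a real system would make.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Evidence:
    source_id: str  # identifier of the originating document
    text: str


def tokenize(text):
    return set(text.lower().split())


def retrieve(query, corpus, k=2):
    """Rank evidence by keyword overlap with the query (toy retriever)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(e.text)), e) for e in corpus]
    scored = [(s, e) for s, e in scored if s > 0]
    # sort is stable, so ties keep their corpus order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]


def build_prompt(query, evidence):
    """Number each passage so the model can cite it as [n]."""
    numbered = "\n".join(f"[{i}] {e.text}" for i, e in enumerate(evidence, 1))
    return (
        "Answer using only the numbered evidence; cite it as [n].\n"
        f"Evidence:\n{numbered}\nQuestion: {query}"
    )


def answer_with_sources(query, corpus, generate: Callable[[str], str]):
    """Return the generated answer together with the sources it was given."""
    evidence = retrieve(query, corpus)
    if not evidence:
        # Decline rather than generate without supporting evidence.
        return {"answer": None, "sources": []}
    return {
        "answer": generate(build_prompt(query, evidence)),
        "sources": [e.source_id for e in evidence],
    }
```

Returning `sources` alongside `answer` is the design choice that matters here: the interface can then show a scientist exactly which documents informed a response, and the system declines to answer when nothing relevant is retrieved.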
### Google Med-LM Partnership
A significant development highlighted in the presentation is BenchSci's partnership with Google to utilize Med-LM, a foundation model pre-trained on medical data. The hypothesis was that a foundation model trained on data closer to the biomedical domain would outperform generic foundation models and potentially solve problems that weren't previously addressable.
This partnership illustrates an important LLMOps pattern: leveraging specialized foundation models as a base rather than starting from generic models. According to the presentation, the domain-specific pre-training in Med-LM gave BenchSci a stronger starting point for its biomarker identification use case. The broader point is that model selection and evaluation against domain-specific benchmarks are critical considerations in production deployments.
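Comparing candidate models on a domain benchmark can be sketched as a small harness. The presentation gives no details of how BenchSci evaluated Med-LM, so everything here is an assumption: models are plain callables from question to answer, the metric is case-insensitive exact match, and `accuracy` and `compare` are hypothetical helper names. Real biomedical evaluation would likely need expert review or more forgiving metrics.

```python
from typing import Callable

# A benchmark is a list of (question, expected_answer) pairs.
Benchmark = list[tuple[str, str]]


def accuracy(model: Callable[[str], str], benchmark: Benchmark) -> float:
    """Fraction of questions answered with a case-insensitive exact match."""
    if not benchmark:
        raise ValueError("benchmark must not be empty")
    correct = sum(
        model(question).strip().lower() == expected.strip().lower()
        for question, expected in benchmark
    )
    return correct / len(benchmark)


def compare(models: dict[str, Callable[[str], str]], benchmark: Benchmark):
    """Score every candidate on the same benchmark, best first."""
    scores = {name: accuracy(model, benchmark) for name, model in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Running every candidate against the same fixed question set keeps the comparison between a generic and a domain-specific model like-for-like.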
## Production Use Case: Biomarker Identification
The specific production use case detailed is biomarker identification. Biomarkers are critical in drug discovery for several reasons:
- Understanding disease progression and mechanisms
- Ensuring that preclinical work in animal studies translates to humans
- Designing appropriate endpoints for clinical trials to determine drug efficacy
BenchSci applied foundation models, specifically Med-LM, to identify biomarkers in the same manner scientists would, but at scale. According to results reported by scientists at "one of the biggest Pharma companies in the world", productivity increased by 40%, and processes that previously took months were reduced to just a few days.
While these are impressive claims, it's worth noting that they come from the company's own presentation and should be interpreted with appropriate context. The 40% productivity improvement is significant, but the specific methodology for measuring this improvement is not detailed. Nevertheless, reducing month-long processes to days represents a substantial efficiency gain if validated.
## Enterprise Considerations
BenchSci emphasizes that their platform is "Enterprise ready," which is crucial given their deployment with some of the world's largest pharmaceutical companies. These organizations have rigorous security, compliance, and audit requirements. While specific details aren't provided, the mention of extensive scrutiny during deployments suggests significant investment in security, access controls, data governance, and compliance frameworks.
The enterprise deployment context adds complexity to LLMOps that differs from consumer applications. Pharmaceutical companies deal with proprietary research data, regulatory requirements, and intellectual property considerations that require robust data handling and access management.
## Key LLMOps Lessons
Several important LLMOps principles emerge from this case study:
**Domain expertise integration**: In specialized fields, having domain experts embedded in the AI development process is essential for ensuring model outputs are accurate and trustworthy. The 1:1 ratio of engineers to scientists at BenchSci represents a significant investment in this principle.
**Explainability as a requirement**: For users trained in scientific skepticism, black-box AI is insufficient. The RAG architecture's ability to trace outputs back to source evidence addresses this need.
**Specialized foundation models**: Rather than forcing generic models to work in specialized domains, leveraging domain-specific foundation models (like Med-LM) can unlock capabilities that weren't previously achievable.
**Long-term data investment**: BenchSci's nearly decade-long investment in building their evidence map and ontology knowledge base underscores that AI platforms in specialized domains often require substantial upfront investment in data infrastructure.
**Hallucination mitigation**: The explicit focus on "limited hallucination" acknowledges this as a critical concern for production AI systems, especially in domains where incorrect information could have serious consequences.
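One common way to limit hallucination in a citation-based system is to check generated text after the fact. The presentation does not say how BenchSci does this, so the sketch below is purely illustrative: it assumes answers cite evidence with `[n]` markers inside each sentence, and `check_citations` is a hypothetical helper that flags sentences with no citation and citation numbers that point at no retrieved source.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, num_sources: int):
    """Flag uncited sentences and citation ids outside 1..num_sources."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited, invalid = [], []
    for sentence in sentences:
        ids = [int(n) for n in CITATION.findall(sentence)]
        if not ids:
            uncited.append(sentence)
        invalid.extend(i for i in ids if not 1 <= i <= num_sources)
    return {"uncited": uncited, "invalid": sorted(set(invalid))}
```

A response with any `uncited` or `invalid` entries could be regenerated, trimmed, or routed to a human reviewer instead of being shown as-is.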
The case study presents BenchSci's approach as highly successful, though readers should note that the source is a promotional presentation at a Google Cloud event. The claimed results are impressive but come from the company itself rather than independent verification. Nevertheless, the technical approach, which combines structured domain knowledge with a RAG architecture and domain-specific foundation models, represents a thoughtful production architecture for LLMs in specialized scientific domains.