ZenML

Multi-Agent AI Development Assistant for Clinical Trial Data Analysis

AstraZeneca 2025

AstraZeneca developed a "Development Assistant" - an interactive AI agent that enables researchers to query clinical trial data using natural language. The system evolved from a single-agent approach to a multi-agent architecture using Amazon Bedrock, allowing users across different R&D domains to access insights from their 3DP data platform. The solution went from concept to production MVP in six months, addressing the challenge of scaling AI initiatives beyond isolated proof-of-concepts while ensuring proper governance and user adoption through comprehensive change management practices.

Industry

Healthcare

Overview

AstraZeneca, a global biopharmaceutical company, embarked on an ambitious initiative to leverage AI for accelerating drug development as part of their broader corporate goal to deliver 20 new medicines by 2030. The case study, presented by Rashali Goyle, Senior Director within R&D IT at AstraZeneca, describes the development and deployment of an interactive AI agent called “Development Assistant” designed to help users query and analyze clinical trial data using natural language.

The initiative represents an interesting example of LLMOps in a heavily regulated pharmaceutical environment, where data quality, accuracy, and governance are paramount. What makes this case study notable is the evolution from a simple proof-of-concept to a production multi-agent system, along with the organizational change management practices that accompanied the technical implementation.

Problem Context

AstraZeneca faced several challenges that are common in large pharmaceutical organizations.

The company recognized that traditional BI tools and dashboard approaches, while functional, were not sufficient to unlock the full potential of their data assets for faster, deeper insights.

Technical Solution

Foundation: Drug Development Data Platform (3DP)

Rather than building from scratch, AstraZeneca strategically chose to build their AI solution on top of an existing platform called 3DP (Drug Development Data Platform). This decision was driven by several practical factors.

This approach of building on existing foundations rather than creating entirely new infrastructure is a pragmatic LLMOps practice that accelerated time-to-production.

Initial Single-Agent Architecture

The first version of Development Assistant was built using a single-agent approach. The speaker demonstrated a simple use case: a data scientist asking "What are the top five countries with the most clinical trial sites? Visualize this in a pie chart." The system could interpret the question, generate and run the corresponding SQL query, and return the requested visualization.

The team intentionally exposed the reasoning steps and SQL queries to enable quality checks and verification—a critical feature for maintaining trust in a pharmaceutical context.
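The talk does not show the implementation, but the transparency pattern it describes can be sketched as follows: the assistant returns its reasoning steps and the generated SQL alongside the results, so a user can verify the query before trusting the answer. All names, the stubbed generator, and the sample data below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AssistantResponse:
    reasoning: list   # intermediate steps, exposed for SME verification
    sql: str          # generated query, shown alongside the results
    rows: list        # query output

def answer(question, run_query, generate):
    """Answer a natural-language question, keeping the reasoning
    and the generated SQL visible to the user."""
    reasoning, sql = generate(question)
    rows = run_query(sql)
    return AssistantResponse(reasoning=reasoning, sql=sql, rows=rows)

# Stubs standing in for the real LLM and data warehouse.
def fake_generate(question):
    steps = ["identify table: trial_sites", "group by country", "limit 5"]
    sql = ("SELECT country, COUNT(*) AS sites FROM trial_sites "
           "GROUP BY country ORDER BY sites DESC LIMIT 5")
    return steps, sql

def fake_run_query(sql):
    return [("US", 412), ("DE", 198), ("UK", 154), ("ES", 121), ("PL", 97)]

resp = answer("Top five countries with the most clinical trial sites?",
              fake_run_query, fake_generate)
print(resp.sql)  # the user can inspect the query, not just the chart
```

The key design choice is that the SQL and reasoning travel with the answer rather than being discarded after execution, which is what makes the output auditable.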

Challenges and Augmentation Strategies

During initial deployment, the team identified key challenges:

Controlled Vocabulary Issues: The life sciences field is notorious for acronyms and specialized terminology. The team discovered that augmenting the LLM with an appropriate controlled vocabulary significantly improved output quality. For example, terms like "lymphoid leukemia" needed proper terminology mapping to retrieve accurate results.

Metadata Quality: Column labeling in their data products was often inadequate or inconsistent. Improving metadata descriptions became another augmentation strategy to help the LLM generate correct SQL queries.

These augmentation approaches—vocabulary enrichment and metadata enhancement—represent practical RAG-like techniques for improving LLM accuracy in domain-specific applications.
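In practice, both augmentations amount to injecting extra context into the prompt before SQL generation. The sketch below illustrates the idea under stated assumptions; the vocabulary entries, column names, and descriptions are invented for illustration and are not AstraZeneca's actual mappings.

```python
# Hypothetical controlled-vocabulary and metadata augmentation.
CONTROLLED_VOCAB = {
    "ALL": "lymphoid leukemia",        # expand domain acronyms
    "site": "clinical trial site",
}
COLUMN_METADATA = {
    "trial_sites.country": "ISO country of the investigative site",
    "trial_sites.status": "recruitment status (active/closed)",
}

def build_prompt(question):
    """Prepend vocabulary and column descriptions to the SQL-generation
    prompt so the LLM resolves domain terms and picks correct columns."""
    vocab = "\n".join(f"- {k}: {v}" for k, v in CONTROLLED_VOCAB.items())
    cols = "\n".join(f"- {c}: {d}" for c, d in COLUMN_METADATA.items())
    return (
        "You translate questions into SQL.\n"
        f"Controlled vocabulary:\n{vocab}\n"
        f"Column descriptions:\n{cols}\n"
        f"Question: {question}\nSQL:"
    )

prompt = build_prompt("How many active sites treat ALL patients?")
```

Because the enrichment happens at prompt-assembly time, vocabulary and metadata can be curated by SMEs independently of the model itself.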

Evolution to Multi-Agent Architecture

As AstraZeneca sought to expand Development Assistant beyond clinical trials to regulatory, quality, and other R&D domains, they encountered limitations with the single-agent approach.

The solution was to migrate to Amazon Bedrock's multi-agent architecture, which introduced:

Supervisor Agent: A coordinating agent that receives user prompts and routes them to appropriate sub-agents based on the query context.

Sub-Agents: Specialized agents for different domains (such as clinical trials, regulatory, and quality) and functions.

This architecture provides flexibility and scalability while addressing the critical issue that the same terminology can mean different things in different domains—the supervisor agent ensures queries are routed to the correct context.

The enhanced system can now provide not just data retrieval but also insights, recommendations, and summarizations. For example, a data science director could ask about screen failures in a study and receive both the raw data and analytical insights about potential issues.

Production Operations and Guardrails

The team emphasized several LLMOps practices for maintaining production quality:

Continuous Validation: Subject matter experts (SMEs) from clinical and other domains actively validate the tool’s outputs. These domain experts confirm that insights match how their personas would actually analyze the data and what conclusions they would draw.

Sprint-Based Testing: Rigorous testing occurs every sprint, with changes benchmarked against previous versions to ensure improvements don’t introduce regressions.
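A minimal sketch of such a regression gate, assuming an SME-curated golden question set (all names and data below are illustrative, not the team's actual harness): a candidate release is promoted only if its accuracy on the golden set does not fall below the previous version's.

```python
# Hypothetical sprint regression gate over a golden question set.
def accuracy(agent, golden):
    """Fraction of golden (question, expected_answer) pairs the agent gets right."""
    hits = sum(agent(q) == expected for q, expected in golden)
    return hits / len(golden)

def gate(candidate, baseline, golden, tolerance=0.0):
    """Promote the candidate only if it does not regress past tolerance."""
    return accuracy(candidate, golden) >= accuracy(baseline, golden) - tolerance

golden = [("q1", "a1"), ("q2", "a2"), ("q3", "a3")]
baseline  = lambda q: {"q1": "a1", "q2": "a2"}.get(q, "?")             # 2/3
candidate = lambda q: {"q1": "a1", "q2": "a2", "q3": "a3"}.get(q, "?")  # 3/3
print(gate(candidate, baseline, golden))  # → True
```

Running the same gate every sprint gives the benchmarked before/after comparison the team describes, with SMEs owning the golden set rather than the code.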

User Trust Building: The tool is designed to be transparent and verifiable, with reasoning steps exposed to users. This is essential in pharmaceutical contexts where decisions can have significant patient safety implications.

Business Integration: Product managers from different business areas were enlisted to use the tool in their actual workflows, starting with small tasks and gradually expanding usage.

Timeline and Results

The project went from concept to production MVP in approximately six months, which the speaker noted was faster than typical AI initiatives at the company during a period when many projects were stuck in ideation or proof-of-concept phases. Building on the existing 3DP platform was a key factor enabling this speed.

The tool is now in production and actively being expanded to additional domains in 2025.

Organizational Change Management

A significant portion of the presentation focused on the human side of LLMOps—specifically addressing change fatigue. AstraZeneca implemented several practices:

Multi-Stakeholder Alignment: Collaboration across HR, legal, business groups, and AI accelerator teams to ensure consistent narratives and correct practices.

Showcases and Spotlights: Regular forums to demonstrate new technology to scientists and domain experts, with follow-up sessions to address adoption challenges.

AI Accreditation Program: A four-tier certification program driven from senior leadership, rewarding employees who complete AI-related curriculum. This creates structured pathways for upskilling across the organization.

Lifelong Learning Culture: Senior leaders are making AI learning part of their goals and daily routines, modeling the behavior expected throughout the organization.

Critical Assessment

While the case study presents an impressive initiative, a few observations merit consideration:

The presentation is somewhat light on specific metrics or quantitative results—we hear that tasks that “normally take hours” are now faster, but precise efficiency gains aren’t provided. Similarly, accuracy rates, hallucination frequencies, and user adoption numbers aren’t disclosed.

The multi-agent architecture using Amazon Bedrock is presented as a solution to scalability challenges, but the complexity of managing multiple specialized agents in a production environment—including keeping them synchronized as data products evolve—isn’t fully addressed.

The emphasis on change management suggests that user adoption remains an ongoing challenge, which is realistic but also indicates the tool’s value proposition may not be immediately obvious to all potential users.

That said, the pragmatic approach of building on existing infrastructure, the transparency in reasoning steps, and the recognition that vocabulary and metadata augmentation are critical success factors all represent sound LLMOps practices that other organizations could learn from.

Key Takeaways

The Development Assistant initiative at AstraZeneca demonstrates several LLMOps best practices.

The case represents a practical example of how pharmaceutical companies can apply LLMs to accelerate drug development while managing the unique challenges of heavily regulated, terminology-rich domains.
