Company
Novartis
Title
AI-Driven Clinical Trial Transformation with Next-Generation Data Platform
Industry
Healthcare
Year
2025
Summary (short)
Novartis embarked on a comprehensive data and AI modernization journey to accelerate drug development by at least 6 months per clinical trial. The company partnered with AWS Professional Services and Accenture to build a next-generation, GXP-compliant data platform that integrates fragmented data across multiple domains (including patient safety, medical imaging, and regulatory data), enabling both operational AI use cases and ambitious moonshot projects like a digital twin for clinical trial simulation. The initial implementation with the patient safety domain achieved significant results: 16 data pipelines processing 17 terabytes of data, 72% faster query speeds, 60% storage cost reduction, and over 160 hours of manual work eliminated, while protocol generation use cases demonstrated 83-87% acceleration in generating compliance-acceptable protocols.
## Overview and Business Context Novartis, a global pharmaceutical company, presented a comprehensive case study on their data and AI modernization journey aimed at dramatically accelerating drug development. The presentation, delivered jointly by Anna C. Klebus (leading data, digital and IT for drug development at Novartis), Apoorv Vasoshi (AWS Professional Services), and Costu (AWS), detailed how the company is working to reduce clinical trial development cycles by at least 6 months per trial. The strategic imperative is clear: while drug development traditionally takes 15 years, even a 6-month acceleration can represent the difference between life and death for patients awaiting treatment. The company's AI strategy is grounded in what they call "augmentation" - treating machine learning and artificial intelligence as enhancements to human intelligence rather than replacements. This philosophy permeates their entire approach, from organizational structure to technical architecture. Novartis applies AI across the entire research, development, and commercial (R&D-C) continuum, with particular focus on the development phase which encompasses clinical trial operations. ## Strategic AI Framework and Guiding Principles Novartis adopted three core principles to make their AI approach adaptive to the rapid pace of technological change: **Modularity** is central to their architecture. Rather than building monolithic systems, they designed use cases to be individually valuable but collectively transformative. The modular approach allows for plug-and-play capabilities - if a better commercial solution emerges for protocol design, for example, the architecture permits swapping components without rebuilding the entire ecosystem. This "Lego blocks" analogy extends throughout their technical stack. **Balanced Portfolio Management** addresses the common challenge of mushrooming AI use cases. Novartis maintains a curated portfolio balancing ambitious "moonshot" innovations with more pragmatic "low-hanging fruit" projects like document generation. While document automation may not be transformative alone, these projects build confidence in AI, deliver fast value, and create organizational excitement that enables adoption of more complex initiatives. **Ruthless Prioritization** ensures resources focus on high-value initiatives. The company established a robust value framework with clear ROI metrics for each use case, driving accountability among leaders claiming value delivery. This discipline prevents resource fragmentation and ensures full funding for prioritized projects. ## Moonshot Project: Intelligent Decision System (IDS) The most ambitious AI initiative is IDS, envisioned as a computational digital twin for clinical trials. This system would enable software-based simulation of entire end-to-end clinical trial operational plans, pulling inputs and outputs from relevant domains to support what-if analysis, scenario planning, and optimization. Given that clinical trials span 7-9 years with countless intervention points, a data-driven simulation capability before real-world implementation represents a potential breakthrough in trial design and execution. ## Next-Generation Data Platform Architecture Recognizing that scaling AI from proof-of-concept to production required foundational infrastructure, Novartis partnered with AWS and Accenture to build what they term a "next-generation data platform." This represents a truly integrated, single platform supporting all drug development functions. ### Technical Architecture Components The platform architecture comprises five major white-box components within the AWS ecosystem: **Ingestion Framework**: The platform addresses Novartis's highly heterogeneous data landscape, including file shares (SharePoint, file servers), relational databases, life sciences platforms (Vivavault), and master data management systems (Reltio). Rather than forcing one-size-fits-all or creating bespoke solutions for each source, AWS grouped ingestion capabilities by method - file-based sources use similar ingestion patterns, for instance. A critical differentiator was the systematic use of Architectural Decision Records (ADRs) for each component, documenting pros and cons of technology choices while considering organizational culture, build-versus-buy preferences, existing licenses, cost constraints, and workforce skill sets. This modular approach enables component upgrades without wholesale platform replacement. **Storage and Processing Layer**: The platform implements a data mesh architecture with domain-oriented data products. Each domain (safety, medical imaging, regulatory, etc.) maintains its own AWS accounts across development, QA, test, and production environments. Data products typically flow through three layers: raw ingestion (1:1 copy with potential anonymization), intermediate transformation (creating reusable cross-domain assets), and consumer-specific processing (delivering precisely what end users need without further processing). The technical stack leverages AWS infrastructure with Databricks for data processing jobs. All infrastructure is defined as code, deployed via CI/CD pipelines that automatically provision data products across domain accounts. Notably, the platform supports both analytical data products (for heavy analytics workloads) and operational/transactional use cases, with Amazon RDS providing relational data storage and querying capabilities. **Data Management and Governance**: Often underrated, this layer ensures data appears in the enterprise catalog, maintains data lineage and traceability, and enforces both technical and business data quality rules. Technical data quality validates structural correctness (column data types, for example), while business data quality ensures semantic correctness - a patient age of 40 might be technically valid but violate business constraints for a pediatric trial. The team prioritized simplicity, using out-of-the-box solutions where possible. They adopted an incremental approach: initial data lineage tracked within AWS and Databricks environments, with plans to extend to sources and targets in subsequent phases. Access management controls who can use which data assets. **Data Consumption Experience**: The platform implements a subscription-based data mesh model. Data practitioners search the enterprise data catalog for relevant datasets, request access, and receive approval or rejection from data owners. Approved users enter the "data product experience," accessing business-qualified clean data through visualization tools (QuickSight, Power BI, Tableau) or AI/ML platforms (SageMaker, Bedrock). This enables the generative AI use cases showcased by Novartis leadership, such as protocol draft generation. An operational/relational experience runs parallel for business users who need SQL access via JDBC clients connecting to RDS, though this workflow is more manual than the automated data product subscription. **Central Observability Platform**: All ingestion events, processing jobs, catalog access requests, data quality results, and cost information route to a centralized observability platform. While logs could theoretically remain distributed, centralizing them simplifies dashboard creation and cross-account analysis. This unified view supports comprehensive monitoring across the entire platform. ## Production Implementation and Results The initial production deployment focused on the patient safety domain, which proved ideal for validation despite - or because of - its complexity. This domain handles adverse event data (patient complaints about medication side effects), making it highly sensitive for both patients and the company, and therefore subject to the strictest GXP compliance standards (Good Clinical Practice in this pharmaceutical context). The quantified results from this first domain deployment are substantial: - 16 data pipelines built and deployed in a matter of months - 17 terabytes of data processed - 72% reduction in query execution time - 60% reduction in storage costs - 160+ hours of manual work eliminated through automation These metrics represent only the foundation layer benefits. The protocol generation use cases leveraging this platform demonstrate 83-87% acceleration in producing protocols that meet compliance standards - a critical distinction from merely "completing" protocols, as acceptability to regulatory authorities is the true measure of success. ## GXP Compliance Implementation Achieving GXP compliance for the platform required integrating compliance considerations throughout the development lifecycle rather than treating it as a final validation step. The AWS team emphasized that "compliance is not just a process, it's a lifestyle." **Design Phase**: Compliance begins with requirements capture, which Novartis formalized in Jira (though other tools like documented notes or Amazon Transcribe could work). Critical is linking requirements to implementation rationale and design decisions. Threat modeling starts at the design phase, identifying vulnerabilities before implementation begins. Architectural Decision Records prove invaluable here, documenting why specific approaches were selected and providing audit trails for compliance verification. **Implementation Phase**: Security and audit controls are identified and implemented for every platform component. Infrastructure Qualification (IQ), Operational Qualification (OQ), and Performance Qualification (PQ) tests run simultaneously with platform development rather than as a post-implementation afterthought. **Documentation Phase**: As implementation concludes, comprehensive documentation captures architecture handbooks, operational guides, validation plans, and test results. Critically, when test outcomes don't match intent, the correct response is modifying the implementation, not adjusting tests to match flawed implementations. Looking forward, the team plans to leverage Amazon Bedrock with AWS Lambda to automate portions of compliance documentation - for instance, generating architectural descriptions from diagrams or automating IQ/OQ/PQ test generation. This isn't about replacing skilled personnel but enhancing efficiency and potentially identifying edge cases humans might miss. ## LLMOps and GenAI Use Cases in Production While the presentation focused heavily on data platform infrastructure, several generative AI applications in production or near-production were highlighted: **Protocol Generation**: The most concrete GenAI use case mentioned is clinical protocol drafting, achieving 83-87% acceleration in generating regulatory-acceptable protocols. This represents a sophisticated application requiring deep domain knowledge integration, as clinical protocols are complex, highly regulated documents. The distinction between "completed" and "acceptable" protocols indicates robust evaluation against compliance standards, not just completion metrics. **Document Automation**: Multiple references point to document generation use cases across clinical study reports and medical affairs documents. These are characterized as "low-hanging fruit" that build organizational confidence while delivering tangible value. The architectural principle of treating similar document types (clinical study reports vs. medical affairs documents) with common GenAI mechanisms demonstrates practical reusability patterns. **Future GenAI Integration**: The intermediate roadmap targets connecting the data platform to "all GenAI use cases," indicating protocol generation is among multiple generative AI applications being developed or piloted. The current state involves some applications running on demo data or legacy systems, with the platform migration representing a significant maturation step toward production-grade GenAI operations. ## MLOps and Operational Considerations The case study demonstrates sophisticated thinking about production ML and AI operations beyond model development: **Infrastructure as Code and CI/CD**: All platform components deploy via CI/CD pipelines using infrastructure-as-code principles. Data product code automatically deploys across dev, QA, test, and production environments within domain-specific AWS accounts. This represents mature DevOps practices applied to data and ML infrastructure. **Metadata-Driven Data Products**: Each data product includes a registration form capturing business metadata - essentially documentation enabling discovery and appropriate use. This metadata powers the searchable catalog, enabling practitioners to find relevant datasets and understand their context and constraints. **Observability and Accountability**: The centralized observability platform doesn't just aggregate logs but drives accountability. Users receive visibility into data quality, access patterns, security posture, performance characteristics, and latency. This transparency enables data owners to enhance their data products based on actual usage patterns and user feedback, creating a virtuous cycle of improvement. **Open Standards**: A deliberate architectural decision prioritized open standards and interoperability. In the GXP-compliant environment with requirements for tool interoperability across therapeutic areas and platforms, vendor lock-in poses significant risks. Open data formats, table formats, and interoperable AWS services prevent architectural constraints that would limit future flexibility. **Modular AI Capabilities**: Novartis and AWS created a map of AI capabilities most relevant to pharmaceutical development, grouping similar use cases to accelerate implementation. Understanding how to apply specific capabilities across multiple contexts (document generation for various document types, for example) significantly shortens time-to-value and creates implementation patterns that can be replicated. ## Organizational and Cultural Transformation The presenters repeatedly emphasized that technology represents only 30% of success, with 70% dependent on organizational culture and change management. Several critical success factors emerged: **Stakeholder Education**: Multi-level education proved essential, from C-level executives to technical teams to operational staff. Framing AI initiatives in terms of patient impact - the "$2.6 billion drug problem" and 6-month acceleration goal - created understanding and enthusiasm across the organization. When individuals understand how their work accelerates medicine delivery to patients, adoption accelerates naturally. **Data-Driven Culture**: The quantified outcomes (72% faster queries, 60% cost reduction, 160+ hours saved) weren't merely technical achievements but demonstrations of how trusted, accessible, democratized data drives transformation. Self-service user experiences make data easy to access, generating positive feedback loops where users request additional capabilities, further improving the platform. **Ways of Working**: Integrating AI agents into processes required fundamental process redesign, not superficial additions. This created opportunities to eliminate legacy inefficiencies while building workforce confidence through upskilling initiatives. The excitement of participating in transformation drives natural adoption rather than forced change. **Value Framework and Accountability**: The robust value framework with clear ROI for each use case enabled prioritization between competing initiatives while creating accountability for leaders claiming value delivery. This discipline prevents unfocused proliferation of low-impact projects. **Strategic Partnerships**: Novartis explicitly acknowledged that their scientific and domain expertise, while foundational, required complementary partnership for scale and innovation. The strategic partnership with AWS Professional Services and Accenture brought specialized platform expertise, implementation accelerators, and proven patterns. ## Lessons Learned and Recommendations The presentation concluded with practical lessons learned and recommendations for organizations pursuing similar transformations: **Think Big, Start Small, Scale Fast**: Maintain ambitious vision while identifying practical starting points. For organizations mid-journey, periodically revisit whether approach and direction remain aligned with evolving business needs. **Align to Business Strategy**: Technology modernization must directly support business objectives, not exist for its own sake. The holistic view of how capabilities and tools inject value into business processes drives meaningful outcomes. **Work Backwards from Customers**: Whether internal or external, involving customers early and continuously generates feedback that shapes platform evolution. When solutions address actual customer needs, adoption happens naturally rather than requiring extensive promotion. **Focus on User Experience and Self-Service**: The Amazon.com shopping analogy for data product subscription illustrates the ideal: browse catalog, select product, add to cart, purchase (request access). Simple, intuitive workflows dramatically accelerate adoption compared to complex, multi-step processes requiring extensive training. **Choose the Right Implementation Partner**: Strategic partners bring transformation expertise, change management capabilities, and technical accelerators. AWS Professional Services demonstrated this through AI-ready accelerators enabling 4x faster foundation builds, deployment of data marketplaces in weeks rather than months, and 75% faster time-to-market for ML use cases. **Unlock Trapped Business Value**: The distinction between "realized business value" and "trapped potential business value" is illuminating. Foundation platforms are table stakes, but combining them with laser-focused, industry-specific use cases maximizes value capture. Organizations can start deriving GenAI outcomes from existing data wherever it sits (on-premises, in applications, in cloud storage), taking a hybrid approach that delivers business value while gradually maturing data infrastructure. This avoids year-long platform builds before delivering any business outcomes. **Security and Responsible AI**: Organizations implementing responsible AI practices see 25% increased customer loyalty and satisfaction, with 82% expecting improved employee experience and trust. Trust in data and approach drives usage. Security must be designed into every layer - ingestion, transformation, analytics, governance, consumption - across infrastructure, data, and AI model layers. AWS services supporting this include encryption (KMS, Certificate Manager), network isolation (VPCs, PrivateLink), observability tools, and governance/compliance services (Config, various compliance frameworks). ## Balanced Assessment While the presentation showcases impressive technical achievement and measurable business outcomes, several considerations warrant balanced evaluation: The case study is presented by vendor partners (AWS and Accenture) with commercial interests in promoting their services and technologies. The quantified benefits (72% query speed improvement, 60% cost reduction) come from a single domain implementation, and scaling these results across all domains remains to be validated in production. The 83-87% protocol generation acceleration is compelling but lacks detail on how "acceptability" is measured and validated by regulatory authorities. The complexity of building GXP-compliant data platforms with data mesh architectures, CI/CD automation, comprehensive observability, and integrated governance should not be underestimated. The presentation acknowledges starting as a "technology program" or "infrastructure program" before becoming an AI catalyst, suggesting significant investment in foundational work before AI value materialization. Organizations pursuing similar transformations should anticipate substantial upfront investment and multi-year timelines, despite accelerators. The emphasis on organizational and cultural change (70% of success) reflects real-world challenges often underestimated in AI initiatives. The need for stakeholder education, process redesign, upskilling, and change management represents substantial non-technical investment. The "ruthless prioritization" principle implicitly acknowledges resource constraints requiring difficult tradeoffs between competing initiatives. That said, the architectural principles - modularity, open standards, incremental implementation, security-by-design - represent sound engineering practices applicable across industries. The specific focus on compliance-as-lifestyle rather than compliance-as-checkpoint offers valuable guidance for regulated industries. The hybrid approach to value delivery (building foundations while delivering use case outcomes) addresses the common pitfall of multi-year platform builds with no interim business value. The partnership model between Novartis (domain expertise), AWS (cloud and AI platform), and Accenture (implementation) illustrates the reality that successful AI transformation at scale typically requires ecosystem collaboration rather than purely internal development. The accelerators and patterns AWS developed specifically for pharmaceutical use cases (protocol generation, clinical data management, patient safety) demonstrate vendor investment in industry-specific solutions beyond generic infrastructure. Overall, this case study represents a substantive example of production LLMOps at enterprise scale in a highly regulated industry, with realistic discussion of challenges, quantified outcomes from production deployment, and practical lessons learned applicable to similar transformations. The infrastructure-first approach may frustrate organizations seeking faster GenAI wins, but likely reflects pragmatic reality for enterprises with complex legacy landscapes and strict compliance requirements.

Start deploying reproducible AI workflows today

Enterprise-grade MLOps platform trusted by thousands of companies in production.