Healthcare
Roche Diagnostics / John Snow Labs
Company
Roche Diagnostics / John Snow Labs
Title
Building Healthcare-Specific LLM Pipelines for Oncology Patient Timelines
Industry
Healthcare
Year
Summary (short)
Roche Diagnostics developed an AI-assisted data abstraction solution using healthcare-specific LLMs to extract and structure oncology patient timelines from unstructured clinical notes. The system leverages natural language processing and machine learning to automatically detect medical concepts, focusing particularly on chemotherapy treatment timelines. The solution addresses the challenge of processing diverse, unstructured healthcare data formats while maintaining high accuracy through domain-specific LLMs and carefully engineered prompts.
## Overview This case study presents a collaboration between Roche Diagnostics Information Solutions and John Snow Labs (a customer relationship since August 2018) focused on applying healthcare-specific large language models to build oncology patient timelines and recommend clinical guidelines. The presentation was delivered by Vishaka Sharma, Senior Principal Data Scientist at Roche Diagnostics, who leads Advanced Analytics initiatives including natural language processing and machine learning to improve Roche's navify digital product portfolio. Roche is a 126-year-old company headquartered in Basel, Switzerland, with two main business divisions: Diagnostics and Pharmaceuticals. The company positions itself as the number one in vitro diagnostics and pathology provider, with 27 billion diagnostic tests performed annually. Their navify digital solutions platform, which unifies all digital solutions from Roche Diagnostics under one brand, aims to securely integrate data across care settings, deliver operational and medical insights, and accelerate access to innovation. ## The Problem: Unstructured Healthcare Data The core challenge addressed in this case study is that most healthcare data is unstructured and not readily available for quick consumption at the point of care. This unstructured healthcare data exists in diverse report formats, spans multiple languages across worldwide geographies, and suffers from semantic ambiguity where healthcare data concepts can be unclear or open to multiple interpretations. Examples include synonyms, abbreviations, and other linguistic variations that create challenges when analyzing, exchanging, or using healthcare data for clinical care, research, and healthcare management. Manual annotations of medical data are labor-intensive and subjective, making it difficult to scale the extraction of relevant information from unstructured sources. Cancer care data and decision-making are particularly complex. The presentation illustrated this with a mock patient timeline spanning from 2014 to 2020, showing a breast cancer diagnosis (pT3 N1 M0 stage 2a infiltrating ductal carcinoma) followed by surgery, first relapse with liver metastasis, and second relapse with bone metastasis and liver mass regrowth. ## The navify Oncology Clinical Hub Solution Roche's navify Oncology Clinical Hub is designed as a clinical decision support solution to simplify and improve day-to-day clinical decision-making and care coordination across the oncology care continuum. It aggregates and organizes fragmented data, providing an intuitive longitudinal view of the patient's entire care journey accessible to all members of the oncology care team. The solution integrates with NCCN (National Comprehensive Cancer Network) clinical practice guidelines, which are evidence-based, consensus-driven recommendations for cancer management covering diagnosis, treatment, and supportive care. A "smart navigation" capability automatically opens the most relevant section of guidelines based on available patient data, saving clinicians time and supporting evidence-based care decisions. ## Technical Approach: LLM Pipeline for Timeline Extraction The team developed an AI-assisted data abstraction solution leveraging NLP and machine learning to auto-detect medical concepts and translate free text into structured, enriched data. The specific focus was on the 2024 SemEval shared task on chemotherapy treatment timeline extraction, which aimed to advance clinical event timeline extraction from Electronic Health Records (EHRs), specifically targeting chemotherapy event timelines from patients with breast, ovarian, and skin cancers. The task involved two subtasks: Subtask 1 provided gold chemotherapy event mentions and time expressions alongside EHR notes, with the goal of building patient-level timelines using these annotations. Subtask 2 represented a realistic real-world setting where only EHR notes were provided, requiring an end-to-end system for chemotherapy treatment timeline extraction. The EHR data included all types of available notes regardless of cancer relevance, including radiology, pathology, clinical oncology, discharge summaries, and progress notes. The timeline construction process was decomposed into several stages: chemotherapy event extraction, time expression extraction, temporal relation classification, time expression normalization, and patient-level timeline refinement. Temporal relations between events and time expressions were defined using a set of relations including "before," "contains," "contained by," "overlap," "noted on," "begins on," and "ends on." ## Prompt Engineering and Model Selection The team developed structured prompts to guide the system in identifying and extracting relations between pairs of entities. These prompts served as templates for the system to recognize and analyze relevant information in text. Two prompt variants were tested: Prompt 1 for relation labeling from pairs, and Prompt 2 for relation labeling from a separate list of drugs. Prompt 1 was used for the final submission, as Prompt 2 encountered challenges in labeling relationships from distinct lists. A zero-shot prompting approach achieved reasonably high precision by leveraging prompt templates that guided the LLMs to generate responses closely matching desired outputs without requiring explicit training data for each class or category. This is a notable finding for production systems where labeled training data may be scarce or expensive to obtain. The team tested various large language models including Llama 2, Mistral 7B with different sizes and architectures to determine which model works best for relationship extraction. A key finding emphasized in the presentation is that healthcare-specific large language models are significantly more accurate when applied to healthcare domain tasks compared to non-healthcare-specific models. The presentation specifically highlighted JSL Med Llama 3 8B version 1.0 as being "production ready" and easy to adapt for healthcare-specific unstructured data processing. ## Production Considerations and Challenges The presentation offered candid insights into the challenges of deploying LLMs at scale in healthcare settings. Several operational and ethical concerns were raised: **Computational and Infrastructure Costs**: LLMs are expensive at scale due to computational resource requirements. Pre-processing can be expensive, requires specialized talent, and involves significant infrastructure costs. **Data Alignment Issues**: Inputs may not always align with the LLM's intended purpose, requiring careful prompt engineering and validation. **Privacy and Legal Concerns**: LLMs trained with personal data risk privacy and legal issues in their outputs, a critical consideration in healthcare where patient data protection is paramount. **Algorithmic Bias**: If trained on biased data, LLMs may perpetuate or amplify those biases in their outputs. **Hallucination Risks**: LLMs may hallucinate, generating incorrect information that could mislead diagnostic recommendations or produce false treatment suggestions. This is particularly dangerous in clinical decision support contexts. ## Current State and Future Direction The presentation acknowledged that most current generative AI solutions are primarily user interfaces for input-output queries and user interactions. Chatbots and virtual assistants powered by LLMs can improve patient engagement and education, but the team emphasized that applying LLMs responsibly in healthcare requires multi-disciplinary collaboration. The presenter stressed that many professionals must bring their diverse expertise together to ensure LLMs are developed and used effectively, ethically, and responsibly: medical professionals, clinicians, healthcare providers, lawyers, regulatory experts, computer scientists, and patients. This creates both tremendous opportunities and tremendous responsibilities. The use of large language models is described as an ongoing journey for the organization, suggesting that while progress has been made, full production deployment and integration continues to evolve. ## Balanced Assessment While the presentation demonstrates promising applications of healthcare-specific LLMs for oncology timeline extraction, several points warrant balanced consideration. The presentation originated from a partnership presentation between Roche and John Snow Labs, so there is an inherent promotional aspect to the content. Specific quantitative results beyond "reasonably high precision" from zero-shot prompting were not provided, making it difficult to assess the true production readiness of the system. The challenges mentioned regarding hallucination, bias, and computational costs are significant and real obstacles that the presentation acknowledges but does not claim to have fully solved. The emphasis on multi-disciplinary collaboration and the acknowledgment that LLM adoption is "a journey" suggests the solution is still maturing rather than being a fully deployed production system.

Start deploying reproducible AI workflows today

Enterprise-grade MLOps platform trusted by thousands of companies in production.