Clario: Streamlining Clinical Trial Documentation Generation with RAG and LLMs

Company

Clario

Title

Streamlining Clinical Trial Documentation Generation with RAG and LLMs

Industry

Healthcare

Link

https://aws.amazon.com/blogs/machine-learning/clario-enhances-the-quality-of-the-clinical-trial-documentation-process-with-amazon-bedrock?tag=soumet-20

Year

2025

Summary (short)

Clario, a clinical trials endpoint data solutions provider, transformed their time-consuming manual documentation process by implementing a generative AI solution using Amazon Bedrock. The system automates the generation of business requirement specifications from medical imaging charter documents using RAG architecture with Amazon OpenSearch for vector storage and Claude 3.7 Sonnet for text generation. The solution improved accuracy, reduced manual errors, and significantly streamlined their documentation workflow while maintaining security and compliance requirements.

Tags

healthcare

document_processing

regulatory_compliance

Clario's case study describes an LLM deployment in a highly regulated healthcare environment, focused on streamlining clinical trial documentation. It shows how generative AI can be used where accuracy and compliance are critical requirements.

### Company and Use Case Overview

Clario is a provider of endpoint data solutions for clinical trials, with over 50 years of experience and involvement in more than 26,000 clinical trials. Its primary challenge was the time-consuming process of generating documents for clinical trials, particularly the creation of business requirement specifications (BRS) from medical imaging charter documents. This process traditionally took weeks and was prone to manual errors and inconsistencies.

### Technical Architecture and Implementation

The solution architecture consists of four main parts:

* **Document Processing and Storage**: Documents are first processed on premises, then transferred securely to AWS over AWS Direct Connect. This keeps the solution integrated with existing infrastructure while protecting data in transit.
* **Vector Database Implementation**: The solution uses Amazon OpenSearch Serverless as a vector store, with document chunks embedded using the Amazon Titan Text Embeddings model. This Retrieval-Augmented Generation (RAG) setup enables semantic search, so that each generation step receives the relevant charter context.
* **LLM Selection and Integration**: The system uses Claude 3.7 Sonnet through Amazon Bedrock for both question-answering and conversational AI applications. Amazon Bedrock provides a single API for accessing various foundation models while maintaining security and privacy controls.
* **Workflow Orchestration**: A custom workflow engine manages the document generation process and coordinates the other components. It:
  * Maintains a global specification for prompts
  * Queries the vector database for relevant charter information
  * Orchestrates the LLM calls for each business requirement
  * Manages the output generation and review process

### Production Considerations and Best Practices

The implementation reflects several LLMOps best practices:

* **Security and Compliance**: The solution keeps all information within the AWS ecosystem and relies on Amazon Bedrock's security features. This is crucial for handling sensitive clinical trial data.
* **Scalability**: The serverless architecture lets the system handle varying workloads without manual intervention.
* **Human-in-the-Loop**: Business requirement writers review and finalize the generated documents, which provides quality control and keeps a human accountable for the output.
* **Integration Capabilities**: The architecture allows for future expansion and integration with other document workflows.

### Technical Challenges and Solutions

The implementation addressed several technical challenges:

* **Document Chunking**: The system had to break source documents into manageable pieces while preserving context and relevance.
* **Prompt Engineering**: The workflow engine maintains a global specification for prompts, which suggests that prompt design is managed centrally rather than per call.
* **Vector Search**: Semantic search over vector embeddings helps ensure that the retrieved information is relevant to each requirement.

### Results and Impact

The solution delivered the following improvements:

* **Accuracy Enhancement**: The use of generative AI reduced translation errors and inconsistencies, leading to fewer reworks and study delays.
* **Process Efficiency**: The automated system significantly reduced the time required for document generation compared to the manual process.
* **Scalable Architecture**: The serverless design allows for easy scaling as demand increases.

### Lessons Learned and Future Directions

The case study highlights several lessons:

* **Domain Specificity**: The implementation shows the value of applying generative AI to specific domain problems in life sciences.
* **Stakeholder Involvement**: Early involvement of business stakeholders proved crucial for success.
* **Iterative Development**: The project started as a prototype and is being moved to production in 2025, a staged approach to adopting AI.

### Technical Debt and Considerations

While the case study presents a successful implementation, several areas require ongoing attention:

* **Model Updates**: The system needs to account for updates to the underlying LLMs and embedding models.
* **Prompt Management**: As the system scales, maintaining and updating prompts will require systematic management.
* **Quality Assurance**: Continuous monitoring of output quality and accuracy is essential, especially in a clinical trial context.

Overall, the case study is an example of LLMOps in a regulated industry, showing how careful architecture design, security considerations, and human oversight can combine into a production system built on generative AI.
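
To make the workflow described under Technical Architecture more concrete, the following is a minimal, offline sketch of the same shape: chunk a charter, index the chunks, retrieve context per business requirement, fill a shared prompt template, and mark each draft for human review. It is not Clario's implementation. The bag-of-words `embed` function, `InMemoryVectorStore`, `PROMPT_SPEC`, `generate_brs`, and `echo_llm` are all hypothetical stand-ins, assumed here in place of Amazon Titan Text Embeddings, Amazon OpenSearch Serverless, the global prompt specification, the custom workflow engine, and the Claude model call respectively.

```python
import math
import re
from collections import Counter

# Hypothetical "global prompt specification": one template reused for every requirement.
PROMPT_SPEC = (
    "You are drafting a business requirement specification (BRS).\n"
    "Requirement: {requirement}\n"
    "Charter excerpts:\n{context}\n"
    "Answer using only the excerpts above."
)


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word windows so context survives chunk boundaries."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Stand-in for a managed vector store; keeps (chunk, vector) pairs in a list."""

    def __init__(self):
        self._items = []

    def add(self, chunk):
        self._items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def generate_brs(charter_text, requirements, llm, k=2):
    """Retrieve context and call the model once per requirement; drafts await human review."""
    store = InMemoryVectorStore()
    for chunk in chunk_text(charter_text):
        store.add(chunk)
    drafts = {}
    for requirement in requirements:
        context = "\n".join(f"- {c}" for c in store.search(requirement, k))
        prompt = PROMPT_SPEC.format(requirement=requirement, context=context)
        drafts[requirement] = {"text": llm(prompt), "status": "pending_review"}
    return drafts


def echo_llm(prompt):
    """Placeholder for a model call so the sketch runs without network access."""
    return f"[draft based on {len(prompt)} prompt characters]"
```

In a real deployment, `llm` would wrap a hosted model call and `InMemoryVectorStore` would be replaced by a managed vector index. Keeping the model behind a plain callable and the prompt in a single template is one way to make model updates and prompt changes, both noted above as ongoing concerns, easier to manage and test.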
