AI-Powered Fax Processing Automation for Healthcare Referrals

Providence 2025

Providence Health System automated the processing of over 40 million annual faxes using GenAI and MLflow on Databricks to transform manual referral workflows into real-time automated triage. The system combines OCR with GPT-4.0 models to extract referral data from diverse document formats and integrates seamlessly with Epic EHR systems, eliminating months-long backlogs and freeing clinical staff to focus on patient care across 1,000+ clinics.

Industry

Healthcare

Providence Health System is one of the largest nonprofit health systems in the United States, serving vulnerable communities through 51 hospitals, more than 1,000 outpatient clinics, and more than 130,000 caregivers across seven states. Its case study applies LLMOps principles to a critical healthcare workflow problem: automating the processing of massive volumes of referral faxes that previously required manual intervention and caused significant care delays.

The core problem facing Providence was the overwhelming volume of healthcare communications that still rely heavily on fax technology despite digital transformation efforts. The organization processes more than 40 million faxes annually, totaling over 160 million pages, with a significant portion requiring manual review and transcription into their Epic electronic health record (EHR) system. This manual process created multi-month backlogs, delayed patient care, and consumed valuable clinical staff time that could be better spent on direct patient care activities.
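
To put those volumes in perspective, the short calculation below converts the quoted annual figures into per-fax and per-day rates. It is a rough sketch: the source gives the totals as lower bounds ("more than" and "over"), and the even spread across a 365-day year is an assumption made here for illustration, not something the case study states.

```python
# Lower-bound figures quoted in the case study.
FAXES_PER_YEAR = 40_000_000
PAGES_PER_YEAR = 160_000_000

# Assumption: volume is spread evenly over a 365-day year.
# Real fax traffic is likely bursty, so peaks would be higher.
pages_per_fax = PAGES_PER_YEAR / FAXES_PER_YEAR
faxes_per_day = FAXES_PER_YEAR / 365
faxes_per_second = faxes_per_day / (24 * 60 * 60)

print(f"average pages per fax: {pages_per_fax:.1f}")
print(f"faxes per day:         {faxes_per_day:,.0f}")
print(f"faxes per second:      {faxes_per_second:.2f}")
```

At roughly four pages per fax and more than one fax arriving every second on average, manual transcription cannot keep pace, which is consistent with the backlogs described above.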

From an LLMOps perspective, Providence’s approach exemplifies several key principles of production AI deployment. Their technical architecture centers on the Databricks Data Intelligence Platform, specifically leveraging MLflow for experiment management and model lifecycle operations. This choice reflects a mature understanding of the need for systematic experimentation when dealing with the inherent variability and complexity of unstructured healthcare documents.

The experimentation framework built around MLflow addresses several critical LLMOps challenges. Providence uses parameterized jobs to systematically sweep across combinations of OCR models, prompt templates, and other hyperparameters. This approach lets them manage the complexity of optimizing multiple components at once, from optical character recognition tools to language model prompts. The parameterized job framework accepts input configuration dynamically at runtime, and CI/CD integration produces the YAML configuration files used for large-scale testing, which keeps the experimental pipeline both flexible and reusable.
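
A parameter sweep of this kind can be sketched in a few lines. The example below is a minimal illustration, not Providence's actual configuration: the search-space keys and values (`ocr_a`, `referral_v1`, and so on) are hypothetical, and the YAML is rendered by hand to keep the sketch dependency-free, where a real pipeline would more likely use a YAML library and pass each configuration to a parameterized job.

```python
from itertools import product

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "ocr_model": ["ocr_a", "ocr_b"],
    "prompt_template": ["referral_v1", "referral_v2", "referral_v3"],
    "temperature": [0.0, 0.2],
}


def expand_grid(space):
    """Return one parameter dict per combination in the search space."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def to_yaml(params):
    """Render a flat parameter dict as a minimal YAML mapping."""
    return "\n".join(f"{key}: {value}" for key, value in params.items()) + "\n"


job_configs = expand_grid(SEARCH_SPACE)
print(len(job_configs), "configurations")
print(to_yaml(job_configs[0]))
```

Each dictionary in `job_configs` corresponds to one job run, so adding a new OCR model or prompt template only requires extending the search space rather than editing pipeline code.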

Central to their LLMOps strategy is comprehensive experiment tracking and logging through MLflow. This provides the team with clear visibility into model performance across different document types and referral scenarios, enabling efficient comparison of results without duplicating experimental effort. The centralized logging capability supports deeper evaluation of model behavior, which is particularly crucial given the diversity of referral forms and the strict compliance requirements within heavily regulated EHR environments like Epic.
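
The kind of comparison that centralized logging makes possible can be illustrated without any tracking library. The sketch below uses plain Python and made-up numbers: the run IDs, document types, and accuracy values are hypothetical placeholders, not results from the case study, and in practice the records would come from the tracking store rather than a hard-coded list.

```python
# Hypothetical logged results: (run_id, document_type, field_accuracy).
# These values are invented for illustration only.
LOGGED_RESULTS = [
    ("run_a", "typed_pdf", 0.96),
    ("run_a", "handwritten", 0.71),
    ("run_b", "typed_pdf", 0.94),
    ("run_b", "handwritten", 0.83),
]


def best_run_per_document_type(results):
    """Return {document_type: (run_id, accuracy)} for the highest-scoring run."""
    best = {}
    for run_id, doc_type, accuracy in results:
        if doc_type not in best or accuracy > best[doc_type][1]:
            best[doc_type] = (run_id, accuracy)
    return best


winners = best_run_per_document_type(LOGGED_RESULTS)
for doc_type, (run_id, accuracy) in winners.items():
    print(f"{doc_type}: {run_id} ({accuracy:.2f})")
```

Breaking results down by document type matters because a configuration that wins on average may still underperform on a specific class of referral forms.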

Providence’s use of historical data for simulation represents another sophisticated LLMOps practice. By leveraging existing fax data to simulate downstream outcomes, they can refine their models before production deployment, significantly reducing risk and accelerating the deployment cycle. This is particularly important in healthcare settings where errors can have significant patient care implications and where integration with established systems like Epic requires rigorous validation.
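
One way to picture this replay-style validation is shown below. It is a sketch under stated assumptions: the archive format, the field names, and the `naive_extract` stand-in are all hypothetical, and exact-match field accuracy is just one simple scoring choice among many.

```python
def field_accuracy(predicted, expected):
    """Fraction of expected fields whose predicted value matches exactly."""
    if not expected:
        return 1.0
    hits = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return hits / len(expected)


def replay(historical_cases, extract):
    """Run `extract` over archived fax text and score it against recorded outcomes."""
    scores = [field_accuracy(extract(text), outcome) for text, outcome in historical_cases]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical archive: (OCR text, fields that staff originally entered by hand).
HISTORICAL_CASES = [
    ("Patient: Jane Doe\nSpecialty: Cardiology",
     {"patient": "Jane Doe", "specialty": "Cardiology"}),
    ("Patient: John Roe\nSpecialty: Neurology",
     {"patient": "John Roe", "specialty": "Neurology"}),
]


def naive_extract(text):
    """Stand-in for the real model: parse 'Key: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


score = replay(HISTORICAL_CASES, naive_extract)
print(f"mean field accuracy on historical cases: {score:.2f}")
```

Because `replay` takes the extractor as an argument, any candidate configuration can be scored against the same archived cases before it touches production traffic.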

The technical stack demonstrates a thoughtful integration of multiple AI technologies within a production environment. While Azure AI Document Intelligence handles OCR processing and OpenAI’s GPT-4.0 models perform information extraction, the real engineering value comes from the MLflow-orchestrated pipeline that automates what would otherwise be manual and fragmented development processes. This unified approach through the Databricks platform enables the transformation of raw fax documents through experimentation with different AI techniques and validation of outputs with both speed and confidence.
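
The two-stage shape of that pipeline, OCR followed by LLM extraction, can be sketched with pluggable stages. Everything named below is hypothetical: `ReferralPipeline`, `fake_ocr`, and `fake_llm` are stand-ins defined for this example and do not call Azure AI Document Intelligence or OpenAI. The point is only that treating each stage as a swappable callable is what makes systematic experimentation across OCR models and prompts practical.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReferralPipeline:
    """Hypothetical two-stage pipeline: OCR, then LLM-based extraction."""
    ocr: Callable[[bytes], str]        # fax bytes -> plain text
    complete: Callable[[str], str]     # prompt -> JSON string from the model
    prompt_template: str               # must contain a {document} placeholder

    def run(self, fax_bytes: bytes) -> dict:
        text = self.ocr(fax_bytes)
        prompt = self.prompt_template.format(document=text)
        return json.loads(self.complete(prompt))


def fake_ocr(fax_bytes: bytes) -> str:
    """Stub OCR stage: pretend the fax is already UTF-8 text."""
    return fax_bytes.decode("utf-8")


def fake_llm(prompt: str) -> str:
    """Stub model call: always returns the same canned JSON answer."""
    return json.dumps({"patient": "Jane Doe", "specialty": "Cardiology"})


pipeline = ReferralPipeline(
    ocr=fake_ocr,
    complete=fake_llm,
    prompt_template="Extract the referral fields as JSON.\n\n{document}",
)
result = pipeline.run(b"Referral for Jane Doe to Cardiology")
print(result)
```

Swapping in a different OCR backend or prompt template then means constructing a new `ReferralPipeline` with different arguments, leaving the rest of the code unchanged.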

The integration requirements with Epic EHR systems add another layer of complexity to the LLMOps implementation. All extracted referral data must be seamlessly formatted, validated, and securely delivered to the existing healthcare infrastructure. Databricks plays a critical role in pre-processing and normalizing this information before handoff to the EHR system, requiring careful attention to data quality and format consistency.
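
A validation and normalization step of this sort might look like the sketch below. The required fields and accepted date formats are assumptions chosen for illustration; the case study does not describe Providence's actual schema or the interface used to deliver data to Epic.

```python
from datetime import datetime

# Hypothetical schema; the real required fields are not given in the source.
REQUIRED_FIELDS = ("patient_name", "date_of_birth", "referring_provider", "specialty")
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m-%d-%Y")


def normalize_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def validate_referral(record):
    """Return (normalized_record, errors); an empty error list means ready for handoff."""
    errors = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not str(record.get(name) or "").strip()
    ]
    # Trim stray whitespace that OCR and extraction commonly introduce.
    normalized = {
        key: value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }
    dob = record.get("date_of_birth")
    if isinstance(dob, str) and dob.strip():
        iso = normalize_date(dob)
        if iso is None:
            errors.append(f"unparseable date_of_birth: {dob!r}")
        else:
            normalized["date_of_birth"] = iso
    return normalized, errors


record, errors = validate_referral({
    "patient_name": " Jane Doe ",
    "date_of_birth": "03/07/1980",
    "referring_provider": "Dr. Smith",
    "specialty": "Cardiology",
})
print(record, errors)
```

Returning errors alongside the normalized record, rather than raising on the first problem, lets incomplete referrals be routed to manual review with a full list of what needs fixing.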

Providence’s broader technical infrastructure includes Azure Kubernetes Service (AKS) for containerized deployment, Azure Search to support retrieval-augmented generation (RAG) workflows, and Postgres for structured storage. This multi-service architecture requires sophisticated orchestration and monitoring capabilities to ensure reliable operation at the scale of 40 million documents annually. The team is also actively exploring Mosaic AI for RAG and Model Serving to enhance accuracy, scalability, and responsiveness of their AI solutions, indicating continued evolution of their LLMOps practices.

The production deployment strategy addresses several key LLMOps considerations around scalability and reliability. Moving from manual processing to real-time automation of 40 million annual faxes requires robust infrastructure capable of handling peak loads and maintaining consistent performance. The shift from months-long backlogs to real-time processing is a significant operational transformation that demands careful attention to system reliability and error handling.

One of the most interesting aspects of Providence’s LLMOps implementation is their approach to handling the inherent variability in healthcare workflows. The lack of standardization between clinics, roles, and individuals creates significant challenges for defining universal automation pipelines or creating test scenarios that reflect real-world complexity. Their experimentation framework addresses this by enabling rapid iteration across different configurations and validation against diverse document types and workflow patterns.

The diversity of input documents, from handwritten notes to typed PDFs, creates a wide range of processing challenges that require sophisticated prompt engineering and model tuning. Providence’s systematic approach to hyperparameter optimization through MLflow lets them handle this complexity more effectively than ad-hoc manual tuning would allow.

From a business impact perspective, the LLMOps implementation has delivered significant operational improvements. The elimination of two- to three-month backlogs in some regions directly shortens patient care timelines, while the automation of repetitive document processing frees clinical staff to focus on higher-value activities. The system-wide efficiency gains scale across Providence’s 1,000+ outpatient clinics, supporting their mission to provide timely, coordinated care at scale.

The case study also highlights important considerations around change management and workflow transformation in healthcare settings. The transition from manual to automated processing requires careful consideration of existing staff workflows and training needs, as well as integration with established clinical practices and compliance requirements.

Providence’s approach demonstrates mature LLMOps practices including systematic experimentation, comprehensive monitoring and logging, automated testing and validation, and seamless integration with existing enterprise systems. Their use of MLflow for experiment management and model lifecycle operations provides a solid foundation for continued iteration and improvement of their AI-powered automation systems. The case represents a successful example of applying LLMOps principles to solve real-world healthcare challenges at enterprise scale, with measurable impacts on operational efficiency and patient care delivery.
