Company
Huron
Title
Advancing Patient Experience and Business Operations Analytics with Generative AI in Healthcare
Industry
Healthcare
Year
2025
Summary (short)
Huron Consulting Group implemented generative AI solutions to transform healthcare analytics across patient experience and business operations. The consulting firm faced two related challenges: unstructured data from patient rounding sessions and revenue cycle management notes required manual review, and traditional HCAHPS survey feedback arrived with a 3-4 month lag, delaying interventions. Using AWS services including Amazon Bedrock with Amazon Nova models, Amazon Redshift, and Amazon S3, Huron built sentiment analysis capabilities that automatically process survey responses, staff interactions, and financial operations notes. The solution achieved 90% accuracy in sentiment classification (up from an initial 75%) and now processes over 10,000 notes per week automatically, enabling near-real-time identification of patient dissatisfaction, revenue opportunities, and staff coaching needs that directly affect hospital funding and operational efficiency.
## Overview and Business Context

Huron Consulting Group, a global consulting firm with a significant presence in the healthcare and education industries, has developed generative AI capabilities to transform how it delivers patient experience analytics and business operations consulting to healthcare clients. The case study is presented by Shane O'Connor and Kendra, representing different sides of Huron's healthcare consulting practice, who discovered commonalities in their AWS architectures and in how generative AI could extend their service delivery.

The company operates in two primary areas within healthcare consulting: patient experience improvement through its "Huron Rounding" tool, and business operations optimization focused on revenue cycle, supply chain, pharmacy, and workforce management. Data analytics sits at the center of Huron's approach to guiding clients through transformational change management. The adoption of generative AI represents a strategic shift from purely structured data analysis to incorporating unstructured data sources, enabling deeper insights into the "real story" behind client opportunities.

## Problem Statement and Motivation

The healthcare consulting challenges Huron addresses are multifaceted. For patient experience, traditional HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems) surveys, which directly impact Medicare funding, have a critical limitation: results arrive 3-4 months after a patient is discharged. This lag means hospitals miss opportunities for early intervention when patients have negative experiences. Higher HCAHPS scores correlate directly with increased hospital funding under the Medicare inpatient payment system, making timely intervention financially critical.
On the business operations side, Huron works with vast amounts of unstructured data from revenue cycle management, including free-text notes documenting claim histories, staff actions, denials, and payer interactions. Previously, extracting actionable insights from this unstructured data required manual review by consultants, a time-consuming process that limited scale and delayed strategic recommendations. The challenge was to analyze this unstructured text at scale while maintaining the quality and accuracy needed for healthcare decision-making.

## Technical Architecture and Implementation

Huron's LLMOps architecture centers on AWS services and follows a similar pattern across both the patient experience and business operations use cases. The core architecture leverages Amazon Bedrock with Amazon Nova models, Amazon Redshift for data warehousing and processing, Amazon S3 for raw data storage, the AWS Glue Data Catalog for data cataloging, and Amazon QuickSight for visualization and business intelligence.

In the patient experience rounding platform, the current implementation processes manually entered questions and answers through sentiment analysis. An EventBridge scheduler triggers a Step Functions workflow daily, which invokes a Redshift stored procedure. Crucially, Huron registers the Nova Micro model within Redshift itself, making it callable as a function directly from database operations. This architectural decision integrates LLM capabilities into existing data processing workflows without requiring extensive infrastructure changes. The prompt passes both the question and the answer text to the Nova model with instructions to classify the response into one of four sentiment categories, yielding structured outputs that can be stored in database tables and used for downstream analytics.
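As a concrete illustration, the per-note classification step could look like the following Python sketch. The four sentiment labels, prompt wording, and model ID are assumptions for illustration (the source only says the prompt passes question and answer and asks for one of four sentiments), and the call is shown against the Bedrock Converse API rather than from within Redshift:

```python
# Hypothetical sketch of the sentiment classification call.
# A real deployment would pass: client = boto3.client("bedrock-runtime")

# Assumed label set; the source only states "four sentiment categories".
SENTIMENTS = {"positive", "neutral", "negative", "mixed"}

def build_prompt(question: str, answer: str) -> str:
    """Combine the rounding question and the patient's answer, as the source describes."""
    return (
        "Classify the sentiment of the patient's answer, given the question asked.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Respond with exactly one word: positive, neutral, negative, or mixed."
    )

def parse_sentiment(raw: str) -> str:
    """Normalize model output to one of the four categories; default to 'neutral'."""
    label = raw.strip().lower().rstrip(".")
    return label if label in SENTIMENTS else "neutral"

def classify(client, question: str, answer: str,
             model_id: str = "amazon.nova-micro-v1:0") -> str:
    """Call the Bedrock Converse API; `client` is injectable for testing."""
    resp = client.converse(
        modelId=model_id,
        messages=[{"role": "user",
                   "content": [{"text": build_prompt(question, answer)}]}],
    )
    return parse_sentiment(resp["output"]["message"]["content"][0]["text"])
```

In the architecture Huron describes, the equivalent call is exposed as a SQL function inside Redshift, so stored procedures can score notes in bulk during the nightly run.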
The presenters note that they use AWS Secrets Manager to manage database credentials securely rather than hardcoding them.

For business operations, the architecture follows a similar flow but handles more diverse data sources. Raw data arrives from clients as flat files or from various other sources and is stored in S3. Data is cataloged in the AWS Glue Data Catalog and transformed in Redshift using dbt (data build tool) for data modeling and transformation. This curation process assembles disparate sources into a holistic view, for example tracking the complete journey of a medical claim from initial billing through final resolution and all intermediary activities. The LLM integration occurs at the processing layer, where Bedrock calls summarize and extract insights from unstructured text fields. Processed results can be written back to Redshift tables for visualization in QuickSight or connected to low-code applications for ad-hoc analysis and on-demand summarization. This flexibility allows consultants to access AI-generated insights through their preferred interfaces.

## Prompt Engineering and Model Behavior

While the presentation doesn't dive deeply into specific prompt engineering techniques, several important aspects of the approach emerge. The sentiment analysis prompts include both the question asked during patient rounding and the answer provided, giving the model full context for classification. The instruction set specifies returning one of four distinct sentiments, producing structured categorical outputs rather than open-ended responses. For the business operations use case, prompts appear designed to identify patterns, extract key themes, and generate summaries from collections of notes. The presenters mention using LLMs to understand "free text notes and unstructured pieces of data" to provide "additional insight" that would previously have required manual review.
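The claim-journey assembly described above can be sketched as a small pure function. The record fields (`claim_id`, `ts`, `note`) and the one-text-block-per-claim output format are hypothetical details chosen for illustration, not specifics from the source:

```python
from collections import defaultdict

def assemble_claim_journeys(records):
    """Group free-text revenue cycle notes by claim and order them chronologically,
    yielding one text block per claim for downstream LLM summarization."""
    by_claim = defaultdict(list)
    for rec in records:
        by_claim[rec["claim_id"]].append(rec)
    journeys = {}
    for claim_id, recs in by_claim.items():
        recs.sort(key=lambda r: r["ts"])  # ISO dates sort correctly as strings
        journeys[claim_id] = "\n".join(f"[{r['ts']}] {r['note']}" for r in recs)
    return journeys

notes = [
    {"claim_id": "C1", "ts": "2025-02-10", "note": "Denied by payer: missing authorization"},
    {"claim_id": "C1", "ts": "2025-01-05", "note": "Initial claim billed"},
    {"claim_id": "C2", "ts": "2025-01-08", "note": "Paid in full"},
]
journeys = assemble_claim_journeys(notes)
# journeys["C1"] now lists the C1 events oldest-first
```

In practice Huron performs this kind of curation with dbt inside Redshift; the sketch just shows the shape of the transformation.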
These quotes suggest prompts designed for information extraction, pattern recognition, and summarization tasks. An interesting future capability is the sophistication to let healthcare workers conducting rounds "ask questions in their own words and in any sequence" rather than following a rigid script. This implies more complex prompt engineering that can map flexible natural language questions to standardized data collection categories, a challenging NLP task that requires robust prompt design and potentially few-shot learning or fine-tuning.

## Model Selection and Performance

Huron selected Amazon Bedrock with the Nova Micro model as its primary LLM infrastructure. Nova Micro appears well suited to the use case, likely chosen for its balance of performance, cost-effectiveness, and ease of integration within the AWS ecosystem. The ability to register the model as a callable function within Redshift is a key architectural advantage that reduces latency and simplifies the data processing pipeline.

The presenters report 90% accuracy in sentiment analysis, up from an initial 75%. This improvement trajectory suggests iterative prompt refinement, adjusted classification thresholds, and feedback loops. However, they acknowledge that in healthcare contexts "we need to get to as close to 100% as possible," recognizing the high stakes of clinical and financial decision-making. This reflects a mature understanding of the accuracy requirements of healthcare AI applications. The mention of the model continuing to "evolve, adapt, and the machine learns" is imprecise; it's unclear whether they're implementing active learning, fine-tuning, or simply iterating on prompts and classification logic. This ambiguity is common in presentations that conflate ongoing prompt optimization with model retraining.
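The accuracy figures imply comparison against human-labeled samples. A minimal version of that measurement, assuming labels are gathered as parallel lists (the source does not describe the evaluation setup), might be:

```python
def sentiment_accuracy(predicted, human):
    """Fraction of notes where the model's sentiment matches the human label."""
    if len(predicted) != len(human):
        raise ValueError("label lists must have equal length")
    if not predicted:
        return 0.0
    return sum(p == h for p, h in zip(predicted, human)) / len(predicted)

def confusion_counts(predicted, human):
    """Count (human, predicted) pairs to see which categories get confused."""
    counts = {}
    for p, h in zip(predicted, human):
        counts[(h, p)] = counts.get((h, p), 0) + 1
    return counts

# Toy labeled sample (illustrative values only).
preds = ["positive", "negative", "neutral", "negative"]
truth = ["positive", "negative", "neutral", "mixed"]
acc = sentiment_accuracy(preds, truth)  # 0.75 on this toy sample
```

Breaking accuracy down per category, rather than tracking a single aggregate number, is what makes the jump from 75% to 90% actionable: it shows which of the four sentiments the prompt is misreading.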
## Production Deployment and Operationalization

The production deployment demonstrates several mature LLMOps practices. The daily scheduled execution via EventBridge shows automated, recurring inference workflows rather than one-off analyses. Processing over 10,000 notes per week at 90% accuracy indicates genuine production scale rather than a proof of concept. The architecture's integration with existing data infrastructure, embedding LLM calls within Redshift stored procedures and dbt transformation pipelines, reflects thoughtful operationalization that fits AI capabilities into established workflows. This approach likely reduced adoption friction and leveraged existing data quality and governance processes. The QuickSight integration means that LLM-generated insights flow into the same dashboards and reporting tools that consultants and clients already use, reducing the need for new training and ensuring AI-enhanced analytics fit seamlessly into decision-making processes.

## Future Roadmap and Advanced Capabilities

Huron's roadmap reveals ambitious plans to expand its LLMOps capabilities significantly. A major initiative is implementing real-time transcription and processing of patient rounding sessions. Rather than manually entering questions and answers into forms, healthcare workers will be able to activate recording, set down their device, and have the entire conversation transcribed and processed automatically. This addresses a significant user experience issue: nurses and staff can engage more personally with patients rather than being "face in a device" taking notes.

The transcription implementation will support three modes to handle varying infrastructure conditions: real-time streaming for areas with good connectivity, where rounds are documented instantly; batch processing for areas with low internet connectivity, capturing details and processing them later; and voice memo upload as a fallback option.
This three-mode approach shows sophisticated thinking about real-world deployment constraints in healthcare facilities. The AI system will extract structured information from free-form conversations, automatically identifying issues, recognitions, and notes. It will provide both per-question sentiment (as it does today) and overall round sentiment to give rounding staff a holistic view of the patient's emotional state. Critically, the system will explain why it chose specific classifications, an important transparency feature for building trust with healthcare workers who may be unfamiliar with AI systems. Users will retain "full autonomy" to override AI-generated classifications, and all such corrections will be tracked. This human-in-the-loop approach is essential for healthcare applications and creates opportunities for active learning if Huron chooses to use the correction data for model improvement.

On the business operations side, future directions include creating staff effectiveness scores and quality ratings based on AI analysis of the actions documented in revenue cycle notes. These scores will help identify coaching opportunities and staff performance patterns. The system aims to identify denial patterns, payer relationship issues, and revenue opportunities by combining structured metrics with LLM-powered analysis of unstructured text.

A particularly interesting strategic direction is connecting patient experience data with business operations data. Currently, these systems run on "similar architectures but not totally connected in some of the same AWS accounts." As these data sources are integrated, Huron anticipates finding "hidden interconnections" between patient sentiment and market share, and between patient experience and net revenue opportunities.
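A toy sketch of what surfacing such interconnections could involve: correlating unit-level sentiment with a revenue metric. The keys, figures, and choice of Pearson correlation are entirely hypothetical, since the source only says the data sets are not yet connected:

```python
def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Hypothetical unit-level aggregates from the two (currently separate) systems.
sentiment_by_unit = {"ICU": 0.62, "Med-Surg": 0.71, "Oncology": 0.55}
net_revenue_by_unit = {"ICU": 4.1, "Med-Surg": 5.0, "Oncology": 3.1}  # $M, illustrative

units = sorted(set(sentiment_by_unit) & set(net_revenue_by_unit))
r = pearson([sentiment_by_unit[u] for u in units],
            [net_revenue_by_unit[u] for u in units])
```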
Combining these data streams reflects a sophisticated understanding of how diverse sources plus AI can generate emergent insights beyond what either source provides independently.

## Impact and Results

The quantitative results are notable. The improvement from 75% to 90% sentiment classification accuracy represents significant progress achieved through iterative refinement, and automatically processing 10,000 notes per week demonstrates genuine operational scale and a substantial efficiency gain over manual review.

The patient experience work aims to enable "smart rounds," where real-time sentiment analysis identifies patients at risk of poor experiences and allows targeted service recovery interventions. By detecting dissatisfaction far earlier than the 3-4 month HCAHPS survey lag, hospitals can take corrective action that may improve scores and thereby increase Medicare funding, creating a direct line from AI implementation to financial impact. For business operations, the scale gains let consultants provide deeper analysis across broader client bases. Identifying revenue opportunities, denial trends, and staff effectiveness issues at scale means consultants can move more quickly from data gathering to strategic coaching and change management, which the presenters emphasize is core to Huron's value proposition.

## Critical Assessment and Balanced Perspective

While the presentation demonstrates a genuine LLMOps implementation with real production deployment, several aspects warrant balanced assessment. The reported 90% accuracy is respectable, but the presenters themselves acknowledge it falls short of healthcare requirements, which typically demand higher confidence for clinical and financial decisions. The gap between current performance and the stated need to approach 100% accuracy is a significant ongoing challenge. The discussion of how the model "learns" and "adapts" lacks technical precision.
It's unclear whether they're implementing active learning, fine-tuning, or simply iterating on prompts. The path from 90% to near-100% accuracy may require approaches beyond prompt engineering, such as fine-tuning on domain-specific healthcare data or ensemble methods, which aren't discussed.

The planned transcription capabilities are ambitious but face substantial challenges. Speech-to-text in clinical environments, with medical terminology, background noise, and multiple speakers, is notoriously difficult. The claim that staff can "ask questions in their own words and in any sequence" while the system correctly maps them to standardized data fields requires sophisticated natural language understanding that may be hard to achieve reliably across diverse communication styles and dialects.

The presentation also leaves several important LLMOps concerns unaddressed. There's no discussion of monitoring for model drift, handling edge cases, or managing false positives and negatives in production. The feedback loop for continuous improvement isn't clearly specified: while user corrections are tracked, whether and how that data feeds back into model improvement isn't explained. Data privacy and HIPAA compliance, critical in healthcare, receive no explicit mention. While AWS services like Bedrock offer HIPAA-eligible configurations, the presentation doesn't address the governance, audit trails, and compliance processes necessary for handling protected health information.

The strategy of embedding LLM calls within Redshift is clever for operational simplicity but may create scalability bottlenecks as usage grows. Database-coupled inference can introduce latency and resource contention at scale, and the architecture may need to evolve toward more distributed, microservices-based inference as demand increases.
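On the drift question, one lightweight approach (not something the source describes) is to compare each week's sentiment-label distribution against a trailing baseline and flag weeks where the shift exceeds a threshold; the labels and threshold below are assumptions:

```python
def distribution(labels):
    """Normalize a list of sentiment labels into a probability distribution."""
    total = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: c / total for lab, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two label distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def drift_alert(baseline_labels, current_labels, threshold=0.15):
    """Flag the week for human review when the output distribution moves past the threshold."""
    return total_variation(distribution(baseline_labels),
                           distribution(current_labels)) > threshold

baseline = ["positive"] * 60 + ["neutral"] * 25 + ["negative"] * 15
this_week = ["positive"] * 30 + ["neutral"] * 25 + ["negative"] * 45
```

A shift like the one above could mean a genuinely worse week for patients or a silent change in model behavior; the point of the check is simply to route such weeks to a human.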
## Organizational and Change Management Dimensions

An often-overlooked strength of this case study is its emphasis on organizational adoption and change management. The presenters repeatedly stress that delivering AI insights through dashboards is "only halfway down the journey" with clients. Huron's model pairs AI-generated insights with consulting expertise to help healthcare organizations actually act on opportunities through coaching and strategy development. This philosophy is reflected in design choices such as explaining AI classifications to educate healthcare workers and maintaining human override capabilities to preserve user autonomy and trust. These choices may reduce pure automation efficiency but likely increase adoption and appropriate usage. The three-mode transcription plan (streaming, batch, voice memo) similarly shows sensitivity to real-world deployment challenges and user needs. This pragmatic, adoption-focused approach contrasts with purely technology-centric AI implementations and may explain Huron's success in reaching production deployment.

## Architectural Patterns and Best Practices

Several architectural patterns emerge as potential best practices. Tight integration with existing data infrastructure, using Redshift stored procedures, dbt for transformations, and established BI tools, reduces friction and leverages existing data quality processes. Registering the LLM as a callable function within the data warehouse is an elegant solution for batch processing scenarios. The separation of raw data storage (S3), cataloging (Glue), transformation (Redshift/dbt), and inference (Bedrock) follows sound data engineering principles with clear separation of concerns. Serving results through both traditional BI tools (QuickSight) and low-code applications provides flexibility for different user personas and use cases.
The EventBridge scheduling for automated daily processing demonstrates operationalization beyond ad-hoc analysis; this scheduled inference pattern is appropriate where real-time processing isn't required and allows more efficient batch processing.

## Conclusion

Huron's generative AI implementation represents a genuine LLMOps deployment in a production healthcare consulting environment. The system processes substantial volumes of unstructured healthcare data, achieves respectable accuracy, and delivers insights through established business intelligence channels. The improvement from 75% to 90% accuracy demonstrates iterative refinement, and processing more than 10,000 notes weekly indicates real operational scale.

The roadmap toward real-time transcription and cross-functional data integration shows sophisticated strategic thinking about how AI capabilities can expand consulting service offerings. The emphasis on explainability, user autonomy, and integration with human coaching reflects mature thinking about AI adoption in healthcare contexts. However, the gap between the current 90% accuracy and healthcare requirements, the lack of detail on continuous improvement mechanisms, and the absence of discussion around compliance and monitoring suggest areas where the implementation is still maturing. The ambitious transcription capabilities face substantial technical challenges that may prove more difficult than presented.

Overall, this is a solid LLMOps case study showing how a consulting firm has moved beyond proof of concept to production deployment of LLMs for unstructured data analysis at scale, while maintaining realistic awareness of accuracy requirements and organizational adoption challenges in healthcare settings.
