Company
Healio
Title
AI-Powered Clinical Decision Support Platform for Healthcare Providers
Industry
Healthcare
Year
2026
Summary (short)
Healio, a medical information platform serving healthcare providers across 20+ specialties for 125 years, developed Healio AI to address the challenge of physicians experiencing information overload while working under extreme time pressure. The solution uses a RAG-based system that combines Healio's proprietary clinical content with trusted sources like PubMed journals to provide physicians with accurate, contextual, and trustworthy answers at point of care. Through extensive user testing with over 300 healthcare professionals, the team discovered physicians primarily used the tool to prepare for patient interactions and improve patient communication rather than just diagnostic queries. The product launched successfully with predominantly positive feedback, featuring HIPAA compliance, citation transparency, and contextual advertising for monetization.
## Overview Healio AI represents a production LLM system deployed by Healio, a healthcare information platform that has served medical professionals for 125 years. The company provides daily news updates, continuing medical education (CME), clinical guidance, and reference information across more than 20 medical specialties. The AI product was developed to address a critical pain point in healthcare: physicians face constant information overload while working under extreme time pressure, needing to stay current with clinical guidelines, research studies, and FDA alerts while making critical patient care decisions. The core insight that drove this product came from extensive user research. The team surveyed over 300 healthcare professionals to understand their attitudes toward AI and identify where they would use such a tool. Approximately 75% reported using AI for patient interactions and patient care, with some physicians even showing AI-generated content directly to patients to help explain medical concepts. This research revealed that trust, transparency, and access to credible sources were paramount concerns for physicians when considering AI tools. ## Discovery and Prototype Development The product development process began with the ChatGPT moment in late 2022/early 2023, which prompted internal discussions at Healio about potential AI applications. Initial skepticism centered on whether ChatGPT itself could suffice, but the team quickly recognized that the hallucination problems and lack of source credibility made general-purpose LLMs unsuitable for medical applications without significant customization. The team employed a parallel prototyping approach that proved highly effective. The UX team created low-fidelity Figma prototypes while the development team, leveraging AI coding tools like Cursor, built a working prototype in a single weekend. This rapid development was possible because the team had already partnered with an outside company earlier in the year to stand up an initial version of their RAG system. The working prototype included actual LLM functionality connected to their content sources, allowing real physicians to enter prompts and receive responses. This early prototype phase yielded several critical insights that fundamentally shaped the product direction. Most significantly, the team's assumptions about use cases were challenged. They expected physicians would primarily ask diagnostic and treatment questions, but beta testers predominantly asked about patient communication. Questions like "How do I explain this diagnosis to my patient?" and "How can I be more empathetic to my patient?" dominated the usage patterns. This discovery led to a major shift in the product's tone, moving from purely clinical accuracy to including more empathetic language in responses. The beta testing also revealed important preferences around information density and source transparency. While physicians appreciated bulleted, summarized responses due to time constraints, they also wanted the ability to deep-dive into source materials. This progressive disclosure approach became a key design principle. Additionally, the initial prototype was notably slow, which early feedback identified as a critical issue requiring optimization before launch. ## Technical Architecture and RAG Implementation The technical architecture centers on a sophisticated RAG system that goes well beyond simple vector search. The team learned through experimentation that neither traditional lexical search, vector search, nor semantic search alone was sufficient for their use case. Instead, they implemented a hybrid approach combining all three search methodologies. This multi-faceted search strategy allows the system to optimize retrieval based on the nature of the query and the type of content being searched. Data sources include Healio's proprietary content including news, CME courses, clinical guidelines, and reference information as well as carefully curated external sources, with PubMed being a primary third-party source. The team deliberately filters for high-quality journals within PubMed, recognizing that not all published research carries equal weight in clinical practice. This curation step happens before content enters the RAG system, ensuring the LLM only works with trustworthy sources rather than trying to make quality judgments on the fly. A particularly interesting technical challenge involves handling different types of medical content with varying requirements for accuracy and fidelity. For clinical guidelines, the system must preserve the exact wording from the original source to maintain clinical accuracy. This requirement means the system cannot allow the LLM to paraphrase freely. The solution involves a process that identifies when guideline content should be returned and injects that text verbatim into the response, ensuring the integrity of critical clinical information. The data ingestion pipeline itself required careful evaluation of multiple approaches. Even for a single source like PubMed, the team explored five different methods of accessing the data: direct API calls, FTP access, alternative APIs, BigQuery, and S3 buckets. Each approach came with different tradeoffs in terms of data quality, speed, and reliability. The team prototyped all these options to determine the optimal path. For some partner sources, the solution involves web crawling, which introduces its own challenges around page structure variability and data extraction brittleness. Intent recognition plays a crucial role in the search optimization. When users ask about "the latest treatments," the system recognizes temporal intent and modifies search parameters to prioritize more recent articles. This type of query understanding and search modification represents an ongoing area of refinement that the team expects will never be complete given the variety of ways physicians frame their questions. ## Production Implementation and User Experience The production interface centers on a chat-based interaction where physicians enter prompts and receive streamed responses. A clever design decision involves displaying contextual advertisements during the processing time while the RAG system and LLM generate the response. This serves dual purposes: it provides a monetization mechanism for the free product while also reducing the perceived wait time for users. The ad selection uses keyword matching against the prompt, ensuring relevance. For example, a query about lung cancer would trigger lung cancer-related advertisements. The citation and source attribution system underwent multiple iterations based on user feedback. Initially, sources were displayed in a separate tab that users could navigate to after reading the response. User testing revealed physicians wanted to see references inline as they read, leading to the implementation of superscript numbers that function as interactive citations. Users can hover over or click these subscripts to see the source information immediately, without context switching. This design choice directly supports the trust-building objective by making source verification as frictionless as possible. The response formatting evolved significantly from the prototype. Initial versions contained dense paragraphs of text that users found overwhelming. The team iterated toward more structured responses using bullets, tables, and clear sectioning to improve digestibility. However, this formatting had to be balanced against the medical accuracy requirement, particularly for guideline content that must be preserved exactly as published. One of the most interesting discoveries from actual usage patterns concerns the point-of-care application. While the team initially imagined physicians using the tool while literally sitting with patients, user research revealed the primary use case is preparation for patient interactions. Physicians use Healio AI to quickly prepare for scheduled appointments—whether that's the day before, during a lunch break before afternoon appointments, or in other preparation moments. This allows them to enter patient interactions fully informed while keeping their attention focused on the patient during the actual appointment. Some physicians also reported using the tool to help craft email responses to patients, where they have more time to consider their communication. ## Trust, Safety, and Compliance HIPAA compliance was a non-negotiable requirement from the start, and the system implements multiple guardrails to ensure patient privacy. On the input side, the system uses masking to detect and remove any personal health information (PHI) before queries are processed. This includes obvious identifiers like patient names and social security numbers, though physicians are unlikely to enter such information given their training. The key principle is that any PHI should be "lost in thin air" and never reach any server or be stored anywhere. Input guardrails also ensure physicians are asking appropriate questions within the scope of the system's intended use. These boundaries help prevent misuse while keeping the tool focused on its core value proposition of providing clinical information support. The trust-building strategy extends beyond technical safeguards to design choices around transparency. Every response includes clear source attribution, and the system prioritizes trusted medical sources that physicians already recognize and respect. The team's philosophy is that trust cannot be achieved simply by claiming accuracy—it must be built through demonstrated reliability and transparency about information provenance. ## Evaluation Strategy and Quality Assurance The evaluation approach reflects a healthy skepticism about relying solely on automated metrics, particularly given the high stakes of medical information accuracy. In the early stages, the team explicitly chose not to use LLM-as-judge approaches, reasoning that using a flawed LLM to evaluate another LLM's medical accuracy made little sense. Instead, they relied heavily on direct physician feedback through their beta testing program and the Healio Innovation Partners group. The beta program involved physicians entering real prompts without guardrails or restrictions, providing authentic usage data. The team collected structured feedback tied to each query-response pair through a rating interface. This feedback loop proved invaluable for understanding not just when responses failed but why they failed and what physicians actually needed. More recently, the team has begun experimenting with LLM-as-judge systems but maintains a cautious, experimental stance. They currently run eight different judges evaluating different dimensions: safety, medical accuracy, faithfulness, relevancy, completeness, reasoning, clarity, and a final holistic judge that synthesizes these assessments. This evaluation runs using various LLM providers including GPT-4o and other frontier models, operating under the theory that the most capable models will provide the best evaluation quality. However, the team treats LLM-as-judge results as directional indicators rather than definitive assessments. When these automated evaluations surface potential issues, the team brings those findings back to their physician advisory panel for validation before making changes. This human-in-the-loop approach ensures that product decisions ultimately serve physician needs rather than optimizing for automated metrics that may not align with real-world utility. The team also experimented with various off-the-shelf evaluation tools while simultaneously building custom evaluation frameworks. This parallel exploration allows them to compare different approaches and understand which evaluation strategies provide the most actionable insights for their specific use case. ## Post-Launch Operations and Continuous Improvement The product has been live for approximately four to six weeks at the time of the interview. Every response includes thumbs up and thumbs down feedback options with pre-selected categories explaining negative feedback, plus an open text field for additional explanation. The small but focused team reviews this feedback weekly, examining negative responses in detail to understand the query, the system's response, and why it fell short of expectations. Interestingly, the team reports receiving more positive than negative feedback, which while encouraging, creates a challenge in gathering critical improvement signals. They explicitly want users who are dissatisfied to share feedback so they can continue refining the product. The continuous improvement process involves regular meetings to prioritize potential changes based on user feedback. This includes both frontend UX adjustments and backend modifications to the RAG system, prompt engineering, or search strategies. The team maintains close engagement with the Healio Innovation Partners group, which provides ongoing access to physicians willing to test new features and provide detailed feedback. The team has also built an advisory board of physicians who have been deeply involved in product development. This group's willingness to invest time in shaping the tool validates that it addresses a meaningful problem in their workflow. The advisory board serves as a sounding board for roadmap decisions and helps validate potential changes before they reach the broader user base. ## Technical Infrastructure and Development Tools The development team leverages modern AI-assisted development tools extensively. The VP of Technology, despite having a managerial title, remains hands-on with development and describes the shift from writing CRUD applications in C, JavaScript, and SQL to serving more as an architect. The workflow involves designing solutions conceptually and then using tools like Cursor to implement them, with the developer's 15 years of experience amplifying what can be accomplished with AI assistance. This development approach enabled the rapid prototyping that proved critical to the product's success. The ability to build a working prototype over a weekend—while the UX team simultaneously worked on design prototypes—exemplifies how AI coding tools are changing the pace of product development, particularly for experienced developers who can effectively guide and validate the AI's output. ## Strategic Considerations and Product Philosophy Several strategic themes emerge from this case study. First, the team's commitment to responsible AI adoption centers on ensuring physicians feel comfortable using the tool and can maintain trust with their patients. This principle guided decisions from source curation to citation display to tone adjustment. Second, the team recognizes that designing for AI doesn't reduce the need for discovery—if anything, it increases it. The high stakes of healthcare applications demand even deeper user understanding. The surprising findings about patient communication use cases and the iterative refinements to citation display demonstrate how continuous user engagement shapes product success. Third, the team takes a pragmatic view of the rapidly evolving AI landscape. They acknowledge experimenting with many different approaches to RAG, evaluation, and search while recognizing that best practices are still emerging. Their willingness to try various solutions and measure results reflects an appropriate stance given the nascent state of LLMOps practices. The monetization strategy through contextual advertising represents a pragmatic approach to sustainability. By making the tool free for physicians while generating revenue through relevant ads, Healio balances accessibility with business viability. The decision to show ads during processing time demonstrates thoughtful UX design that turns a potential negative (wait time) into a dual-purpose experience. ## Evolution of Technical Approach The team's thinking about speed versus accuracy has evolved alongside the broader industry. Initially, the focus was on minimizing response time, which created tension with the thoroughness required for trustworthy medical information. However, as products like Google's Deep Research and various reasoning models emerged showing users will accept longer processing times for better results, the team began envisioning a dual-mode system: a fast mode for quick point-of-care needs and a deep research mode for more complex clinical questions where users can tolerate longer processing in exchange for comprehensive, well-reasoned answers with detailed explanations of the LLM's reasoning process. This evolution reflects a broader maturation in LLMOps thinking—moving from a singular focus on latency optimization to recognizing that different use cases warrant different tradeoffs between speed and thoroughness. The team's willingness to adapt their thinking as the field evolves demonstrates the kind of flexibility required for successful LLM product development in this rapidly changing landscape.

Start deploying reproducible AI workflows today

Enterprise-grade MLOps platform trusted by thousands of companies in production.