## Overview
HeyRevia is building AI-powered call center agents specifically designed for the healthcare industry. The presentation was given by Sean, the company's CEO, who brings a decade of AI experience including work on Google Assistant (notably the 2018 AI calling demonstration for restaurants and salons) and autonomous vehicle development at Waymo. This background in both conversational AI and autonomous systems heavily influences their approach to healthcare call center automation.
The core problem HeyRevia addresses is that more than 30% of healthcare operations still run through phone calls. These calls span a wide range of activities from simple appointment scheduling to complex negotiations with insurance companies regarding credential verification, prior authorizations, referrals, claims denials, and benefits inquiries. The current industry solution involves Business Process Outsourcing (BPO) providers where human agents in call centers often end up calling each other, sometimes sitting in adjacent rooms but still required to communicate via phone. This represents a significant inefficiency that HeyRevia aims to solve with AI agents.
## Voice Agent Landscape and Technical Challenges
Sean provides valuable context on the current state of voice agent technology. Over the past two years, speech-to-text (STT) and text-to-speech (TTS) capabilities have improved dramatically, and large language models have evolved from text-only inputs to handling audio directly, as demonstrated by OpenAI's real-time API. However, significant production challenges remain.
The typical voice agent architecture follows a pipeline approach used by platforms like VAPI, Retell, and Bland. Audio flows from telephony systems (such as Twilio or Telnyx) through streaming input handling via WebSocket and WebRTC, to ASR solutions (AssemblyAI, Deepgram, Whisper) for speech-to-text conversion, then to the LLM for understanding and response generation, and finally back through TTS to produce audio output.
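The sequential nature of this pipeline can be sketched as a single conversational turn. This is an illustrative skeleton only: the three stage functions are hypothetical stand-ins for vendor SDK calls, not any platform's actual API.

```python
import asyncio

# Hypothetical stand-ins for the vendor stages (ASR, LLM, TTS);
# a real system would call streaming SDKs here.
async def speech_to_text(chunk: bytes) -> str:
    return "what is the claim status"

async def generate_reply(text: str) -> str:
    return f"Checking: {text}"

async def text_to_speech(text: str) -> bytes:
    return text.encode()

async def pipeline_turn(chunk: bytes) -> bytes:
    """One conversational turn through the sequential pipeline:
    streaming audio in -> ASR -> LLM -> TTS -> audio out."""
    text = await speech_to_text(chunk)   # ASR stage
    reply = await generate_reply(text)   # LLM stage
    return await text_to_speech(reply)   # TTS stage

audio_out = asyncio.run(pipeline_turn(b"\x00" * 320))
```

Because each stage awaits the previous one, any slowdown at any stage adds directly to the caller-perceived delay, which is the root of the latency problems described below.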
The key limitations of this pipeline approach include:
- **Latency sensitivity**: Voice agents process information in 20-millisecond increments. Any delay beyond 500 milliseconds becomes noticeable to users and degrades the experience. System crashes or performance issues at any pipeline stage directly impact call quality.
- **Hallucination risks**: In healthcare, hallucinations can be "deadly" - if an AI incorrectly communicates prescription dosages or medication volumes, it could cause serious patient harm. This makes error prevention critical rather than optional.
- **Task completion complexity**: Maintaining natural conversation while completing complex tasks over 10+ minute calls is extremely difficult for current voice agents, often requiring multiple retry attempts.
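The latency numbers above imply a simple budget arithmetic: audio arrives in 20 ms frames, and the summed per-stage delay must stay under roughly 500 ms. A minimal sketch of that accounting (the stage latency figures in the usage are made up for illustration):

```python
FRAME_MS = 20     # audio is processed in 20 ms increments
BUDGET_MS = 500   # delays beyond this are noticeable to callers

def frames_in_budget() -> int:
    """Number of audio frames that fit inside the latency budget."""
    return BUDGET_MS // FRAME_MS

def within_budget(stage_latencies_ms: dict) -> bool:
    """True if the summed per-stage latency (ASR + LLM + TTS +
    transport) stays under the end-to-end budget."""
    return sum(stage_latencies_ms.values()) <= BUDGET_MS
```

For example, `within_budget({"asr": 150, "llm": 250, "tts": 120})` fails the check, since 520 ms of combined stage latency already exceeds the budget before network transport is counted.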
## HeyRevia's Architecture: Perception-Prediction-Planning-Control
HeyRevia's architecture draws significant inspiration from autonomous vehicle systems, which is unsurprising given Sean's background at Waymo. Rather than treating voice interaction as a simple pipeline, they model it as an autonomous agent operating in an environment with multiple states and required behaviors.
### Perception Layer
The perception layer continuously processes incoming audio to understand the current state of the call. It can distinguish between:
- Music playing (hold music)
- IVR (Interactive Voice Response) system interactions
- Live human conversation
This real-time perception allows the system to adapt its behavior appropriately rather than processing all audio uniformly.
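The state distinction above can be sketched as a small classifier. The boolean features here are placeholders for real audio-model outputs (music detection, IVR prompt detection); the actual perception layer is not described at this level of detail.

```python
from enum import Enum, auto

class CallState(Enum):
    HOLD_MUSIC = auto()
    IVR = auto()
    HUMAN = auto()

def classify_audio(has_music: bool, has_ivr_prompt: bool) -> CallState:
    """Toy stand-in for the perception layer: in production these
    boolean features would come from audio classifiers running
    continuously over the stream."""
    if has_music:
        return CallState.HOLD_MUSIC
    if has_ivr_prompt:
        return CallState.IVR
    return CallState.HUMAN
```

Downstream behavior (DTMF navigation for IVR, silence for hold, full conversation for a live human) can then branch on this state rather than treating all audio uniformly.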
### Prediction Layer
The prediction layer anticipates what should happen next. A critical optimization mentioned is hold handling: when the system detects that a call is on hold (through the perception layer), it pauses all processing and LLM inference for that call. The agent "sits silently" and waits for a human to join. This saves significant token costs during what could be 30-minute hold times while ensuring the system is ready to respond immediately when a human representative joins.
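The hold-handling behavior can be sketched as a gating loop: while the perception layer reports a hold state, the agent skips LLM inference entirely and only resumes when a human joins. `perceive` and `run_llm_turn` are hypothetical callables standing in for the real layers.

```python
import time

def process_call_loop(perceive, run_llm_turn, poll_s: float = 0.5):
    """Illustrative control loop: while on hold, sit silently and
    spend no tokens; resume full processing the moment the
    perceived state changes."""
    while True:
        state = perceive()
        if state == "hold":
            time.sleep(poll_s)   # no LLM inference during hold
            continue
        if state == "done":
            break
        run_llm_turn()           # IVR or live human: full processing
```

Over a 30-minute hold, skipping inference on every would-be turn is where the token savings come from, while the tight polling loop keeps the agent ready to respond immediately.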
### Planning Layer
The planning layer addresses what Sean identifies as the primary difference between voice agents and humans: the ability to think ahead. With simple prompt-based approaches, there's no mechanism to provide the AI with sequenced, contextual information about what needs to happen at specific points in the call. The planning layer enables the agent to:
- Anticipate upcoming steps in the call flow
- Think ahead about information requirements
- Self-correct when calls deviate from expected paths
- Adjust course when unexpected situations arise
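One way to picture the planning behaviors above is a plan as an ordered list of expected steps plus a cursor, with self-correction implemented as splicing recovery steps into the remaining plan. This structure is an assumption for illustration; HeyRevia's actual plan representation is not described.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """Minimal sketch of a call plan: ordered expected steps,
    a cursor, and replanning on deviation."""
    steps: list
    cursor: int = 0

    def next_step(self):
        return self.steps[self.cursor] if self.cursor < len(self.steps) else None

    def advance(self):
        self.cursor += 1

    def replan(self, recovery_steps: list):
        """Self-correct: splice recovery steps ahead of the remaining
        plan when the call deviates from the expected path."""
        self.steps = self.steps[: self.cursor] + recovery_steps + self.steps[self.cursor :]
```

For example, if identity verification fails mid-call, `replan(["re_verify_npi"])` inserts a recovery step before the rest of the plan, so the agent knows what information it needs next rather than improvising turn by turn.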
### Control Layer
The control layer provides guardrails to prevent the agent from going off-track. This is explicitly designed to prevent hallucination and scope creep. For example, when working with pharmaceutical companies, the control layer ensures the AI stays focused on medical information and doesn't drift into irrelevant topics like discussing meals or lunch.
## Operational Features
### Human-in-the-Loop Capability
A distinctive feature of HeyRevia is the ability for human supervisors to take over calls in real-time. When monitoring multiple simultaneous calls (the presentation shows 10-15 concurrent calls), a supervisor can:
- Jump into any individual call
- Take control from the AI mid-conversation
- Speak directly with the human representative on the other end
- Return control to the AI to continue the call
This provides both quality assurance and a recovery mechanism for edge cases the AI cannot handle.
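Mid-call takeover can be modeled as a simple controller flip on a live session: the call audio keeps flowing, and only the party whose turn it is to speak changes. This class is a sketch under that assumption, not HeyRevia's actual session API.

```python
from enum import Enum

class Controller(Enum):
    AI = "ai"
    HUMAN = "human"

class CallSession:
    """Sketch of seamless takeover: handoff is a state flip on a
    live session, so the underlying call never drops."""
    def __init__(self):
        self.controller = Controller.AI

    def take_over(self):
        self.controller = Controller.HUMAN

    def hand_back(self):
        self.controller = Controller.AI

    def route_reply(self, ai_reply: str, human_reply: str) -> str:
        """Route whichever party currently holds control."""
        return ai_reply if self.controller is Controller.AI else human_reply
```

Because control is just session state, a supervisor watching 10-15 concurrent calls can flip any one of them to human control and back without renegotiating the telephony connection.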
### Call Center API vs. UI
HeyRevia offers two integration patterns:
**Work API (Call Center API)**: This treats the AI as a task executor. Users submit call work items, and the AI handles them autonomously. Importantly, the system has self-correction capabilities - if a call fails due to missing or incorrect information (like an invalid provider ID or NPI number), the AI can identify the issue and request the correct information before retrying. This represents the AI "learning from its mistakes."
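The submit-and-retry loop behind the Work API can be sketched as follows. `place_call` and `request_correction` are hypothetical callables: the first returns either a success or a structured failure naming the bad field (such as an invalid NPI), and the second supplies the corrected value before the retry.

```python
def run_work_item(item: dict, place_call, request_correction, max_attempts: int = 3) -> dict:
    """Sketch of the work-item loop with self-correction: on a
    failure that names a specific bad field, fetch a corrected
    value and retry instead of failing outright."""
    for _ in range(max_attempts):
        outcome = place_call(item)
        if outcome.get("status") == "ok":
            return outcome
        bad_field = outcome.get("bad_field")
        if bad_field:
            item[bad_field] = request_correction(bad_field)
    return {"status": "failed", "item": item}
```

The design choice worth noting is that failures carry structure (which field was wrong), so the system can ask for exactly the missing information rather than restarting the whole task blind.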
**Call Center UI**: Provides a visual interface for monitoring and intervening in calls, enabling the human-in-the-loop functionality described above.
## Evaluation and Benchmarking
HeyRevia's evaluation philosophy is "if you're trying to ask AI agent to do similar human work, you have to evaluate it like a human." They benchmark AI performance against human agents on the same scenarios by analyzing transcripts and comparing outcomes. According to their data, the AI outperforms humans in many scenarios.
A concrete example provided: for insurance claims where the initial claim was denied, human agents typically require two to three phone calls to identify the actual denial reason, while their AI can achieve this in one to two calls by more effectively negotiating with and pushing back on human representatives.
However, Sean acknowledges that LLMs "do make simple and stupid mistakes" - the challenge is catching and handling these during live calls, which is addressed through the control layer and human intervention capabilities.
## Healthcare Compliance and Production Considerations
Operating in healthcare requires extensive compliance measures:
- **Self-hosted LLMs**: HeyRevia hosts their own large language models rather than using third-party APIs, giving them complete control over data handling and retention
- **Data isolation**: Client data and any AI training derived from it is never shared across different providers, patients, or business entities
- **HIPAA and SOC 2**: These are described as "mandatory" for operating in the healthcare space
- **Vendor compliance**: All service providers in the stack (STT, TTS vendors, etc.) must also maintain HIPAA compliance
- **Security patching**: Continuous updates to address security vulnerabilities, particularly given potential government oversight
### EHR Integration
Currently, HeyRevia does not directly integrate with Electronic Health Record (EHR) systems. They operate as a layer on top, functioning as an AI call center that works on behalf of customers. Direct EHR integration may come as the company matures and demonstrates "proof of work."
## Real-World Use Cases
The system handles common healthcare phone-based workflows including:
- Credential verification (credentialing/licensing)
- Prior authorizations
- Referrals
- Consulting requests
- Benefits eligibility inquiries
- Claims negotiations following denials
Each call type involves navigating IVR systems, repeatedly providing identifying information (NPI numbers, member IDs, etc.), waiting on hold, and then negotiating with human representatives. The AI handles all of this end to end, freeing staff from tedious processes that previously consumed significant time.
## Production Insights and Lessons
Several practical insights emerge from this case study:
- **Token cost optimization matters**: Pausing LLM processing during hold periods can save substantial costs over 30-minute calls
- **Multi-state awareness**: Voice agents need to handle fundamentally different call states (IVR, hold, live conversation) with different strategies
- **Error criticality varies by domain**: Healthcare's zero-tolerance for certain errors (medication dosages) requires different guardrails than consumer applications
- **Hybrid human-AI workflows**: The ability to seamlessly transfer between AI and human control provides both safety and flexibility
- **Evaluation against realistic baselines**: Comparing AI to actual human agent performance rather than synthetic benchmarks provides more meaningful metrics
The case study represents an interesting application of autonomous agent principles to a highly regulated, high-stakes domain where the consequences of errors are severe but the potential for efficiency gains is substantial.