## Overview
Propel, a company serving over 5 million monthly users who manage their SNAP (Supplemental Nutrition Assistance Program) benefits, developed and deployed AI-powered tools to address a critical problem affecting approximately 200,000 users monthly: interruptions in benefit deposits due to administrative issues. This case study demonstrates practical applications of LLMs in government services, specifically targeting the reduction of "program churn" - situations where eligible recipients experience lapses in benefits due to paperwork or procedural issues rather than actual ineligibility.
The company's approach represents a thoughtful implementation of AI in production environments serving vulnerable populations, with careful attention to monitoring, escalation procedures, and measured deployment strategies. Their work addresses both immediate user needs and broader systemic inefficiencies in government benefit administration.
## Technical Implementation and Architecture
Propel implemented two distinct AI-powered solutions, each representing different approaches to LLM deployment in production systems serving government benefits recipients.
### Test 1: AI-Generated Triage Flow System
The first implementation focused on California's CalFresh program and utilized AI primarily for code generation rather than direct user interaction. This approach demonstrates an important LLMOps pattern where AI capabilities are leveraged in the development process to accelerate the creation of complex decision-tree systems.
The technical architecture involved using AI models to generate code for multi-step diagnostic flows based on written logic trees. This represents a hybrid approach where human expertise defines the decision logic, but AI accelerates the implementation process. The system guides users through structured questions to diagnose why their benefits deposit might be missing and directs them to appropriate resolution actions.
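The source does not include the generated code itself, but a minimal sketch of what a logic-tree-driven triage flow might look like is shown below. The `TriageNode` structure, question wording, and recommended actions are illustrative assumptions, not Propel's actual CalFresh implementation.

```python
# Hypothetical sketch of a generated triage flow: a human-authored logic tree
# encoded as a small decision graph that walks a user through questions and
# ends at a recommended action. Questions and actions here are illustrative.
from dataclasses import dataclass, field


@dataclass
class TriageNode:
    question: str
    # Maps an answer ("yes"/"no") to the next node, or to a final action string.
    branches: dict = field(default_factory=dict)


flow = TriageNode(
    question="Did you submit your periodic report (SAR 7) this month?",
    branches={
        "no": "Submit the SAR 7 through BenefitsCal to restore your benefits.",
        "yes": TriageNode(
            question="Has your recertification deadline passed?",
            branches={
                "yes": "Contact your county office about completing a late recertification.",
                "no": "Check BenefitsCal for pending verification requests.",
            },
        ),
    },
)


def run_triage(node, answer_fn):
    """Walk the flow, asking each question until a recommended action is reached."""
    current = node
    while isinstance(current, TriageNode):
        current = current.branches[answer_fn(current.question)]
    return current
```

In practice, generating this kind of structure from a written logic tree is a well-bounded code-generation task: the decision logic stays human-authored and reviewable, while the AI accelerates the mechanical translation into working flow code.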
The choice to focus initially on California was strategically sound from both technical and policy perspectives. California's more flexible periodic report policies and the availability of online self-service through BenefitsCal gave interventions a higher probability of success, making it an ideal environment for testing AI-powered assistance. This reflects the sound LLMOps practice of starting with the scenarios most likely to succeed before expanding to more challenging environments.

### Test 2: Conversational AI Chat Assistant
The second implementation represents a more sophisticated application of LLMs in production, using Decagon as the underlying generative AI platform. This system provides real-time, context-aware assistance to users nationwide, demonstrating several advanced LLMOps capabilities.
The conversational AI system was designed to handle a wide range of scenarios dynamically, generating responses tailored to specific states and individual circumstances. This required the system to understand complex benefit program rules across multiple jurisdictions and provide accurate, actionable guidance. The technical implementation included real-time monitoring, escalation procedures, and performance tracking - all critical components of production LLM systems.
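Decagon's internals are not described in the source, so the following is only a rough sketch of how state-specific guidance might be injected into an assistant's context before each model call. The `STATE_RULES` snippets, `build_system_prompt` helper, and `llm_client` interface are all hypothetical placeholders, not the platform's actual API.

```python
# Illustrative sketch (not Decagon's actual API) of tailoring responses to the
# user's state by loading jurisdiction-specific rules into the system prompt.
STATE_RULES = {
    "CA": "CalFresh periodic report (SAR 7); online self-service via BenefitsCal.",
    "HI": "SNAP Six Month Report Form (SMRF); submitted to DHS processing centers.",
}


def build_system_prompt(state_code: str) -> str:
    rules = STATE_RULES.get(state_code, "No state-specific rules loaded.")
    return (
        "You are a benefits assistant helping a SNAP recipient diagnose a "
        "missed deposit. Use only the state guidance below; if you cannot "
        f"resolve the issue, say so and offer escalation.\n\n"
        f"State guidance ({state_code}): {rules}"
    )


def answer(state_code: str, conversation: list[dict], llm_client) -> str:
    """Send the state-aware system prompt plus conversation history to the model."""
    messages = [{"role": "system", "content": build_system_prompt(state_code)}]
    messages.extend(conversation)
    return llm_client.generate(messages)  # placeholder client interface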
One of the most impressive technical achievements was the system's ability to handle unexpected situations that weren't explicitly programmed. The AI model successfully interpreted state-specific abbreviations like "smrf" (Hawaii's Six Month Report Form) and seamlessly switched languages mid-conversation when users began communicating in Haitian Creole. This demonstrates the robust contextual understanding capabilities of modern LLMs when properly deployed in production environments.
## Production Deployment and Monitoring
Both implementations demonstrate mature LLMOps practices in their approach to production deployment and monitoring. The team conducted small-scale tests with approximately 1,000 users each, allowing for careful monitoring and manual handling of escalations where necessary. This graduated deployment approach is a best practice in LLMOps, particularly when serving vulnerable populations where system failures could have serious consequences.
The monitoring infrastructure included user rating systems, performance tracking, and escalation procedures to human operators when the AI system detected it could not adequately help with an issue. This human-in-the-loop approach represents responsible AI deployment, ensuring that users receive appropriate support even when the automated system reaches its limits.
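As a rough illustration of that human-in-the-loop pattern, the sketch below assumes the model flags turns it cannot resolve and that user ratings are logged for monitoring; the flagging convention, queue, and logger names are invented for this example rather than taken from Propel's system.

```python
# Minimal sketch of escalation and monitoring around each assistant turn.
# ESCALATION_FLAG, the queue, and the logging scheme are hypothetical.
import logging

logger = logging.getLogger("assistant_monitoring")
ESCALATION_FLAG = "[NEEDS_HUMAN]"


def handle_turn(reply: str, user_rating: int | None, human_queue: list) -> str:
    """Log performance signals and hand off to a human when the model flags it."""
    if user_rating is not None:
        logger.info("user_rating=%d", user_rating)
    if ESCALATION_FLAG in reply:
        human_queue.append(reply)  # a support agent picks this conversation up manually
        return "Connecting you with a member of our team who can help further."
    return reply
```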
The team's approach to evaluation was methodologically sound, using randomized testing with control groups to measure key outcomes including days to next deposit and rates of restored benefits. This demonstrates proper evaluation practices for LLM systems in production, focusing on measurable business outcomes rather than just technical metrics.
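The sketch below illustrates how those two headline outcomes could be compared across study arms, assuming per-user records of arm assignment, days to next deposit, and same-month restoration. The field names and data shape are assumptions for illustration, not Propel's analysis code.

```python
# Sketch of a treatment-vs-control comparison on the outcomes described above.
from statistics import mean


def summarize(records: list[dict]) -> dict:
    """Compare treatment vs. control on days to next deposit and restoration rate."""
    out = {}
    for arm in ("treatment", "control"):
        group = [r for r in records if r["arm"] == arm]
        out[arm] = {
            "avg_days_to_next_deposit": mean(r["days_to_next_deposit"] for r in group),
            "same_month_restoration_rate": mean(
                1.0 if r["restored_same_month"] else 0.0 for r in group
            ),
        }
    return out


# Example usage with made-up records:
# summarize([
#     {"arm": "treatment", "days_to_next_deposit": 6, "restored_same_month": True},
#     {"arm": "control", "days_to_next_deposit": 14, "restored_same_month": False},
# ])
```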
## Performance and User Experience
The results from both tests show promising outcomes for AI-powered benefits navigation. The conversational AI system achieved a 53% uptake rate among eligible users, indicating strong demand for this type of assistance. Both systems improved two critical metrics: time to benefits restoration and the rate of same-month restoration, which spares users the lengthy reapplication process.
User feedback was generally positive, with very few negative ratings among those who provided feedback. The system's ability to provide multilingual support without explicit programming for this capability demonstrates the inherent advantages of LLM-based systems over traditional rule-based approaches.
Technical performance was solid, with the system handling edge cases and unexpected user behaviors successfully, suggesting robust design and appropriate model selection for the use case. The ability to maintain context across conversations and provide state-specific guidance points to sophisticated prompt engineering and knowledge base integration.
## Challenges and Limitations
While the case study presents generally positive results, it's important to note the limited scale of the tests and the careful monitoring required. The need for human escalation procedures indicates that the AI systems, while helpful, were not fully autonomous solutions. This is appropriate given the critical nature of the service but represents an ongoing operational cost.
The focus on specific types of benefit interruptions (periodic reports and recertifications) suggests that the current implementation may not address all possible causes of benefit lapses. The systems appear to work best for procedural issues rather than more complex eligibility determinations, which is a reasonable limitation but one that affects the overall impact potential.
The reliance on users having smartphones and digital literacy to access these tools also represents a potential limitation in reaching all affected populations, though this aligns with Propel's existing user base and service model.
## Broader Implications for LLMOps
This case study demonstrates several important principles for LLMOps in government and social services contexts. The emphasis on careful monitoring, graduated deployment, and human oversight shows how AI systems can be responsibly deployed in high-stakes environments. The use of AI for both code generation and direct user interaction illustrates the versatility of current LLM capabilities.
The success of the multilingual support and contextual understanding features suggests that LLMs can provide more flexible and responsive user experiences than traditional automated systems. This has implications for broader applications in government services where users may have diverse linguistic backgrounds and varying levels of familiarity with bureaucratic processes.
The approach to knowledge base integration and state-specific guidance demonstrates how LLMs can be effectively used to navigate complex, jurisdiction-specific rules and procedures. This could be applicable to many other government services beyond SNAP benefits.
## Future Developments and Scalability
The case study outlines several directions for future development, including proactive reminders, better integration with state benefit portals, and expanded coverage of different types of benefit interruptions. These developments would require further LLMOps maturation, including more sophisticated monitoring systems and potentially more advanced AI capabilities.
The finding that 25% of Propel users do not use their state's online benefit portals represents a significant opportunity for AI-powered guidance to bridge digital divides. It suggests that AI systems could serve as interfaces between users and complex government systems, potentially reducing the need for direct human intervention in routine cases.
The team's recommendation to start with small-scale pilots with close monitoring and human support for escalations provides a template for other organizations looking to implement similar AI-powered government services. This approach balances innovation with responsibility, ensuring that vulnerable populations receive appropriate support while advancing the capabilities of AI systems in production environments.