**Company:** Propel
**Title:** AI-Powered SNAP Benefits Notice Interpretation System
**Industry:** Government
**Year:** 2025

**Summary:** Propel developed an AI system to help SNAP (food stamp) recipients better understand official notices they receive. The system uses LLMs to analyze notice content and provide clear explanations of importance and required actions. The prototype successfully interprets complex government communications and provides simplified, actionable guidance while maintaining high safety standards for this sensitive use case.
## Overview

Propel, a company that builds technology for SNAP (Supplemental Nutrition Assistance Program, commonly known as food stamps) recipients, is developing an AI-powered tool to help users understand government benefit notices. This case study represents an early-stage LLMOps project that is notable for its careful, safety-conscious approach to deploying AI for vulnerable populations who face "extreme negative consequences from failures or problems of a tool."

The company's mission involves helping low-income Americans navigate the complex SNAP benefits system. Propel identified a specific pain point through user research and social media observation: official notices from SNAP agencies are often confusing, leading to missed deadlines, unnecessary benefit denials, and high call volumes to already-strained state agencies.

## Problem Definition

SNAP notices serve as formal communications about critical matters such as benefit denials, approvals, amount changes, document requests, and missed appointments. These notices are legally mandated to contain specific information, but this very requirement creates several user experience problems:

- **Legal and technical language** that is uncommon in everyday life creates barriers to understanding
- **Cognitive overload** from legally required information makes it difficult to identify what matters most
- **Mixed importance levels** where some notices are purely informational while others require immediate action to prevent benefit loss
- **Complex benefit calculations** that recipients struggle to verify for accuracy

The team validated this problem by observing that people are already posting their SNAP notices on Reddit and Facebook asking for help understanding them. As the article notes, "People are already walking this route — we're paving it for them."

## Technical Approach and Implementation

### Model Selection and Experimentation

Propel is testing "a variety of models, prompts, designs, external context with real SNAP notices to see what generates helpful output." The primary example shown uses Anthropic's Claude 3.5 Sonnet (newest version), though the team is clearly evaluating multiple options in their development process.

The choice to use LLMs for this problem is justified by the observation that "fundamentally most of the problems people have with notices are about language friction, and LLMs have strong capabilities when it comes to processing and manipulating language."

### Prompt Engineering

The article provides transparency into the actual prompt used in the prototype. It establishes the model's persona as "a legal aid attorney specializing in SNAP benefits" and structures the output into two sections: an importance assessment (high/medium/low with explanation) and action items. Key prompt engineering decisions include:

- **Reading level specification**: "using 6th grade reading level language" to ensure accessibility for the target population
- **Empathetic tone direction**: Acknowledging that users "have a lot to manage in their life"
- **Format guidance**: "Generate bullets if that is the most effective way to convey the information"
- **Scope limitation**: Focused on notice interpretation rather than open-ended advice

This approach demonstrates thoughtful prompt design that considers both the functional requirements and the emotional context of users navigating a stressful benefits system.
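
The full prompt text is not reproduced in this write-up, but a minimal sketch of how such a call might look with the Anthropic Python SDK is shown below. The system prompt wording, the `interpret_notice` helper, and the model string are illustrative assumptions pieced together from the description above, not Propel's actual code.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative system prompt approximating the design decisions described above;
# Propel's actual wording is not reproduced here.
SYSTEM_PROMPT = (
    "You are a legal aid attorney specializing in SNAP benefits. "
    "You will be given the text of an official SNAP notice. The reader has "
    "a lot to manage in their life, so be empathetic and use 6th grade "
    "reading level language. Produce two sections: (1) Importance: high, "
    "medium, or low for keeping benefits, with a short explanation; "
    "(2) What to do: the specific actions the person needs to take, if any. "
    "Generate bullets if that is the most effective way to convey the "
    "information. Only use information that appears on the notice itself."
)


def interpret_notice(notice_text: str) -> str:
    """Send extracted notice text to the model and return a plain-language explanation."""
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # the article cites the newest Claude 3.5 Sonnet
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": notice_text}],
    )
    return response.content[0].text
```

In a prototype, a call along these lines would sit behind the Streamlit interface described in the next section.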
### Two Core Outputs

The tool is designed around two primary value propositions:

- **Importance triage**: Categorizing whether a notice is high, medium, or low importance relative to maintaining benefits
- **Action extraction**: Clearly stating what, if anything, the user needs to do in response

This structured output approach is notable from an LLMOps perspective because it creates measurable dimensions for evaluation and allows for a consistent user experience across different notice types.

### Prototyping Stack

The team is using Streamlit for rapid prototyping, which is described as "an open source tool enabling rapid iteration on applications using AI models." This choice reflects a common pattern in early-stage LLM application development where speed of iteration is prioritized over production-grade infrastructure.

## Safety and Risk Framework

This case study is particularly valuable for its explicit treatment of safety considerations before production deployment. The team outlines several key risk areas.

### Deployment Staging

The team explicitly states "We are not deploying to users in this state." They are instead gathering feedback from SNAP program experts and using real examples from social media to "harden this tool and proactively identify risks before moving forward."

### Hallucination Mitigation

The document argues that notice interpretation may be inherently safer than open-ended chatbot applications: "Because most of the use of AI here is on processing information already included on the notice itself, this may be safer than other deployment paths, such as a chatbot generating novel answers to open-ended questions. Hallucination risk is lower by definition due to this."

They also mention potential additional safeguards through prompt chaining: "we could have a separate model call evaluate the response to ensure all provided information is found on the original notice in some form."

### Information Filtering Risks

There is an interesting tension between simplifying information (the core value proposition) and potentially hiding important edge-case information. Proposed mitigations include always reinforcing that users should read the entire notice and proactively offering additional information on less common but potentially important topics.

### PII Handling

The team acknowledges the challenge of using external model APIs with documents containing personally identifying information. They mention Microsoft's open-source Presidio as a potential local redaction/deidentification solution, and note that many users already redact information when posting notices online.

### Incorrect Source Documents

An interesting edge case is addressed: what happens if the original notice itself contains errors? The team is "considering whether we can include additional information or run additional checks in highly consequential situations that could inform the user if the notice itself appears erroneous or in violation of policy."

## Future Directions and Agent Considerations

The article hints at more ambitious future capabilities.

### Background Processing Agents

The team envisions "an agent processing notices in the background and triaging the person's attention to just those highest-importance notices." This represents a shift from reactive (user-initiated) to proactive (system-initiated) AI assistance.
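
The article stops at the concept, but a minimal sketch of what such background triage could look like follows. The `classify_importance` and `triage` helpers, the prompt wording, and the model string are hypothetical; a production version would also need the verification, filtering, and PII safeguards discussed in the safety section above.

```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical one-word classification prompt reusing the persona from the interactive tool.
TRIAGE_PROMPT = (
    "You are a legal aid attorney specializing in SNAP benefits. "
    "Classify the following notice as HIGH, MEDIUM, or LOW importance "
    "for keeping benefits. Reply with exactly one word."
)


def classify_importance(notice_text: str) -> str:
    """Ask the model for a one-word importance label for a single notice."""
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative; any model under evaluation could slot in
        max_tokens=5,
        system=TRIAGE_PROMPT,
        messages=[{"role": "user", "content": notice_text}],
    )
    return response.content[0].text.strip().upper()


def triage(notice_texts: list[str]) -> list[str]:
    """Surface only the notices that need the person's immediate attention."""
    return [text for text in notice_texts if classify_importance(text) == "HIGH"]


# A scheduled background job could run triage() over newly received notices and
# push only the HIGH-importance ones to the user's attention, leaving the rest
# available on demand.
```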
### RAG-Like Enhancement

They discuss "bringing in external information in-context with the notice" to provide more complete assistance. For example, if a phone number on a notice is known to be frequently unavailable, the tool could provide alternative contact methods.

### Integration with Navigation

The notice tool is positioned as potentially complementary to broader benefits navigation assistance, since "many problems' first step is assessing any notices received recently."

## Evaluation and Feedback Strategy

The team is taking a multi-stakeholder approach to evaluation:

- **Expert review**: Legal aid attorneys, SNAP outreach workers, and eligibility workers are invited to test the alpha tool with notices they encounter
- **Real-world examples**: Testing with actual notices posted by confused users on social media provides realistic evaluation cases
- **AI practitioner input**: Seeking feedback on effective capability utilization and safety evaluation methods

## Balanced Assessment

This case study represents a thoughtful, safety-conscious approach to deploying LLMs for a vulnerable population. The team demonstrates awareness of the heightened risks involved when AI failures could result in people losing access to food assistance. However, it's worth noting that this is still a prototype and the actual production deployment challenges remain ahead. Key questions that would need to be answered before production include:

- How will accuracy be measured and monitored at scale?
- What are the latency requirements for a good user experience?
- How will the system handle notices from different states with varying formats?
- What fallback mechanisms exist when the AI cannot confidently interpret a notice?

The transparency about being in an early prototype phase, combined with the explicit safety framework, suggests a responsible development approach. The choice to seek expert feedback before user deployment is particularly notable in a landscape where many AI applications rush to production.

The use of social media posts as a source of both problem validation (people are confused and seeking help) and test cases (real notices with real questions) is a pragmatic approach to building evaluation datasets in a domain where labeled data may be scarce.
