## Overview
JOBifAI is an LLM-powered game that demonstrates novel gameplay mechanics enabled by large language models. The game places players in an interview scenario where they've submitted an AI-generated portfolio to a company. This case study, published in January 2025, offers a candid and somewhat critical perspective on the operational challenges of deploying LLMs in interactive, real-time applications where user input is unpredictable and the system must handle both legitimate queries and potential misuse gracefully.
The developers' experience provides valuable insights into the practical realities of LLMOps, particularly around handling safety filters, managing unreliable outputs, and building robust retry mechanisms. While the tone of the original article is somewhat critical of LLM providers' implementation choices, the technical lessons learned are broadly applicable to any production LLM deployment that involves user-generated input.
## The Core Challenge: Unreliable LLM Outputs
The fundamental problem the JOBifAI team encountered is one familiar to many LLMOps practitioners: LLM outputs cannot be trusted without verification. When applications accept free-form user input, the reliability of LLM responses drops significantly. The developers identified three distinct failure modes that they had to contend with in production:
The first failure mode involves invalid JSON responses. When the LLM is instructed to return structured data in JSON format, it sometimes produces malformed output that cannot be parsed, resulting in a 400 error from the API. This is a purely technical failure unrelated to content safety.
The second failure mode involves schema non-conformance. Even when the LLM returns valid JSON, the response may not match the expected schema. The developers noted they attempted type casting (such as converting a string "2" to an integer 2) to handle minor deviations, but sometimes the response structure simply did not conform to what was requested.
The third failure mode involves safety filter rejections. When the safety filter determines a query is unsafe, the API returns a 400 error. Critically, this error code is indistinguishable from the technical failures mentioned above, making it difficult for the application to respond appropriately.
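The article does not include the game's source code, but a minimal sketch of how these three failure modes surface at the application layer helps make them concrete. The function and field names below are assumptions for illustration; `call_llm` stands in for one request to the provider's API and is assumed to return an HTTP status code and the response text:

```python
import json
from typing import Callable, Optional

# Hypothetical sketch -- the JOBifAI client code is not published.
# call_llm(prompt) is assumed to return (http_status, response_text).

def attempt_turn(call_llm: Callable[[str], tuple[int, str]],
                 prompt: str) -> Optional[dict]:
    status, body = call_llm(prompt)

    if status == 400:
        # Failure modes 1 and 3 collapse here: per the case study, a rejected
        # generation and a safety-filter rejection both surface as the same
        # undifferentiated 400, so the caller cannot tell them apart.
        return None

    try:
        parsed = json.loads(body)        # failure mode 1: invalid JSON
    except json.JSONDecodeError:
        return None

    if not isinstance(parsed, dict) or set(parsed) != {"choice", "sentence"}:
        return None                      # failure mode 2: schema non-conformance

    return parsed
```

Note that the single `None` return hides which failure actually occurred, which is precisely the opacity the developers criticize later in the article.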
## Prompt Engineering Approach
The developers structured their prompts in a specific pattern to handle the interactive nature of the game. The prompt architecture follows this general form:
- A context section establishing the game scenario
- The player's action or input
- A list of potential outcomes or action categories
- An instruction to return results as a JSON dictionary with a specific schema
The expected output format was `{"choice": c, "sentence": s}`, where `c` represents which predefined action category best matches the player's input, and `s` provides a narrative description of the result. This structured output approach is a common pattern in LLMOps for ensuring that LLM responses can be programmatically processed by downstream systems.
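The article does not reproduce the actual prompt text, so the following is only a hedged sketch of how the described prompt structure and schema check might be assembled; the wording, category list, and helper names are placeholders:

```python
import json

# Illustrative only: this is not the actual JOBifAI prompt, which the
# article does not reproduce.

PROMPT_TEMPLATE = """{context}

The player does the following: {player_input}

Possible outcomes:
{numbered_outcomes}

Answer strictly as a JSON dictionary: {{"choice": c, "sentence": s}}, where c is
the number of the outcome that best matches the player's action and s is a short
narrative description of the result."""

def build_prompt(context: str, player_input: str, outcomes: list[str]) -> str:
    numbered = "\n".join(f"{i}. {o}" for i, o in enumerate(outcomes, start=1))
    return PROMPT_TEMPLATE.format(context=context,
                                  player_input=player_input,
                                  numbered_outcomes=numbered)

def coerce_response(raw: str, n_outcomes: int) -> dict | None:
    """Parse and lightly coerce the model's reply, e.g. "2" -> 2 for "choice"."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    try:
        choice = int(parsed["choice"])       # tolerate "2" instead of 2
        sentence = str(parsed["sentence"])
    except (KeyError, TypeError, ValueError):
        return None
    if not 1 <= choice <= n_outcomes:
        return None
    return {"choice": choice, "sentence": sentence}
```

The coercion step reflects the developers' observation that minor deviations such as a string `"2"` instead of the integer `2` can be repaired client-side, while deeper schema mismatches cannot.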
## The Retry Mechanism Solution
Given the frequency of failures across all three categories, the developers implemented a retry mechanism as their primary mitigation strategy. Rather than failing immediately on any error, the system attempts the query up to three times before giving up. According to their playtesting observations (they note these are not hard metrics), the success rates were approximately:
- Single attempt: ~75% success rate
- Two attempts: ~90% success rate
- Three attempts: ~99% success rate
This approach effectively papers over both technical failures and transient safety filter false positives. The developers acknowledge this is a "kludgy workaround" rather than an elegant solution, but it was necessary to maintain acceptable user experience. The 25% initial failure rate is remarkably high and would be unacceptable in most production systems without some form of retry logic.
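The implementation itself is not shown in the article; a minimal sketch of such a retry loop, assuming an `attempt_turn`-style helper like the one above (the backoff between attempts is my addition, not something the developers describe), could look like this:

```python
import time
from typing import Callable, Optional

def query_with_retries(attempt: Callable[[], Optional[dict]],
                       max_attempts: int = 3,
                       backoff_seconds: float = 0.5) -> Optional[dict]:
    """Retry a single game turn up to max_attempts times before giving up."""
    for i in range(max_attempts):
        result = attempt()
        if result is not None:
            return result
        time.sleep(backoff_seconds * (i + 1))   # optional pause; not from the article
    return None   # surfaced to the player as a generic "try again" message
```

With the approximate cumulative success rates above, three attempts make outright failure rare while keeping worst-case latency for a single game turn bounded.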
## In-Game Safety Handling
One clever aspect of the JOBifAI implementation is how the developers designed the game mechanics to handle genuinely inappropriate player inputs. Rather than relying solely on the LLM safety filters, they built safety into the game's narrative logic. The example given is illustrative: if a player attempts to ask about creating explosives, the in-game secretary character would realistically call security, resulting in an instant game over.
This approach has several advantages from an LLMOps perspective. It provides narrative consistency since the response matches what would happen in a real-world scenario. It also removes the need for the LLM itself to handle truly unsafe content, as the game logic intercepts and handles it. Additionally, it creates natural consequences for adversarial users without exposing the underlying technical limitations of the system.
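The article describes this mechanic only at the narrative level. One way to realize it, sketched here with hypothetical category names and state types, is to reserve one of the predefined action categories for clearly inappropriate behavior and map it to a game-over state in ordinary application code:

```python
from dataclasses import dataclass

# Hypothetical sketch: the category wording and GameState type are
# illustrative, not taken from the JOBifAI source.

OUTCOMES = [
    "The secretary answers the question politely.",           # 1
    "The secretary is confused and asks for clarification.",  # 2
    "The secretary calls security.",                          # 3: reserved for inappropriate input
]

SECURITY_CALLED = 3

@dataclass
class GameState:
    game_over: bool = False
    last_message: str = ""

def apply_turn(state: GameState, response: dict) -> GameState:
    """Route the LLM's classified outcome through ordinary game logic."""
    state.last_message = response["sentence"]
    if response["choice"] == SECURITY_CALLED:
        # Application-level safety: inappropriate input ends the game in-fiction,
        # without relying on the provider's safety filter to handle it.
        state.game_over = True
    return state
```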
## Criticism of Current API Design
A significant portion of the case study is devoted to critiquing how LLM providers implement safety filters and error handling. The developers argue for more granular error codes that would allow applications to distinguish between different types of issues. Their proposed error taxonomy includes:
- Responses that should be reviewed as they involve sensitive topics like legal advice
- Responses involving specific person names where confusion with homonyms is possible
- Topics that are too sensitive to be answered at all
- A general "refused for safety reasons" code separate from technical errors
The current situation, where safety rejections return the same 400 error as JSON parsing failures, prevents applications from implementing appropriate responses. A safety rejection might warrant showing the user a message about acceptable content, while a JSON parsing failure should be silently retried.
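To make the argument concrete, here is a hedged sketch of how application code could branch if providers exposed a differentiated taxonomy along the lines proposed above. The error codes shown are hypothetical and are not part of any current API:

```python
from enum import Enum

# Hypothetical error taxonomy -- no current provider exposes these codes; they
# illustrate the kind of granularity the developers argue for.

class LLMError(Enum):
    MALFORMED_OUTPUT = "malformed_output"        # technical failure: retry silently
    SENSITIVE_TOPIC_REVIEW = "sensitive_topic"   # e.g. legal advice: show a caveat
    PERSON_NAME_AMBIGUITY = "person_ambiguity"   # possible confusion between namesakes
    REFUSED_TOO_SENSITIVE = "too_sensitive"      # do not retry
    REFUSED_SAFETY = "refused_safety"            # show an acceptable-content message

def handle_error(err: LLMError) -> str:
    if err is LLMError.MALFORMED_OUTPUT:
        return "retry"
    if err in (LLMError.REFUSED_SAFETY, LLMError.REFUSED_TOO_SENSITIVE):
        return "show_content_policy_message"
    return "show_response_with_caveat"
```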
## Cost and Business Implications
The developers raise important points about the business implications of unreliable LLM outputs. The retry mechanism effectively multiplies the cost of LLM usage, since queries that fail must be resubmitted. They note that if users were billed per query, technical failures "should theoretically not be counted as individual requests," but in practice they often are.
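Using the developers' rough per-attempt figures, a back-of-the-envelope estimate of the cost multiplier (my calculation, not one the article performs) is straightforward:

```python
def expected_attempts(p_first: float = 0.75, p_by_two: float = 0.90) -> float:
    """Expected billed API calls per game turn, assuming up to three attempts and
    the developers' approximate cumulative success rates (playtesting figures)."""
    p_needs_second = 1.0 - p_first   # ~25% of turns need a second attempt
    p_needs_third = 1.0 - p_by_two   # ~10% of turns need a third attempt
    return 1.0 + p_needs_second + p_needs_third

print(round(expected_attempts(), 2))   # -> 1.35 billed calls per successful turn
```

This is consistent with the 1.3-1.5x effective-cost range noted in the lessons below.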
More concerning is the unpredictability this introduces into cost planning. Adversarial users could potentially spam the system with queries designed to trigger safety filters, causing excessive retries and inflated costs. The developers mention the possibility of using simpler heuristics like word vectors to pre-filter obviously problematic queries, but acknowledge this adds complexity and is imperfect.
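The word-vector pre-filter is left at the idea stage in the article; one possible shape of it, sketched here with an `embed` stand-in for any sentence-embedding model and placeholder seed phrases and threshold, would be a cheap similarity screen applied before spending an LLM call:

```python
import math

# Hypothetical pre-filter sketch. `embed` stands in for any sentence-embedding
# model; the seed phrases and threshold are placeholders, not values from the
# article, which only floats the idea and notes it is imperfect.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def looks_problematic(embed, player_input: str,
                      seed_phrases: list[str], threshold: float = 0.8) -> bool:
    """Cheap screen applied before spending an LLM call on obviously bad input."""
    query_vec = embed(player_input)
    return any(cosine(query_vec, embed(seed)) >= threshold for seed in seed_phrases)
```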
The observation that "safety triggers are generally even more unreliable than typical answers" is particularly noteworthy. This suggests that safety filtering systems may have higher variance in their decisions than the core LLM reasoning, making them a significant source of operational unpredictability.
## Limitations and Caveats
It's important to note some limitations of this case study. The success rate percentages cited are explicitly described as coming from playtesting observations rather than rigorous measurement, so they should be treated as approximate. The article also has a somewhat polemical tone regarding safety filters, which may color the presentation of the challenges encountered.
The developers explicitly describe JOBifAI as a "Proof of Concept" released for free, and note that its "unreliable foundations would deter us from developing it into a full-fledged program." This is an honest assessment that the current state of LLM reliability was sufficient for a demo but would not meet the bar for a commercial product.
## Broader LLMOps Lessons
Despite the critical tone, this case study surfaces several valuable lessons for LLMOps practitioners:
Retry mechanisms are essential when deploying LLMs in production. A single-attempt approach will result in unacceptably high failure rates for interactive applications. Planning for at least 2-3 retries should be standard practice.
Structured output formats like JSON are valuable but imperfect. Even with clear instructions, LLMs will sometimes produce malformed output, and applications need graceful degradation paths.
Safety filters introduce significant operational complexity. Their current implementation as opaque, binary decisions with undifferentiated error codes makes building robust applications more difficult than necessary.
Application-level safety design can complement or even replace reliance on model-level safety filters. Building safety into the application logic, as JOBifAI did with their game-over mechanic, provides more predictable and narratively consistent handling.
Cost estimation for LLM-powered applications must account for retries and failure rates. The effective cost per successful query may be 1.3-1.5x the nominal per-query cost when retries are factored in.
The case study also highlights the importance of transparent benchmarking and documentation. The developers reference the "Uncensored General Intelligence Leaderboard" as an example of efforts to quantify model censorship levels, suggesting that this information is valuable for developers choosing which models to deploy.
## Conclusion
The JOBifAI case study provides a ground-level view of the challenges facing developers building interactive LLM-powered applications in 2025. While the developers successfully shipped a working proof of concept, their experience highlights significant operational hurdles that remain in the LLMOps space, particularly around error handling, safety filter implementation, and cost predictability. The technical solutions they implemented—retry mechanisms and application-level safety design—offer practical patterns for other developers facing similar challenges, even as the underlying platform limitations remain unresolved.