Company
Tastewise
Title
Dutch YouTube Interface Localization and Content Management
Industry
Media & Entertainment
Year
2024
Summary (short)
This appears to be the Dutch footer section of YouTube's interface, showcasing the platform's localization and content management system. However, without more context about specific LLMOps implementation details, we can only infer that YouTube likely employs language models for content translation, moderation, and user interface localization.
## Overview This case study comes from a talk by Leor, Head of Data Science at Tastewise, a food and beverage intelligence platform that helps major F&B companies (Nestlé, C Tank, and other large brands) understand market trends, develop new products, and craft marketing messages. The company positions itself as the first food tech company to deploy generative AI in production, having moved rapidly from the release of ChatGPT in December 2022 to having a production system by March 2023, with a more mature v2 by June 2023. Tastewise collects massive amounts of food and beverage data from social media, recipes, menu items, and other sources, then applies AI models to extract insights about trending ingredients, flavors, dishes, diets, and consumer motivations. The challenge was making this data accessible to non-technical users through natural language queries rather than requiring them to understand the platform's complex filtering and navigation system. ## The TasteGPT Architecture TasteGPT is fundamentally a RAG (Retrieval-Augmented Generation) system, though with a twist: rather than retrieving documents to augment prompts, it translates natural language queries into structured API requests that fetch data from Tastewise's proprietary databases and AI models. The speaker is clear that TasteGPT is not simply wrapping ChatGPT - it's specifically designed to interact with Tastewise's unique data and internal AI models. The architecture consists of two main components: the Navigator and the Summarizer. ### The Navigator The Navigator is responsible for understanding user questions and translating them into JSON API requests for the Tastewise platform. It performs several key functions: **Follow-up Question Handling**: The system maintains conversational context by detecting whether a new question is a follow-up to a previous one. For example, if the previous question was "what are good bases for shakshuka" and the current question is "what about without tomato," the system generates a new combined question: "what are good bases for shakshuka without tomato." Importantly, the system is intelligent enough to recognize when questions mention the same topic but are not actually follow-ups - if someone asks "is shiso trending" followed by "what are good pairings for shiso," these are treated as independent queries despite both mentioning shiso. **Page Selection**: The Tastewise platform has multiple pages, each providing different types of analysis (ingredients/flavors, consumption moments, consumer needs, etc.). The Navigator must determine which page is appropriate for each query. **Functionality Mapping**: Each page has its own unique filtering and query capabilities. The Navigator must understand what functionalities are available and map the user's intent to the appropriate parameters. ### The Summarizer Once the API returns data, the Summarizer takes the raw results along with the original user question and generates a human-readable response with actionable insights. It also provides links to the relevant pages in the Tastewise platform for users who want to dig deeper. ## Evolution from Naive to Robust Prompt Architecture One of the most valuable LLMOps lessons from this case study is the evolution of their prompting approach. The team initially attempted a naive approach: a single "magnificent prompt" that would take the user question and simultaneously determine the correct page, extract query terms, apply all filters, and generate the complete API request. This approach failed completely. The solution was to decompose the problem into many small, focused prompts, each with a very limited responsibility: - One prompt handles page navigation - One prompt extracts query terms - One prompt handles location filtering - One prompt handles date filtering - Additional prompts for other filters and parameters This decomposition provided several critical advantages for production systems: **Parallel Execution**: Because prompts are independent, they can run in parallel rather than sequentially, reducing latency significantly. **Isolated Debugging**: When issues arise, engineers can disconnect other prompts and focus on debugging a single component without interference. **Fault Isolation**: If one prompt fails, it doesn't cascade to break the entire pipeline. The system can handle partial failures gracefully. **Easier Iteration**: Changes to one prompt don't affect others, enabling faster experimentation and improvement cycles. **Extensibility**: New prompts can be added for new functionality without modifying existing prompts. The results from all prompts are grouped together and assembled into the final API request. ## Prompt Structure and Few-Shot Learning Each prompt follows a consistent structure: - Role definition ("You are a chatbot programmed to...") - Step-by-step instructions with explicit rules - Output format specification (JSON schema) - List of examples with explanations The team made an interesting discovery about the power of explanations in few-shot examples. While debugging, they noticed GPT spontaneously added explanations to its outputs, writing things like "the question is about pizza toppings, we should add pizza to the query and filter by toppings." Rather than treating this as unwanted output, the team recognized its value: including explanations in their few-shot examples provided more context and helped GPT understand the broader intent behind each example rather than just pattern matching. This allowed them to use fewer examples while achieving better generalization. ## Handling Hallucination: The Location Filter Case Study The location filtering case study is particularly instructive for LLMOps practitioners dealing with hallucination issues. Tastewise supports specific markets (US, UK, Brazil, Canada, Germany, Spain, Mexico, etc.) and needs to filter queries appropriately. Users might also specify states or cities within supported markets. The challenge: GPT's fundamental goal is to "satisfy the user." When a user mentions Belgium or Israel (unsupported markets), GPT wants to be helpful and will hallucinate that these are valid options rather than returning an empty result. The speaker emphasized understanding this core behavior as essential to working effectively with LLMs. The naive approach - simply listing supported countries in the rules and instructing GPT to leave the field empty for unsupported countries - didn't work. GPT would sometimes hallucinate that unsupported countries were valid. The solution involved introducing an "intermediate output layer" that decomposed the task into two stages: **Initial Layer**: Extract any country, city, or state mentioned in the text. No validation rules apply. GPT can satisfy its desire to be helpful by identifying these entities. **Final Layer**: Take the extracted entities from the initial layer and validate them against the supported markets. Place valid entities in the final output fields; place invalid entities in explicit "invalid_country" or "invalid_city" fields. This approach works because it gives GPT an outlet for its helpful behavior (extracting all mentioned locations) while providing a structured second step for applying business rules. The explicit invalid fields also force the model to consciously categorize entities rather than quietly ignoring the rules. Examples show this working correctly: "What are the most popular vegetables in soup in Beijing, China?" results in China being extracted in the initial layer but marked as invalid_country in the final layer, with an explanation that China is not a supported market. ## Requirements and Design Principles The team established clear requirements for TasteGPT that shaped their LLMOps decisions: **Intuitive**: Users should not need to learn special syntax or understand the platform's internal structure to ask questions effectively. **Accurate**: Results must be relevant to the question. Inaccurate results will cause users to stop using the system. **Insightful and Fast**: The system must provide value faster than users could find information manually through the traditional UI. **Continuous Learning**: The team must review logs and user questions to constantly improve the system. **Ahead of Expectations**: The system should exceed what users think is possible. **Secure**: Users must trust that their proprietary questions and data are not being shared with OpenAI or other third parties. ## Image Generation for Content Agency Beyond TasteGPT, Tastewise also explored generative AI for image generation through what they call their "content agency" offering. The use case is compelling: food and beverage companies typically spend around $20,000 for five professional product images when developing new concepts or marketing campaigns. AI image generation could dramatically reduce this cost. The speaker shared an honest progression from early failures to production-quality results. Early attempts produced classic AI image problems: connected hands, disturbing eyes, weird fingers, animated/synthetic looks, visible chicken patterns, strange carrots, and the infamous issues with living animals (tails, eyes). None of these were usable in production. Key learnings for production image generation included: **The Chaos Parameter**: In Midjourney, the chaos parameter controls variation between generated images. Low chaos produces similar images; high chaos produces dramatically different concepts. This is useful for exploration phases where clients want diverse ideas before narrowing down. **Literal Interpretation Issues**: AI models often interpret prompts too literally. "Pepper cheese" resulted in cheese with actual whole peppers on top rather than pepper-flavored cheese. "Broccoli shake" showed a green drink with whole broccoli florets. Understanding these tendencies requires hands-on experimentation. **Camera Angles**: Playing with camera angles creates more dramatic and professional-looking images. **Material and Packaging**: The team developed capabilities for generating product packaging with different materials (transparent, nylon, plastic, glass) and even modifying existing brand bottles while preserving logos. The fundamental lesson: tutorials and prompt collections are useful starting points, but production-quality generative AI requires hands-on experimentation with your specific use case. Generic "top 50 prompts" won't solve domain-specific challenges. ## Timeline and Organizational Impact The rapid timeline is notable from an LLMOps perspective: - October 2022: Initial internal discussion about chatbot possibilities (dismissed as impossible) - December 2022: ChatGPT release shifts perspective - February 2023: Generative AI hackathon with commitment to deliver something working - March 2023: First version of TasteGPT in production - June 2023: Second version launched at Tastewise Summit with significant business impact The speaker emphasized that this was a true cross-functional effort involving the CEO, product team, sales, marketing, and design - not just the data science team. This organizational alignment appears critical to the rapid deployment timeline. ## Key LLMOps Takeaways This case study offers several practical insights for LLMOps practitioners: The importance of decomposing complex prompts into focused, parallelizable components cannot be overstated. This architectural decision enabled better debugging, fault isolation, and iterative improvement - all essential for production systems. Understanding the model's "desire to satisfy users" helps explain and prevent hallucination. Designing intermediate output layers that give the model appropriate outlets for this behavior while still enforcing business rules is a practical technique. Leveraging model-generated explanations in few-shot examples provides richer context than examples alone, potentially reducing the number of examples needed while improving generalization. For domain-specific applications, generic LLM capabilities are insufficient. TasteGPT's value comes from its integration with Tastewise's proprietary data and AI models - it's explicitly not just ChatGPT with a different interface. Finally, the speaker's emphasis on hands-on experimentation over tutorials reflects a mature understanding that production AI requires deep engagement with specific use cases rather than applying generic patterns.

Start deploying reproducible AI workflows today

Enterprise-grade MLOps platform trusted by thousands of companies in production.