## Overview
Prosus, a global technology investment company operating consumer internet services in over 100 countries and serving approximately one quarter of the world's population, undertook an ambitious two-year journey to deploy generative AI capabilities at enterprise scale. The company had already been using AI and machine learning extensively for several years, with hundreds of models in production supporting their large-scale operations. Recognizing the transformative potential of large language models, however, they began in the summer of 2022 to develop a personal AI assistant for colleagues across the Prosus group, enabling hands-on exploration and collective discovery of GenAI capabilities.
The case study is notable for its scale and duration: as of publication in May 2024, the AI assistant (called Toqan.ai) was used by approximately 13,000 employees across 24 companies within the Prosus ecosystem. This represents one of the larger documented enterprise deployments of LLM-based assistants, providing valuable insights into real-world adoption patterns, technical challenges, and organizational benefits at scale.
## Early Experimentation and Foundation (2019-2021)
Prosus's journey began well before the mainstream ChatGPT moment, with explorations starting in 2019 using early large language models such as BERT and GPT-2. The organization recognized that while these models were not yet ready for widespread deployment, they represented significant advances in processing language and unstructured data. This forward-looking approach positioned the company to understand both the capabilities and limitations of LLMs before the technology became ubiquitous.
Between 2020 and 2021, Prosus conducted an extensive program of practical field experiments, running over 20 different pilot projects in collaboration with companies across their portfolio. These experiments explored diverse applications including creating educational materials, question-answering systems, document synthesis, code automation, documentation generation, and bug fixing. This systematic experimentation phase was critical for understanding viable applications and the conditions necessary for LLM effectiveness.
A key finding from these early experiments was that many companies identified similar use cases independently, such as analyzing help desk tickets, but also discovered unexpected applications that increased work efficiency and operational independence. Importantly, the case study notes that most promising use cases emerged through bottom-up discovery rather than top-down mandates, often developing organically in collaborative project channels. This insight shaped their decision to facilitate efficient "collective discovery" through a general-purpose assistant rather than prescribing specific applications.
## Architecture and Technical Implementation
The Toqan.ai assistant was designed as a general-purpose chatbot with particular attention to the needs of product and technology teams, and its architecture reflects several deliberate LLMOps choices. Initially made accessible through a Slack integration, which lowered adoption barriers by meeting users in their existing workflow tools, the assistant integrates more than 20 models and tools.
The multi-model architecture includes commercial LLMs, open-source models, and models fine-tuned in-house, providing flexibility to match different use cases with appropriate model capabilities and cost profiles. Beyond text generation, the platform incorporates image interpretation and generation, voice encoding and generation, large document processing, data analysis, and code creation capabilities. This comprehensive toolkit approach allows users to tackle diverse tasks within a single interface rather than switching between specialized tools.
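The case study does not describe how individual requests are matched to specific models, but a minimal routing sketch illustrates the kind of decision such a multi-model setup implies. All model names, tiers, and prices below are hypothetical placeholders, not Toqan's actual configuration.

```python
from dataclasses import dataclass

# Hypothetical model registry illustrating how requests could be matched to
# commercial, open-source, or fine-tuned models by task type and cost profile.
# Names, capabilities, and prices are illustrative assumptions only.

@dataclass
class ModelSpec:
    name: str
    provider: str              # "commercial", "open_source", or "fine_tuned"
    capabilities: set          # e.g. {"text", "code", "vision"}
    cost_per_1k_tokens: float

REGISTRY = [
    ModelSpec("gpt-4-class", "commercial", {"text", "code", "vision"}, 0.03),
    ModelSpec("oss-13b-chat", "open_source", {"text"}, 0.001),
    ModelSpec("internal-code-ft", "fine_tuned", {"code"}, 0.004),
]

def route(task_type: str, budget_sensitive: bool = True) -> ModelSpec:
    """Pick the cheapest model that supports the task, or the most capable one."""
    candidates = [m for m in REGISTRY if task_type in m.capabilities]
    if not candidates:
        raise ValueError(f"No model supports task type: {task_type}")
    if budget_sensitive:
        return min(candidates, key=lambda m: m.cost_per_1k_tokens)
    return max(candidates, key=lambda m: len(m.capabilities))

print(route("code").name)  # -> internal-code-ft (cheapest code-capable model)
```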
A particularly noteworthy technical feature is the integration with internal knowledge bases across the portfolio companies to provide grounded responses. This represents an implementation of retrieval-augmented generation (RAG) patterns, allowing the assistant to answer questions based on company-specific documentation and information rather than relying solely on the pre-training knowledge of the base models. This grounding mechanism is essential for providing accurate, contextually relevant responses in an enterprise setting.
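The retrieval pipeline behind this grounding is not detailed in the case study. The following is a minimal RAG sketch in which the embedding function, vector store, and LLM call are injected placeholders rather than Toqan's actual stack; it shows only the general retrieve-then-generate pattern the text describes.

```python
# Minimal retrieval-augmented generation (RAG) sketch. The embed(), vector_store,
# and call_llm() helpers are placeholders supplied by the caller; the case study
# does not describe the actual retrieval infrastructure.

def answer_with_grounding(question: str, vector_store, embed, call_llm, k: int = 5) -> str:
    # 1. Embed the question and retrieve the k most similar internal documents.
    query_vector = embed(question)
    passages = vector_store.search(query_vector, top_k=k)

    # 2. Build a prompt that instructs the model to answer only from the passages.
    context = "\n\n".join(f"[{i + 1}] {p.text}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the numbered passages below. "
        "If the passages do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

    # 3. Generate a grounded answer with whichever model handles the request.
    return call_llm(prompt)
```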
## Privacy, Security, and Guardrails
The case study highlights the implementation of several critical guardrails to address privacy and security concerns inherent in enterprise LLM deployment. Prosus implemented no-learning and no-retention policies to protect sensitive data from being used to train future models, addressing a common concern about using commercial LLM APIs in enterprise contexts. These privacy measures were essential for building trust and encouraging adoption across diverse business functions where employees might be working with confidential information.
Beyond data privacy, the platform incorporated mechanisms to manage the quality and reliability of outputs. The implementation of comprehensive feedback systems, including both positive indicators (thumbs up, heart) and negative indicators (thumbs down, and notably a "Pinocchio" button specifically for unreliable or fabricated answers), provided structured channels for capturing user experience with model outputs.
The trajectory of "Pinocchio" feedback provides valuable insights into LLM reliability in production. Initially accounting for almost 10% of responses in fall 2022, this rate dropped to below 3% by June 2023 and stabilized around 1.5%. The case study attributes this improvement to three factors: enhancements in underlying models as providers improved their offerings, enhanced prompting techniques as the engineering team refined their approaches, and better user proficiency as employees learned to craft more effective prompts. This multi-faceted improvement underscores that production LLM quality depends on model capabilities, prompt engineering, and user education in concert. The acknowledgment that eliminating bad responses entirely is impossible but they can be effectively managed represents a realistic assessment of current LLM limitations.
## User Education and Change Management
Prosus launched an extensive education and training program to support assistant adoption, with the notable approach of delivering some training through the tool itself. This meta-application of the technology—using the AI assistant to teach people how to use AI assistants—represents an innovative approach to user onboarding and continuous learning. It also demonstrates practical value by immediately showing users what the tool can accomplish.
The focus on education appears to have been critical to the platform's success, helping users understand both capabilities and limitations while developing proficiency in prompt engineering. The case study notes that user skill improvements contributed significantly to the reduction in poor-quality outputs over time, suggesting that effective LLMOps requires investment in human capital alongside technical infrastructure.
## Use Cases and Adoption Patterns
Analysis of interaction data and user interviews revealed that approximately 50% of assistant usage involves engineering-related tasks, with the remaining 50% serving diverse purposes across business functions. This distribution is particularly interesting given that the tool was initially designed with product and technology teams in mind, suggesting that the value proposition extended well beyond the original target audience.
The case study provides several concrete examples of actual usage that illustrate the range and specificity of tasks being performed. Engineers use the assistant to correct and explain code snippet errors in their team's style and automatically generate appropriate documentation. Product managers leverage it to summarize experiments documented in Confluence over extended time periods, demonstrating document synthesis capabilities. Team managers employ it to improve and rewrite performance feedback, enhancing communication quality while maintaining appropriate tone and content focus.
The two most frequent usage categories identified were "software engineering" and "writing and communication." Notably, engineering tasks are common even among non-engineers, who use the assistant for simpler exploratory tasks, personalizing tools, and conducting data analysis. This democratization of technical capabilities represents a significant organizational benefit, allowing people without deep technical skills to accomplish tasks that previously required specialist involvement.
A particularly interesting finding is what the case study terms the "movement of liberation from the dashboards." A notable group of users seeks direct database access, formulating queries in natural language to bypass traditional dashboard interfaces. This represents a shift in how people interact with organizational data, potentially reducing bottlenecks around dashboard creation and maintenance while empowering more people to answer their own questions.
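The case study does not show how these natural-language queries are executed, but the pattern typically reduces to a text-to-SQL step with guardrails. The sketch below assumes a hypothetical call_llm() helper, a toy SQLite schema, and a SELECT-only restriction; none of this reflects Toqan's actual implementation.

```python
import sqlite3

# Hedged sketch of a natural-language-to-SQL flow like the "liberation from the
# dashboards" pattern described above. call_llm() is a placeholder, and the
# schema and guardrails (read-only, row limit) are illustrative assumptions.

SCHEMA = """
CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL, created_at TEXT);
"""

def nl_query(question: str, conn: sqlite3.Connection, call_llm) -> list:
    """Translate a natural-language question into a read-only SQL query and run it."""
    prompt = (
        f"Given this SQLite schema:\n{SCHEMA}\n"
        f"Write one read-only SQL query (SELECT only, LIMIT 100) answering:\n{question}\n"
        "Return only the SQL."
    )
    sql = call_llm(prompt).strip().rstrip(";")
    if not sql.lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")
    return conn.execute(sql).fetchall()
```

Restricting generated queries to read-only statements with a row limit is the kind of guardrail such a feature would likely need before exposing production databases to end users.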
The prevalence of "writing and communication" tasks among non-engineers, with constant demand for enhancing clarity and nuance in communication spanning report writing to email composition, underscores an important but sometimes overlooked value proposition of LLMs. The case study notes this highlights the value of a personal, private tool as a safe space for asking even basic questions without judgment—an important psychological dimension of enterprise AI adoption.
## Impact and Business Value
The productivity impact of the assistant appears substantial, with over 81% of users reporting productivity increases of 5-10% or more. The case study mentions that A/B testing for certain tasks shows time reductions of 50% or more, aligning with broader industry results on LLM-powered productivity gains. Importantly, about 60% of users turn to the assistant as a first resource to get unstuck or get started on tasks, suggesting it has become genuinely integrated into daily workflows rather than remaining a novelty or occasional tool.
User feedback highlights three main categories of benefits. First, increased speed in task execution, particularly notable in engineering contexts where code generation and debugging assistance compress development cycles. Second, the ability to undertake more tasks than previously possible, such as design work and data analysis for individuals who lacked those specific skills, representing capability expansion rather than mere acceleration. Third, greater independence and reduced reliance on colleagues, which both empowers individuals and reduces organizational bottlenecks.
The case study offers an important perspective on productivity impacts, noting that their initial view was simplistic—expecting well-defined portions of work to be automated. Instead, they discovered "a wide array of micro-productivity bursts distributed across all workflows." This observation suggests that LLM impact in knowledge work may be more diffuse and pervasive than traditional automation, with value accruing through many small improvements rather than wholesale task replacement. These micro-productivity bursts increasingly cluster around themes and "jobs to be done," such as data access without intermediation or market research, and these insights guide the development of vertical applications and specialized AI assistants.
## Use Case Discovery and Graduation to Production
One of the most strategically valuable aspects of the Toqan.ai deployment has been its role in use case discovery and validation. Teams across the organization use the general-purpose assistant to discover and test use cases for their specific organizational contexts. They stress-test potential applications with the AI assistant until convinced of viability, then graduate those use cases into regular engineering practices and production systems. This approach effectively uses the assistant as a rapid prototyping and validation platform before committing resources to building specialized production applications.
The case study provides several concrete examples of products that emerged from this discovery pattern. Brainly.com developed Genie, a learning assistant for K-12 education. iFood.com.br created compr.ai, a conversational grocery ordering application. SoloLearn.com built Kodie, an award-winning coding mentor. GoodHabitz.com developed simulation role-play functionality for learning sales skills. The case study notes this pattern has occurred approximately a dozen times and has become ingrained in the operations of companies using the AI assistant.
This approach to use case discovery represents sophisticated LLMOps strategy, recognizing that the most valuable applications may not be obvious upfront and that providing a safe, low-friction environment for experimentation can surface opportunities that top-down planning might miss. The graduation pathway from general assistant to specialized production application provides a natural evolution that balances exploration with focus.
## Evolution Toward Agents
The case study concludes with a discussion of the platform's evolution from a question-and-answer system toward a tool capable of performing complex tasks through agent-based functionality. After nearly two years of development, agent capabilities enabling web browsing, code creation and execution, and API connections have matured to the point of practical business viability. The platform is gradually introducing functionality in which the system intelligently selects appropriate agents based on the task at hand.
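The dispatch mechanism is not described, but a simple version can be sketched by asking the model itself to pick from a registry of agents. The agent names and the call_llm() helper below are hypothetical placeholders, not Toqan's actual agent catalog or selection logic.

```python
import json

# Hedged sketch of task-based agent selection: the model is asked to choose
# one agent from a registry. Agent names and call_llm() are assumptions.

AGENTS = {
    "web_browser": "Searches and reads web pages to answer questions about current information.",
    "code_runner": "Writes and executes Python code for data analysis and computation.",
    "api_connector": "Calls approved internal APIs to fetch or update company data.",
}

def select_agent(task: str, call_llm) -> str:
    """Ask the model to choose the single best-suited agent for the task."""
    descriptions = "\n".join(f"- {name}: {desc}" for name, desc in AGENTS.items())
    prompt = (
        f"Available agents:\n{descriptions}\n\n"
        f"Task: {task}\n"
        'Respond with JSON like {"agent": "<name>"} choosing exactly one agent.'
    )
    choice = json.loads(call_llm(prompt))["agent"]
    if choice not in AGENTS:
        raise ValueError(f"Model selected unknown agent: {choice}")
    return choice
```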
This shift toward vertical, agent-based tools represents what the case study identifies as a significant opportunity for value creation and differentiation for the Prosus group. The evolution from general-purpose assistant to task-specific agents suggests a maturation trajectory for enterprise LLM applications, where initial broad exploration gives way to focused, capability-specific tools that can handle more complex, multi-step workflows.
## Critical Assessment and Balanced Perspective
While the case study presents an overall positive view of Prosus's AI assistant deployment, several aspects warrant balanced consideration. The text originates from the company's own tech blog and naturally emphasizes successes and benefits. The reported productivity improvements of 5-10% or more from over 81% of users, while impressive, reflect self-reported data that may be subject to various biases. The specific methodology for these measurements and the presence of control groups is not detailed.
The reduction in "Pinocchio" feedback from 10% to 1.5% represents improvement but also indicates that problematic outputs, while reduced, remain an ongoing challenge requiring management. The case study's acknowledgment that eliminating bad responses entirely is impossible reflects appropriate realism about current LLM limitations. Organizations considering similar deployments should plan for continuous monitoring and user education rather than expecting perfect reliability.
The focus on "collective discovery" and bottom-up use case identification, while valuable, also suggests that clear ROI and business case development for LLM deployments may be challenging. The diffuse nature of "micro-productivity bursts" across workflows, while potentially valuable, may be difficult to measure and justify to stakeholders expecting clear metrics. The heterogeneity of use cases—from code debugging to email writing to database queries—makes standardized evaluation challenging.
The integration of over 20 models and tools, while providing comprehensive capabilities, also implies significant engineering complexity and ongoing maintenance burden. The case study does not discuss the operational costs, infrastructure requirements, or challenges in keeping multiple models updated and integrated. The privacy and security measures, while clearly important, likely add additional complexity and constraints to the system.
The graduation of use cases from the general assistant to specialized production applications demonstrates value but also raises questions about the long-term role of the general assistant. As more use cases spawn dedicated applications, will the general assistant's importance diminish, or will it continue serving as an experimentation platform? The strategic positioning and resource allocation between the platform and spawned applications represents an ongoing challenge.
Finally, while the case study mentions education and training programs, the specific details of what makes enterprise LLM education effective remain somewhat unclear. The challenges of varying user sophistication, changing model capabilities, and evolving best practices in prompt engineering all suggest that user education is an ongoing investment rather than a one-time effort.
Despite these considerations, the case study provides valuable insights into large-scale enterprise LLM deployment, demonstrating that with appropriate technical infrastructure, governance mechanisms, and organizational support, general-purpose AI assistants can deliver measurable value across diverse business functions while serving as platforms for innovation and use case discovery.