Company
Prosus
Title
Scaling AI Agent Deployment Across a Global E-commerce Organization
Industry
E-commerce
Year
2025
Summary (short)
Prosus, a global e-commerce and technology company operating in 100 countries, is deploying approximately 30,000 AI agents across their organization to transform both customer-facing experiences and internal operations. The company developed an internal tool called Toqan to enable employees across all departments, from sales and marketing to HR and logistics, to create their own AI agents without requiring engineering expertise. The solution addressed the challenge of moving from occasional AI assistants to trusted, domain-specific agents that can execute end-to-end tasks. Results include significant productivity gains (such as one agent doing the work of 30 full-time employees), improved quality of service, increased independence for employees, and greater agility across the organization. The deployment scaled rapidly through organizational change management, including competitions, upskilling programs, and democratization of agent creation.

## Overview

Prosus, a global technology and e-commerce company with operations spanning 100 countries across South America, Europe, and Asia, has embarked on one of the most ambitious AI agent deployment initiatives described in recent LLMOps case studies. The company operates various e-commerce platforms including food delivery, traditional e-commerce, job marketplaces, and real estate platforms. Their AI strategy follows two parallel tracks: transforming customer-facing e-commerce experiences to be more agentic and personalized, and creating an "AI agentic workforce" to augment their human employees. This case study primarily focuses on the latter: the internal deployment of approximately 30,000 AI agents expected to be in production by March 2025, up from roughly 20,000 created agents and 8,000 weekly active agents as of October 2024.

The fundamental challenge Prosus identified was transitioning from AI assistants that provide occasional help (described as "interns" that are eager but not always trustworthy and prone to hallucination) to senior-level agent colleagues that are trusted, domain-knowledgeable, and capable of executing end-to-end tasks autonomously. Their vision is that every employee will have between one and twenty agents helping them work better, where "better" encompasses productivity, quality, independence, and agility.

## Technical Architecture and Platform Development

Prosus developed an internal platform called Toqan specifically for building AI agents. The tool has an interesting timeline: it started in 2019, was released just before ChatGPT's public launch, and was transformed into an agent-based tool in 2024. The company began supporting employees with tools to create agents approximately a year before the presentation (around late 2023/early 2024). The decision to build internally rather than use off-the-shelf solutions was driven by several factors: safety and privacy requirements, optimization for their specific use cases, the ability to connect to proprietary integrations and MCP (Model Context Protocol) servers, guaranteed safety controls, and cost effectiveness.

The platform was officially released to employees in December of the previous year, and adoption has followed an interesting growth curve. Initial months showed steady but linear growth, followed by an exponential uptick after organizational interventions were implemented. By October 2024, approximately 8,000 agents were being used weekly in production, with about 20,000 total agents created (suggesting that some agents serve less frequent or specialized needs).

## Agent Classification and Capabilities

Prosus has developed a sophisticated classification system for agents based on their capabilities and trustworthiness, analogous to employee seniority levels:

**Intern-level agents** work primarily with documents and perform relatively simple tasks without system access. An example provided is a personal newsletter aggregator that scans email subscriptions, identifies topics mentioned across multiple sources (using a voting mechanism to determine importance), and personalizes summaries based on user-specified criteria. While this doesn't necessarily save time that would have been spent anyway, it provides coverage, independence, and personalization benefits.
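The talk doesn't specify how the voting mechanism works; a minimal sketch of one plausible implementation, in which a topic's importance is simply the number of independent sources that mention it (the function name and data shape are illustrative assumptions, not Prosus's implementation):

```python
from collections import Counter

def rank_topics(newsletters: list[dict], min_sources: int = 2) -> list[tuple[str, int]]:
    """Rank topics by how many distinct newsletters mention them.

    Each newsletter is a dict like {"source": "...", "topics": [...]}.
    A topic's 'vote count' is the number of distinct sources mentioning it,
    so one source repeating a topic doesn't inflate its importance.
    """
    votes = Counter()
    for letter in newsletters:
        for topic in set(letter["topics"]):  # dedupe within a single source
            votes[topic] += 1
    # Keep topics mentioned by at least `min_sources` independent sources.
    return [(t, n) for t, n in votes.most_common() if n >= min_sources]

issues = [
    {"source": "ml-weekly", "topics": ["agents", "evals", "agents"]},
    {"source": "data-digest", "topics": ["agents", "sql"]},
    {"source": "ai-brief", "topics": ["evals", "agents"]},
]
print(rank_topics(issues))  # [('agents', 3), ('evals', 2)]
```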

**Junior agents** have credentials and access to internal systems, allowing them to retrieve data and interact with operational tools. These agents can access systems like email, calendars, help desks, and other organizational systems, as well as make external web requests.

**Intermediate agents** not only access systems but can also write data back to databases. This level requires significantly more trust as there's risk of data pollution or corruption. The data analyst agent exemplifies this category—it allows employees to query databases in natural language (English, Portuguese, or other languages), automatically generates SQL queries, and returns results. This agent is described as "one of the hardest to get right" even after a year of development, suggesting significant complexity in ensuring query accuracy and safety.
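The presentation doesn't describe Toqan's specific safeguards at this trust level; a common pattern is to validate generated SQL before execution and reject anything that isn't a plain read. A minimal sketch under that assumption (the rules and names below are illustrative, not Prosus's implementation):

```python
import re

# Statements that mutate data or schema; an intermediate-level agent would
# need explicit, audited permission before any of these are allowed.
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|merge)\b",
    re.IGNORECASE,
)

def validate_readonly(sql: str) -> str:
    """Allow a single SELECT (or CTE) statement; reject writes and stacked queries."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        raise ValueError("Multiple statements are not allowed")
    if not stripped.lower().startswith(("select", "with")):
        raise ValueError("Only SELECT queries are permitted")
    if WRITE_KEYWORDS.search(stripped):
        # Deliberately conservative: a sketch, not a full SQL parser.
        raise ValueError("Query contains a write/DDL keyword")
    return stripped

validate_readonly("SELECT order_id, total FROM orders WHERE city = 'Sao Paulo'")
# validate_readonly("DROP TABLE orders")  -> raises ValueError
```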

**Senior agents** orchestrate multiple other agents and handle complex, multi-step workflows. The restaurant account executive agent demonstrates this capability: it autonomously collects information from multiple systems, assembles comprehensive performance reports, and creates web pages for account managers to use when meeting with restaurant partners. This particular agent is used by approximately 200 employees and reportedly does the work of 30 full-time employees, while also improving advice quality and partner coverage.

The classification correlates strongly with two technical attributes: the number of tools available to the agent, and the number of system integrations the agent can access. Senior agents naturally have more tools and integrations, enabling them to handle more complex use cases.
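A rough way to picture this correlation is to treat seniority, tools, integrations, and write access as explicit properties of an agent definition. The sketch below is an assumed simplification for illustration, not Toqan's actual schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class Seniority(Enum):
    INTERN = 1        # documents only, no system access
    JUNIOR = 2        # read access to internal systems
    INTERMEDIATE = 3  # may also write back to systems of record
    SENIOR = 4        # orchestrates other agents end to end

@dataclass
class AgentDefinition:
    name: str
    seniority: Seniority
    tools: list[str] = field(default_factory=list)
    integrations: list[str] = field(default_factory=list)
    sub_agents: list[str] = field(default_factory=list)

    @property
    def can_write(self) -> bool:
        # Write access is a trust decision tied to the seniority tier.
        return self.seniority.value >= Seniority.INTERMEDIATE.value

account_exec = AgentDefinition(
    name="restaurant-account-executive",
    seniority=Seniority.SENIOR,
    tools=["report_builder", "web_page_publisher"],
    integrations=["crm", "sales_warehouse"],
    sub_agents=["data-analyst"],
)
assert account_exec.can_write
```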

## Production Use Cases

The case study provides several concrete examples of agents in production:

**AI Account Manager for Partner Support**: This is perhaps the most detailed example provided. Previously, when a restaurant owner, car dealer, or other partner contacted Prosus via WhatsApp about a business question (such as why sales dropped recently), the inquiry would route through a CRM to a human partner manager, who would then request analysis from a data analyst team. This process could take anywhere from 15 minutes to a full day. In the new workflow, the inquiry goes to an AI account manager agent, which recognizes the data-related nature of the question and delegates to an AI data analyst agent. The data analyst agent queries appropriate data sources (Databricks, Tableau, or other systems), packages the findings (e.g., "a similar restaurant opened nearby two weeks ago"), and returns them to the AI account manager. The account manager reviews the information, potentially iterating with the data analyst, then sends the answer to the partner while copying the human partner manager. This provides instantaneous responses and dramatically expands coverage from 20-30% of partners to potentially 100%.
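The mechanics of that delegation aren't detailed in the talk; the sketch below illustrates the handoff pattern in simplified form. The keyword-based routing and the stubbed analyst are stand-ins for what would in practice be LLM-driven decisions and real warehouse queries, and all names here are hypothetical:

```python
def data_analyst_agent(question: str) -> str:
    """Stand-in for the NL-to-SQL analyst agent that queries Databricks/Tableau.
    Stubbed here; in production this would generate and run a real query."""
    return "A similar restaurant opened nearby two weeks ago."

def account_manager_agent(partner_message: str, human_manager: str) -> dict:
    """Route a partner question, delegating data questions to the analyst agent."""
    # Crude stand-in for intent recognition the LLM would actually perform.
    needs_data = any(w in partner_message.lower() for w in ("sales", "orders", "revenue"))
    if needs_data:
        finding = data_analyst_agent(partner_message)  # the delegation step
        reply = f"Here's what we found: {finding}"
    else:
        reply = "Let me look into that and get back to you."
    # Answer the partner over WhatsApp while keeping the human manager in the loop.
    return {"to": "partner_whatsapp", "cc": human_manager, "body": reply}

print(account_manager_agent("Why did my sales drop last week?", "manager@prosus.example"))
```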

**Restaurant Account Executive**: Used by account managers in their food delivery business (iFood is mentioned as one of the fastest-growing food delivery companies globally). Before meeting with restaurant partners, account managers need comprehensive performance insights. The agent automatically collects data from multiple internal systems, assembles the information, and creates a formatted web page that serves as the basis for partner meetings. The impact spans productivity (doing the work of 30 FTEs), quality of advice, and coverage (serving more restaurants than would otherwise be possible).

**ISO for iFood**: While this agent is more customer-facing, it's worth noting because it demonstrates agent capabilities. ISO supports the entire shopping experience on WhatsApp or within the app, providing personalized recommendations and guidance from beginning to end based on understanding user intent.

**Data Analyst Agents**: These are described as among the most common but also the hardest to perfect. They democratize data access by allowing non-technical employees to query databases in natural language without writing SQL, providing independence and agility benefits.

## Integration Landscape

The platform integrates with a wide range of systems, categorized into several types:

- **Operational systems**: Email, calendar, and other standard business tools
- **Systems of record**: Core business databases and data warehouses
- **Help desk and support systems**: CRM and customer service platforms
- **Data platforms**: Specifically mentioned are Databricks and Tableau
- **Communication platforms**: WhatsApp is prominently featured for both customer-facing and internal workflows
- **Company-specific proprietary systems**: Many custom internal systems unique to Prosus's various business units

The speaker notes promising applications that emerge when internal and external systems are mixed, but also acknowledges issues arising from that combination, suggesting ongoing work in defining appropriate boundaries and safety measures.

The mention of MCP (Model Context Protocol) indicates Prosus is adopting emerging standards for agent-system integration, though details of specific implementations aren't provided.
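For context, a minimal example of what exposing an internal system over MCP can look like, using the FastMCP helper from the official MCP Python SDK. The tool itself (`partner_sales_summary`) is hypothetical, not a confirmed Prosus integration:

```python
from mcp.server.fastmcp import FastMCP

# Declare an MCP server that an agent runtime can attach as a tool provider.
mcp = FastMCP("partner-insights")

@mcp.tool()
def partner_sales_summary(partner_id: str, days: int = 7) -> str:
    """Return a short sales summary for a partner over the last `days` days."""
    # Stubbed for illustration; in production this would query the warehouse.
    return f"Partner {partner_id}: order volume stable over the last {days} days."

if __name__ == "__main__":
    mcp.run()  # serve over stdio so agents can discover and call the tool
```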

## Organizational Adoption and Change Management

One of the most valuable insights from this case study is the emphasis on organizational challenges over technical ones. The speaker explicitly states: "After a certain point, AI adoption is never a technical problem; it's always an organizational problem." This realization came after observing that agent creation growth was linear rather than exponential through July 2024, despite continuously adding features, integrations, and MCP servers.

**Key barriers identified:**

- **Perception that agent creation requires engineering skills**: Many employees assumed only software engineers could create agents, creating a psychological barrier to adoption.

**Interventions implemented:**

- **Demonstration and demystification**: The team worked directly with salespeople, HR staff, and other non-technical employees to create agents together, showing how simple and iterative the process could be. The goal was to "make agents uncool"—removing any special mystique and positioning them as ordinary tools anyone can use.

- **Upskilling at scale**: Comprehensive training programs covering features, capabilities, and best practices.

- **Competitions**: The "Prosus Got Talent" competition, modeled after Shark Tank, runs from late 2024 through March 2025. Teams across the organization compete to solve real business challenges using agents, competing for prizes and the opportunity to present at a final event. This generates visibility, recognition, and helps build organizational muscle for AI. Hundreds of teams participate, with monthly selections leading to finals.

- **Creating the mindset**: Helping employees develop the habit of thinking "I have this job to be done, and I can create an agent to solve it" requires facilitation and doesn't happen naturally.

The speaker emphasizes that this is fundamentally a bottom-up process of collective discovery. Rather than top-down mandates, the approach provides tools for everyone to experiment, positioning experimentation at scale as a competitive advantage.

## Evaluation Framework

When asked about evaluating 30,000 agents, the speaker candidly responds "one by one" before elaborating on their framework. The evaluation approach is still being developed, and methods vary by use case:

**A/B Testing**: When possible, agents are evaluated by comparing outcomes with and without the agent, providing clear impact measurement.

**Before-and-After Analysis**: For cases where A/B testing isn't feasible, evaluation relies more on user feedback and comparing performance metrics before and after agent deployment.
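In either setup, the core arithmetic is a comparison of a metric with and without the agent. A minimal sketch, assuming time-to-answer in minutes as the metric (the data and helper are illustrative; a production setup would add significance testing and guardrail metrics such as answer quality):

```python
from statistics import mean, stdev

def compare_groups(control: list[float], treatment: list[float]) -> dict:
    """Summarize an A/B (or before/after) comparison on a single metric."""
    return {
        "control_mean": round(mean(control), 2),
        "treatment_mean": round(mean(treatment), 2),
        "absolute_lift": round(mean(treatment) - mean(control), 2),
        "control_sd": round(stdev(control), 2),
        "treatment_sd": round(stdev(treatment), 2),
    }

# e.g. minutes until a partner gets an answer, without vs. with the agent
without_agent = [240.0, 90.0, 35.0, 480.0, 120.0]
with_agent = [2.0, 1.0, 3.0, 2.0, 4.0]
print(compare_groups(without_agent, with_agent))
```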

**Four-Dimensional Impact Assessment**:
- **Productivity**: Traditional efficiency metrics, though what constitutes productivity varies by use case
- **Quality**: Measured differently depending on context (e.g., customer support grades, accuracy of analysis, comprehensiveness of reports)
- **Agility**: The ability to work outside one's comfort zone—developers using unfamiliar languages, marketing or legal professionals creating small applications for their own needs
- **Independence**: Reduced dependency on colleagues to complete work

The speaker's personal newsletter agent example illustrates nuanced evaluation—it doesn't save time that would have been spent anyway (no productivity gain), but provides value through coverage, independence, and personalization. This demonstrates that their evaluation framework extends beyond simple efficiency metrics.

## Domain Knowledge and Effectiveness

A critical success factor emphasized is surfacing and encoding implicit domain knowledge. The speaker notes that "these tools work when we can surface and encode domain knowledge." Much of the work involves making tacit knowledge explicit so agents can leverage it effectively. This suggests significant effort in knowledge engineering, documentation, and structuring organizational expertise in agent-accessible formats.
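One simple way to operationalize this, sketched below under the assumption that domain knowledge is curated as reviewed plain-text rules, is to compile those rules into the agent's system prompt. The rules and helper shown are illustrative, not Prosus's format:

```python
# Tacit expertise made explicit as curated, reviewable rules the agent can use.
DOMAIN_RULES = [
    "A 'partner' is a restaurant, store, or dealer selling through the platform.",
    "Sales dips under 10% week-over-week are normal seasonality; don't alarm partners.",
    "Check whether a competing partner opened nearby before offering other explanations.",
]

def build_system_prompt(role: str, rules: list[str]) -> str:
    """Compile curated domain rules into a system prompt for an agent."""
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(rules, start=1))
    return (
        f"You are a {role} for a food delivery marketplace.\n"
        f"Apply these domain rules, which override general knowledge:\n{numbered}"
    )

print(build_system_prompt("partner account manager", DOMAIN_RULES))
```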

## Critical Assessment and Considerations

While the presentation is enthusiastic about results, several important caveats and challenges deserve attention:

**Hallucination concerns**: The speaker acknowledges that earlier "intern-level" assistants would hallucinate and weren't trustworthy, though the implication is that more sophisticated agents have addressed this. However, no specific technical measures for hallucination mitigation are detailed.

**Data analyst agent difficulties**: The admission that data analyst agents remain "one of the hardest to get right" even after a year of development suggests ongoing challenges with accuracy, query generation, or result interpretation despite substantial effort. This is a common challenge in natural-language-to-SQL applications and indicates that not all use cases are equally mature.

**Evaluation methodology maturity**: The candid acknowledgment that evaluation frameworks are "developing as we go" and vary significantly by use case suggests that measurement remains a work in progress. The lack of standardized metrics across all 30,000 agents may make it difficult to assess overall program effectiveness or compare different agent types.

**Safety and data pollution risks**: While mentioned as a concern (particularly for intermediate and senior agents that can write to databases), specific safety measures, approval workflows, or rollback mechanisms aren't detailed.

**Adoption metrics interpretation**: The gap between 20,000 created agents and 8,000 weekly active agents (a weekly-active rate of roughly 40%) suggests that many agents see limited use. This could indicate failed experiments, specialized use cases, or agents that don't provide sufficient value. More granularity on usage patterns would be valuable.

**External system interaction risks**: The speaker notes potential issues when mixing internal and external system access but doesn't elaborate. This likely relates to data leakage, compliance, security boundaries, and the challenges of maintaining control when agents operate across these boundaries.

**Cost effectiveness claims**: While cost effectiveness is cited as a reason for building internally, no specific cost comparisons or ROI calculations are provided. The development and maintenance costs of a custom platform serving 30,000+ agents must be substantial.

**Dependency and organizational risk**: Creating such heavy dependence on AI agents raises questions about resilience, handling failures, maintaining skills in the human workforce, and what happens when agents produce incorrect outputs that aren't caught.

## Production Scale Implications

Deploying and maintaining 30,000 agents presents significant LLMOps challenges that aren't fully explored in the presentation:

- **Version control and updates**: How are agent definitions versioned? How are updates rolled out across thousands of agents?
- **Monitoring and observability**: What systems track agent performance, errors, and usage patterns at this scale?
- **Model updates**: How does the organization handle underlying model changes that might affect agent behavior?
- **Credential and access management**: With thousands of agents accessing various systems, credential management and security become critical.
- **Cost management**: Operating costs for 30,000 agents must be substantial and require sophisticated tracking and optimization.
- **Governance**: Who approves agents? What review processes exist? How are conflicts between agents resolved?

The presentation positions Prosus at the forefront of large-scale agent deployment, and while many technical details remain unspecified, the organizational insights around democratization and change management are particularly valuable. The emphasis on bottom-up creation, removing barriers to adoption, and using competitions to drive engagement offers a playbook that other organizations can learn from. However, prospective adopters should recognize that the technical challenges of ensuring agent reliability, safety, and effectiveness at this scale require significant ongoing investment and iteration, as evidenced by challenges with data analyst agents and evolving evaluation frameworks.
