ZenML

Agent-Based AI Assistants for Enterprise and E-commerce Applications

Prosus 2024

Prosus developed two major AI agent applications: Toqan, an internal enterprise AI assistant used by 15,000+ employees across 24 companies, and OLX Magic, an e-commerce assistant that enhances product discovery. Toqan achieved a significant reduction in hallucinations (from 10% to 1%) through an agent-based architecture, while saving users approximately 50 minutes per day. OLX Magic transformed the traditional e-commerce experience by incorporating generative AI features for smarter product search and comparison.

Industry

E-commerce


Overview

Prosus is a global technology group that operates e-commerce platforms across approximately 100 countries, connecting buyers and sellers in diverse verticals including food delivery, groceries, electronics, fashion, real estate, and automotive. With an existing ecosystem of about 1,000 AI experts and several hundred ML models already in production by 2018, the company embarked on an ambitious journey to deploy LLM-based agents across the organization starting in 2022, before the public release of ChatGPT.

This case study covers two distinct but related agent deployments: Toqan, an internal AI assistant for employees, and OLX Magic, a consumer-facing e-commerce discovery experience. Both systems share underlying technology and learnings, providing valuable insights into deploying LLMs at enterprise scale.

Toqan: The Enterprise AI Assistant

Philosophy and Approach

Prosus took an unconventional approach to enterprise AI adoption. Rather than dictating specific use cases from the top down, they pursued what they call a “collective discovery process.” The reasoning was straightforward: during early large-scale field tests of GPT-3 around 2020, they observed that users were discovering solutions and use cases that the organization hadn’t anticipated. This led them to conclude that the best strategy was to provide the best possible tools to everyone and let them figure out valuable applications themselves.

Technical Architecture

Toqan is designed as an LLM-agnostic platform that integrates with multiple large language models. Its architecture consists of several key components.

The system connects to internal databases and data lakes, enabling use cases like natural language data exploration where users can query databases in English without knowing SQL. The Data Explorer agent, for example, identifies relevant tables and columns, generates queries, executes them, handles errors through self-correction, and can even produce visualizations.
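The self-correcting query loop described above can be sketched as follows. This is a minimal illustration under assumed names, not Prosus's implementation: `call_llm` is a scripted stand-in for whatever model the platform routes to, and the retry logic shows the error-feedback pattern.

```python
import sqlite3

def call_llm(prompt: str) -> str:
    # Scripted stand-in for a real model call (hypothetical behavior):
    # the first attempt returns SQL with a typo; once the prompt carries
    # a database error message, it returns the corrected query.
    if "error:" in prompt:
        return "SELECT name, total FROM orders"
    return "SELECT name, totl FROM orders"

def data_explorer(question: str, conn: sqlite3.Connection, max_retries: int = 3):
    """Generate SQL from a natural-language question, execute it, and
    self-correct by feeding execution errors back to the model."""
    prompt = f"Schema: orders(name, total). Question: {question}"
    for _ in range(max_retries):
        sql = call_llm(prompt)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.OperationalError as exc:
            # Append the failed query and its error for the next attempt.
            prompt += f"\nPrevious query: {sql}\nerror: {exc}"
    raise RuntimeError("query could not be repaired")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (name TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES ('laptop', 900.0)")
rows = data_explorer("What did we sell, and for how much?", conn)
```

In a production system the same loop would also carry schema retrieval and visualization steps; the essential idea is that the raw database error, not a human, drives the correction.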

Evolution and Hallucination Reduction

One of the most interesting aspects of this deployment is the transparent tracking of hallucination rates. Prosus implemented a feedback mechanism with four reaction options: thumbs up, thumbs down, love, and, notably, “Pinocchio”, a reaction specifically designed to signal when the AI was making things up.
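The four-reaction scheme can be modeled minimally as below — reaction names and event counts here are illustrative, not Prosus data; the point is that a dedicated “Pinocchio” signal makes the hallucination rate directly measurable.

```python
from collections import Counter

# The four reaction options; "pinocchio" flags a made-up answer.
REACTIONS = {"thumbs_up", "thumbs_down", "love", "pinocchio"}

def hallucination_rate(events: list[str]) -> float:
    """Share of user reactions that flagged the answer as made up."""
    counts = Counter(events)
    unknown = set(counts) - REACTIONS
    if unknown:
        raise ValueError(f"unknown reactions: {unknown}")
    total = sum(counts.values())
    return counts["pinocchio"] / total if total else 0.0

# Illustrative event stream: 1 flag in 100 reactions.
events = ["thumbs_up"] * 95 + ["love"] * 3 + ["thumbs_down", "pinocchio"]
rate = hallucination_rate(events)
```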

The hallucination journey shows significant improvement: the reported rate fell from roughly 10% of interactions to about 1%.

This reduction came from three factors: improved underlying models, users becoming more skilled at using the tools (avoiding problematic use cases), and crucially, the introduction of agents in January 2024. The agentic architecture with its tool-calling capabilities and reflection loops appears to have been a major contributor to reliability improvements.
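A reflection loop of the kind credited here can be sketched as follows — hypothetical function names, not the Toqan internals: the agent drafts an answer, a verification pass checks it against retrieved evidence, and unsupported drafts are retried or refused.

```python
def draft_answer(question: str, evidence: list[str], attempt: int) -> str:
    # Scripted stand-in for a model call: the first draft invents a
    # figure; the retry sticks to the retrieved evidence.
    if attempt == 0:
        return "Revenue grew 42% last quarter."
    return evidence[0]

def is_grounded(answer: str, evidence: list[str]) -> bool:
    # Stand-in for a verifier pass: accept only answers that match
    # at least one retrieved snippet.
    return any(answer == doc for doc in evidence)

def answer_with_reflection(question: str, evidence: list[str],
                           max_attempts: int = 3) -> str:
    """Draft, verify against evidence, and retry unsupported drafts."""
    for attempt in range(max_attempts):
        draft = draft_answer(question, evidence, attempt)
        if is_grounded(draft, evidence):
            return draft
    return "I don't know."  # refuse rather than hallucinate

evidence = ["Revenue grew 12% last quarter."]
final = answer_with_reflection("How did revenue do?", evidence)
```

The refusal fallback matters as much as the retry: capping attempts and answering "I don't know" converts residual hallucinations into visible non-answers.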

Cost Economics

The cost analysis presented is particularly valuable for practitioners. Token costs dropped approximately 98% over the development period, following the general trend in the LLM market. However, the full cost picture is more nuanced:

Between May and August 2024 (just four months), per-interaction token consumption rose substantially even as per-token prices continued to fall.

This dynamic illustrates a key challenge in LLMOps: while individual token costs decrease, agentic systems consume more tokens per interaction, partially offsetting savings. Prosus has experimented with various optimization strategies including model switching (using appropriate models for different tasks), prompt caching, reserved instances, and their own GPUs.
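The model-switching strategy mentioned above can be sketched as a simple capability-aware router — model names, prices, and skill sets here are invented for illustration, not Prosus's actual lineup.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    usd_per_million_tokens: float  # illustrative prices only
    skills: frozenset

# Hypothetical model tiers: route each task to the cheapest capable model.
MODELS = [
    Model("small-fast", 0.2, frozenset({"classify", "summarize"})),
    Model("mid-general", 2.0, frozenset({"classify", "summarize", "sql", "chat"})),
    Model("large-reasoning", 20.0,
          frozenset({"classify", "summarize", "sql", "chat", "agentic"})),
]

def route(task: str) -> Model:
    """Pick the cheapest model capable of the task."""
    candidates = [m for m in MODELS if task in m.skills]
    if not candidates:
        raise ValueError(f"no model supports task {task!r}")
    return min(candidates, key=lambda m: m.usd_per_million_tokens)

cheap = route("summarize")   # routine task -> cheapest tier
heavy = route("agentic")     # multi-step agent work -> top tier
```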

Scale and Adoption

Current metrics indicate substantial enterprise adoption: more than 15,000 employees across 24 portfolio companies use the assistant.

Interestingly, usage patterns shifted over time. Initially, about 50% of usage was engineering-focused (developers, product managers). After introducing agents, non-engineering usage increased significantly, suggesting the improved reliability and capability made the tool more accessible to broader user groups. The current split (41% technical, 59% non-technical) roughly mirrors the overall company composition.

Impact Measurement

Rather than relying on subjective surveys, Prosus trained a model to estimate time saved per interaction. The model was trained on paired comparisons of the same task types performed with and without the assistant.

The headline finding: users save approximately 50 minutes per day on average.

However, Prosus emphasizes that time saved isn’t the primary value. User feedback highlights other benefits: increased independence (not needing to tap colleagues for help), ability to work outside comfort zones (e.g., coding in unfamiliar languages), and eliminating “writer’s block” to get started on tasks. The aggregate effect is described as making “the entire organization a bit more senior.”

iFood Case Study: Organizational Change

The iFood implementation (Brazil’s largest food delivery company, part of the Prosus portfolio) provides a compelling example of how to maximize value from AI assistants by giving everyone access to data querying in natural language.

Critically, Prosus learned that the real value comes not from making data analysts faster, but from enabling the entire organization to query data directly, reducing dependency on specialized roles. This required organizational change, not just tool deployment.

OLX Magic: Consumer-Facing E-commerce Agent

Product Evolution

OLX Magic represents the application of the same agentic technology to consumer-facing e-commerce. OLX is a large online marketplace for classified goods (secondhand items, real estate, cars, jobs). The system is currently live in Poland.

An important product lesson emerged: the initial approach of building a “ChatGPT for e-commerce” with a blank conversational canvas failed. Users found this interface too unfamiliar for commercial settings. The team iterated to an experience that maintains familiar e-commerce patterns (search bars, results grids, filters) while infusing AI-powered features throughout.

Gen-AI Powered Features

The experience layers several LLM-powered capabilities onto the familiar interface, including smarter product search and side-by-side product comparison.

Technical Architecture

OLX Magic uses an agentic framework similar to Toqan’s, specialized for e-commerce.

Key Learnings

The presentation highlighted three crucial learnings for e-commerce agents:

The first learning concerns user experience: conversational interfaces alone don’t work for e-commerce. Users need familiar patterns enhanced with AI, not replaced by AI.

The second learning is about domain-specific agent design: generic agent frameworks aren’t sufficient. The tools, prompts, evaluations, memory systems, and guardrails all need to be specifically designed for the e-commerce journey. Personalization, product understanding, and purchase-appropriate interactions require specialized development.

The third learning relates to search infrastructure: “Your gen-AI assistant is only as good as your search.” The existing search pipelines weren’t suitable for the OLX Magic experience. The team had to rebuild from the ground up, including creating embeddings for listings based on titles, images, and descriptions, and building retrieval pipelines that could work with LLM-generated queries.
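The rebuilt retrieval pipeline can be illustrated with a toy embedding index — a bag-of-words stand-in for the real title/image/description embeddings, with invented listings; the shape of the pipeline (embed listings, embed the LLM-generated query, rank by similarity) is what carries over.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: term frequencies.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Index each listing on title + description (images omitted in this sketch).
listings = [
    {"id": 1, "title": "mountain bike", "description": "hardtail 29 inch great condition"},
    {"id": 2, "title": "city bike", "description": "commuter bike with basket"},
    {"id": 3, "title": "sofa", "description": "three seat fabric sofa"},
]
index = [(l["id"], embed(l["title"] + " " + l["description"])) for l in listings]

def search(llm_query: str, k: int = 2) -> list[int]:
    """Rank listings against an LLM-expanded query string."""
    q = embed(llm_query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [lid for lid, _ in ranked[:k]]

top = search("bike for commuting in the city")
```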

Results

While specific metrics weren’t disclosed (the project is still in development), Prosus indicated they’re measuring standard e-commerce KPIs (transactions, engagement) and seeing positive results. Their stated philosophy is “make it work, then make it fast, then make it cheap.”

Cross-Cutting LLMOps Insights

Several themes emerge across both deployments that are relevant for LLMOps practitioners:

Continuous Learning Infrastructure: The insights model within Toqan that categorizes usage patterns illustrates an important practice: building observability not just for debugging but for product development.

Feedback Mechanisms: The “Pinocchio” button for hallucination reporting demonstrates the value of specific, actionable feedback mechanisms beyond simple thumbs up/down.

Cost Awareness: The detailed tracking of cost per interaction, cost per token, and tokens per question shows mature cost management thinking essential for sustainable production LLM deployments.
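The three metrics named above compose directly; a back-of-envelope sketch with invented numbers (none of these figures come from Prosus):

```python
# Illustrative, made-up numbers only.
tokens_per_question = 12_000        # agentic interactions are token-hungry
usd_per_million_tokens = 2.50       # hypothetical blended rate across models

# cost per interaction = tokens per question x cost per token
cost_per_interaction = tokens_per_question / 1_000_000 * usd_per_million_tokens
questions_per_day = 40_000          # hypothetical volume
daily_cost = cost_per_interaction * questions_per_day
```

Tracked this way, a falling per-token price and a rising per-question token count can still move cost per interaction in either direction, which is exactly the dynamic the cost section describes.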

Model Flexibility: Both systems are described as LLM-agnostic, using multiple models and routing to appropriate ones based on task requirements. This flexibility appears important for both cost optimization and capability matching.

Organizational Change: The iFood example emphasizes that technical deployment alone isn’t sufficient - organizational processes must change to capture full value from AI assistants.

Iterative Product Development: Both products evolved significantly from initial versions, suggesting that LLM-based products require significant iteration informed by real user behavior.
