Building a Scalable LLM Gateway for E-commerce Recommendations

Mercado Libre 2023
Mercado Libre developed a centralized LLM gateway to handle large-scale generative AI deployments across their organization. The gateway manages multiple LLM providers, handles security, monitoring, and billing, while supporting 50,000+ employees. A key implementation was a product recommendation system that uses LLMs to generate personalized recommendations based on user interactions, supporting multiple languages across Latin America.

Industry

E-commerce

Overview

Mercado Libre, Latin America’s largest e-commerce and fintech company with a mission to democratize commerce and financial services, embarked on an ambitious initiative to make generative AI available at enterprise scale. With over 50,000 employees—many without technical backgrounds—the company faced significant challenges in rolling out LLM capabilities across the organization. The solution presented by Lina Chaparro, a Machine Learning Project Leader at Mercado Libre, centers on the development of a centralized LLM Gateway that provides management, monitoring, and control for consuming generative AI services.

The presentation appears to have been delivered at a technical conference or meetup. The speaker walked through both the architectural decisions and a specific production use case involving product recommendations and push notifications.

The Challenge of Enterprise-Scale LLM Adoption

The presentation highlights several critical challenges that organizations face when attempting to deploy generative AI at scale:

Scale and Rate Limiting: Operating at Mercado Libre’s scale means handling massive volumes of requests per minute, often exceeding the rate limits of underlying LLM providers. This is a common challenge in production LLM deployments where API quotas from providers like OpenAI, Anthropic, or Google can quickly become bottlenecks.

Democratization Across Non-Technical Users: With 50,000+ employees, many of whom have no programming skills, the company needed to make LLM capabilities accessible to everyone, not just engineers. This reflects the broader challenge of building user-friendly interfaces and tools that abstract away technical complexity.

Rapidly Evolving Model Landscape: The LLM market changes rapidly, with new models and providers emerging constantly. Organizations need infrastructure that can adapt to these changes without requiring significant rearchitecting.

Observability and Trust: Understanding how LLMs are being used, monitoring quality of responses, tracking response times, and managing costs are all critical for enterprise adoption. The speaker emphasized the need for metrics visible at different levels to understand impact.

Security and Information Protection: As a company handling sensitive financial and commerce data, information security was highlighted as a top priority.

The Gateway Architecture Solution

Mercado Libre’s solution was to build an LLM Gateway—a centralized system that acts as a single entry point for all generative AI consumption across the organization. The implementation was accelerated by leveraging Fury, the company’s internal platform, which allowed rapid scaling without infrastructure concerns.

The gateway architecture provides several key benefits:

Centralized Management and Control: In a complex ecosystem like Mercado Libre’s, having a single point of control for LLM communications is essential. This allows for consistent policies, monitoring, and governance across all use cases.

Multi-Provider Integration: The gateway currently integrates four major LLM providers, providing a unified interface regardless of which underlying model is being used. This abstraction layer is crucial for avoiding vendor lock-in and enabling experimentation with different models.

Fallback System: The gateway implements logic to guarantee continued service availability. If one provider experiences issues, the system can automatically route requests to alternative providers, ensuring high availability for production workloads.

Security Layer: The gateway acts as a security barrier, providing encryption, authentication, and authorization functions. This is particularly important when dealing with sensitive customer data in an e-commerce and financial services context.

Performance Optimization: The gateway handles intelligent routing and rate limiting to reduce latency and improve response times. This is essential for real-time applications where user experience depends on fast responses.
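The talk does not say which rate-limiting algorithm the gateway uses. One common choice is a token bucket, sketched below under that assumption. The `TokenBucket` class and its parameters are illustrative only; the injectable `clock` argument exists to make the behaviour easy to test.

```python
import time


class TokenBucket:
    """Allow about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Return True and consume a token if a request may proceed now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10)
    accepted = sum(bucket.allow() for _ in range(20))
    print(f"accepted {accepted} of 20 immediate requests")
```

Keeping one bucket per provider, sized just under that provider's quota, would let a gateway queue or reroute requests before the upstream API rejects them.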

Simplified Architecture: By acting as a single entry point, the gateway reduces complexity and direct dependencies between systems, making the overall architecture easier to maintain and scale.

Centralized Billing and Cost Management: Given the different pricing models across providers and models, the gateway centralizes consumption tracking and associated costs, providing visibility into GenAI spend.
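To illustrate what centralized cost tracking can involve, here is a minimal sketch that prices each request from its token counts and accumulates spend per team. Everything in it is an assumption made for illustration: the provider and model names, the per-1,000-token prices, and the `UsageLedger` class do not come from the presentation.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical (input, output) prices per 1,000 tokens; real pricing differs.
PRICES = {
    ("provider_a", "model_x"): (Decimal("0.0010"), Decimal("0.0020")),
    ("provider_b", "model_y"): (Decimal("0.0005"), Decimal("0.0015")),
}


class UsageLedger:
    """Accumulate LLM spend per team across providers and models."""

    def __init__(self, prices):
        self.prices = prices
        self.cost_by_team = defaultdict(Decimal)  # missing teams start at 0

    def record(self, team: str, provider: str, model: str,
               input_tokens: int, output_tokens: int) -> Decimal:
        """Price one request and add its cost to the team's running total."""
        in_price, out_price = self.prices[(provider, model)]
        cost = (Decimal(input_tokens) * in_price
                + Decimal(output_tokens) * out_price) / 1000
        self.cost_by_team[team] += cost
        return cost


if __name__ == "__main__":
    ledger = UsageLedger(PRICES)
    ledger.record("recommendations", "provider_a", "model_x", 1000, 500)
    ledger.record("recommendations", "provider_b", "model_y", 2000, 0)
    print(dict(ledger.cost_by_team))
```

`Decimal` is used instead of floats so that many small per-request costs add up without rounding drift.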

Developer and User Tools

Beyond the gateway itself, Mercado Libre built complementary tools to facilitate adoption:

Playground: The team developed an internal playground tool that centralizes GenAI-based solutions accessible to any employee. By 2023, this playground had more than 16,000 unique users, demonstrating significant internal adoption.

SDK: For developers building programmatic integrations, an SDK was developed to simplify the user experience and abstract away the complexity of interacting with the gateway.

Production Use Case: Personalized Product Recommendations

The presentation includes a detailed production use case that demonstrates the practical application of the gateway architecture:

Problem Statement: The goal was to drive recommendations for products related to user interests, enhancing the user experience by creating customized bookings with high-shipping products and improving the notification algorithm to increase engagement in the marketplace.

How It Works: When a user interacts with a product—through views, questions, or favorites—the system uses LLMs to generate personalized push notifications encouraging purchase. These notifications lead to landing pages with AI-generated recommendations. The LLM essentially answers the question “what are the top 10 items this user would be interested in buying based on their preferences?”
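The actual prompts were not shared in the presentation. As a rough sketch of how interaction data might be turned into the "top 10 items" question, assuming a simple text prompt with the target language as a parameter, consider the following. The `Interaction` type, the `build_recommendation_prompt` function, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Interaction:
    kind: str     # "view", "question", or "favorite"
    product: str  # product title the user interacted with


def build_recommendation_prompt(interactions: Sequence[Interaction],
                                language: str, top_n: int = 10) -> str:
    """Build a prompt asking for the top-N items a user may want to buy."""
    history = "\n".join(f"- {i.kind}: {i.product}" for i in interactions)
    return (
        "A marketplace user recently had these product interactions:\n"
        f"{history}\n\n"
        f"List the top {top_n} items this user would be interested in buying "
        f"based on their preferences. Answer in {language}."
    )


if __name__ == "__main__":
    prompt = build_recommendation_prompt(
        [Interaction("view", "running shoes"),
         Interaction("favorite", "sports watch")],
        language="Portuguese",
    )
    print(prompt)
```

Passing the language explicitly is one simple way to cover the multiple languages the marketplace operates in; the resulting prompt would then be sent through the gateway rather than to a provider directly.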

Technical Challenges Addressed:

Results and Scale

The presentation provides some concrete metrics around adoption and impact:

It’s worth noting that some of these results are described in terms of expectations rather than measured outcomes, so the full impact may not yet be quantified.

MLOps Integration

The speaker explicitly mentions that the gateway architecture aligns with Mercado Libre’s internal MLOps practices, providing tools that improve quality, performance, scalability, and security of model use. This suggests the LLM gateway is part of a broader ML infrastructure strategy rather than a standalone initiative.

Critical Assessment

The presentation makes a compelling case for the gateway architecture approach to enterprise LLM deployment. However, a few observations are worth noting:

The metrics shared (16,000 users, 150 use cases) demonstrate adoption but don’t provide deep insight into business impact or ROI. The expected improvements in NPS are mentioned but appear to still be in the measurement phase.

The multi-provider integration with fallback capabilities is a strong architectural choice that addresses availability concerns and reduces vendor dependency, though the specific providers aren’t named.

The emphasis on democratizing AI access to non-technical employees through the playground is commendable, though the presentation doesn’t detail what safeguards are in place to prevent misuse or ensure appropriate use of the technology.

Overall, this case study represents a thoughtful, infrastructure-first approach to enterprise LLM adoption that addresses many of the practical challenges organizations face when moving from experimentation to production-scale deployment.
