Tiendanube/Nuvemshop, a Latin American e-commerce platform serving over 180,000 merchants, developed Lumi, an agentic copilot embedded directly in its merchant admin interface. The goal was to let shop owners manage their stores more efficiently through natural language. Built on LangChain, Lumi assists merchants with catalog editing and other store management tasks. The copilot sits alongside store owners in the admin panel and assists in real time as they work; the source material gives no specific metrics on adoption or performance improvements.
Tiendanube/Nuvemshop is a significant e-commerce platform in Latin America, serving over 180,000 shop owners. The company recently deployed Lumi, an agentic copilot that assists merchants directly within their store administration interface. The available source material is limited in technical detail (it appears to be primarily a social media post linking to a longer article), but the case still shows how a major e-commerce platform has integrated LLM-based assistance into production systems at scale.
The core use case is a conversational AI assistant that merchants use to interact with their stores and that can perform actions on their behalf. According to the brief description, Lumi is positioned as sitting “next to the shop owner while they manage their store,” suggesting a copilot model rather than a fully autonomous agent. The system is described as capable of editing catalogs and performing other store management functions, which implies both read and write access to critical business data.
The development team chose LangChain as their foundational framework for building Lumi. This choice is noteworthy from an LLMOps perspective as LangChain provides several key capabilities relevant to production deployments of agentic systems. The framework offers abstractions for building multi-step reasoning chains, integrating tools and APIs, managing conversation memory, and orchestrating complex agent behaviors. For an e-commerce context where merchants might request actions spanning multiple systems (inventory management, product descriptions, pricing, order processing), LangChain’s ability to coordinate between different tools and maintain conversational context would be particularly valuable.
The selection of LangChain also suggests certain architectural decisions. LangChain integrates with many LLM providers, so Tiendanube/Nuvemshop could have evaluated different underlying language models (potentially including OpenAI’s GPT models, Anthropic’s Claude, or open-source alternatives) and may retain the flexibility to switch between them; the source does not say which model Lumi uses. This provider-agnostic approach is a mature LLMOps practice that reduces vendor lock-in and allows for optimization based on cost, performance, and capability tradeoffs.
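A provider-agnostic design can be illustrated without any framework at all. The following is a minimal sketch in plain Python, not LangChain’s API and not Lumi’s implementation: the `ChatModel` interface, the `EchoModel` stand-in, and the provider keys are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """Minimal interface the application codes against (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in provider for local testing; a real adapter would call a vendor SDK."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Registry keyed by a configuration value, so switching providers is a config change.
PROVIDERS: dict[str, ChatModel] = {
    "provider_a": EchoModel("provider_a"),
    "provider_b": EchoModel("provider_b"),
}


def get_model(provider: str) -> ChatModel:
    """Look up the adapter for a configured provider name."""
    try:
        return PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None


def answer(provider: str, question: str) -> str:
    """Application code depends only on the ChatModel interface."""
    return get_model(provider).complete(question)
```

Because `answer` never touches a vendor-specific type, the same calling code can be benchmarked against several providers for cost and quality before one is chosen.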
Describing Lumi as “agentic” rather than simply a chatbot indicates the system can take actions beyond just answering questions. The mention of catalog editing specifically suggests integration with Tiendanube/Nuvemshop’s product information management systems. From an LLMOps perspective, this requires careful design of several components:
First, the system needs reliable function calling or tool use capabilities to translate natural language merchant requests into specific API calls against the e-commerce platform’s backend. This likely involves defining schemas for various operations (adding products, updating prices, modifying descriptions, managing inventory) and ensuring the LLM can accurately map user intent to the correct function with appropriate parameters.
Second, embedding the copilot “inside the merchant admin” suggests significant frontend integration work. The system needs to maintain context about what the merchant is currently viewing or working on, potentially using that contextual information to disambiguate requests. For example, if a merchant is viewing a specific product and asks Lumi to “update the price to $29.99,” the system needs to understand which product is being referenced.
Third, serving 180,000+ merchants implies significant scalability requirements. The LLMOps infrastructure must handle concurrent requests from many users, manage per-user conversation state and context, and ensure reasonable response times even under load. This likely requires careful optimization of API calls to the underlying LLM provider, potentially including batching, caching of common responses, and intelligent management of token usage to control costs.
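The first two components above, mapping a request onto a typed operation and using on-screen context to disambiguate it, can be sketched together. This is an illustrative example under stated assumptions: the `update_price` tool name, the shape of the model’s tool call (a dict with `name` and `arguments`), and the `AdminContext` fields are all hypothetical, since the source does not describe Lumi’s actual schemas.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass(frozen=True)
class AdminContext:
    """What the merchant currently has open in the admin (hypothetical)."""

    store_id: str
    current_product_id: Optional[str] = None


@dataclass(frozen=True)
class UpdatePrice:
    """Validated, fully resolved operation ready to send to a backend API."""

    store_id: str
    product_id: str
    price: Decimal


def parse_update_price(call: dict, ctx: AdminContext) -> UpdatePrice:
    """Turn a model-proposed tool call into a validated operation."""
    if call.get("name") != "update_price":
        raise ValueError(f"unsupported tool: {call.get('name')!r}")
    args = call.get("arguments", {})

    # Fall back to the product on screen when the model did not name one.
    product_id = args.get("product_id") or ctx.current_product_id
    if not product_id:
        raise ValueError("no product specified and none open in the admin")

    try:
        price = Decimal(str(args["price"]))
    except (KeyError, InvalidOperation):
        raise ValueError("price is missing or not a number") from None
    # Reject NaN, infinities, zero, and negative values.
    if not price.is_finite() or price <= 0:
        raise ValueError("price must be a positive, finite number")

    return UpdatePrice(ctx.store_id, product_id, price)
```

Validating the model’s output before any backend call means a malformed or ambiguous tool call surfaces as a clear error (and possibly a clarifying question) rather than as a bad write to the catalog.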
One critical aspect of deploying agentic systems in production, especially those with write access to business-critical data like product catalogs, is ensuring appropriate safety guardrails. While the source material doesn’t detail specific measures, a responsible implementation would need to address several concerns:
Validation of requested changes before execution is crucial. If a merchant asks to “discount all products by 90%,” the system should likely confirm such a dramatic action before proceeding. This requires implementing confirmation workflows for high-risk operations, possibly with different thresholds based on the potential business impact.
The system also needs robust error handling for cases where the LLM misinterprets requests or generates invalid API calls. In production environments, even small error rates can affect many users, so the system likely includes fallback behaviors, clear error messages, and potentially the ability to undo recent actions.
From a data privacy and security perspective, an agentic system embedded in the admin panel has access to sensitive merchant data. The LLMOps infrastructure must ensure that data sent to external LLM providers is appropriately handled, potentially requiring techniques like data anonymization, on-premises deployment of models, or careful provider selection based on data processing agreements.
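The confirmation workflow for high-risk operations described above can be sketched as a small policy function. This is a hypothetical illustration, not Lumi’s actual guardrail: the `BulkDiscount` operation and both thresholds are assumed values picked only to make the example concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"   # safe to run directly
    CONFIRM = "confirm"   # ask the merchant to approve first


@dataclass(frozen=True)
class BulkDiscount:
    """A proposed discount across some products (hypothetical operation)."""

    percent: float
    product_count: int


# Illustrative thresholds; real values would depend on business impact.
MAX_AUTO_PERCENT = 50.0
MAX_AUTO_PRODUCTS = 20


def review(op: BulkDiscount) -> Decision:
    """Decide whether an operation may run directly or needs merchant confirmation."""
    if not 0 < op.percent <= 100:
        raise ValueError("discount percent must be in (0, 100]")
    if op.product_count < 1:
        raise ValueError("operation must target at least one product")
    if op.percent > MAX_AUTO_PERCENT or op.product_count > MAX_AUTO_PRODUCTS:
        return Decision.CONFIRM
    return Decision.EXECUTE
```

Under these assumed thresholds, a request to discount 500 products by 90% returns `Decision.CONFIRM`, while a 10% discount on three products runs directly. Keeping the policy outside the prompt makes it deterministic and testable regardless of what the model proposes.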
Deploying an agentic copilot to 180,000+ merchants represents a significant rollout that would require comprehensive testing and evaluation strategies. While the source doesn’t detail these processes, a mature LLMOps approach would include:
Testing across diverse merchant use cases and languages (particularly relevant for a Latin American platform where merchants may interact in Spanish, Portuguese, or other languages). The system needs to handle domain-specific e-commerce terminology accurately across multiple languages.
Monitoring of agent success rates, including tracking whether requested actions are completed successfully, how often the system needs to ask for clarification, and when users abandon interactions. These metrics are essential for identifying areas where the agent struggles and needs improvement.
Continuous evaluation against a test suite of representative merchant requests, ensuring that updates to prompts, the underlying model, or system integrations don’t degrade performance on core use cases.
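A regression suite of this kind can be quite small to start with. The sketch below is an assumption-laden illustration: the `Case` structure, the tool names, and the `keyword_agent` stand-in (which replaces a real LLM-backed agent so the example runs offline) are all hypothetical, and the sample requests are invented rather than taken from real merchant traffic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    request: str        # merchant utterance, in any supported language
    expected_tool: str  # tool the agent should select


def evaluate(agent: Callable[[str], str], cases: list[Case]) -> float:
    """Return the fraction of cases where the agent picked the expected tool."""
    if not cases:
        raise ValueError("empty test suite")
    passed = sum(agent(c.request) == c.expected_tool for c in cases)
    return passed / len(cases)


def keyword_agent(request: str) -> str:
    """Toy stand-in for the real agent: routes on keywords only."""
    text = request.lower()
    if "precio" in text or "preço" in text or "price" in text:
        return "update_price"
    return "ask_clarification"


# Invented sample requests in Spanish, Portuguese, and English.
SUITE = [
    Case("Cambiá el precio a 2500", "update_price"),
    Case("Altere o preço para 30", "update_price"),
    Case("Set the price to 29.99", "update_price"),
    Case("Agregá una foto al producto", "add_image"),
]

score = evaluate(keyword_agent, SUITE)  # 3 of 4 cases pass -> 0.75
```

Run in CI, a check such as `score >= baseline` would block a prompt or model change that degrades core use cases; the failing fourth case shows how the suite also exposes gaps in coverage.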
The statement that they “recently rolled out” Lumi suggests this is a relatively new production deployment. Best practices in LLMOps for such systems typically involve phased rollouts, potentially starting with a subset of merchants, gathering feedback, and iterating before full deployment. This approach allows teams to identify edge cases, refine the agent’s behavior based on real-world usage patterns, and scale infrastructure incrementally.
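One common way to implement a phased rollout is deterministic hash bucketing, sketched below. The source does not say whether Tiendanube/Nuvemshop rolled Lumi out this way; the function name, the salt, and the use of merchant IDs as the bucketing key are assumptions for illustration.

```python
import hashlib


def in_rollout(merchant_id: str, percent: int, salt: str = "copilot-rollout") -> bool:
    """Deterministically place a merchant in or out of the first `percent` of 100 buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash the salted ID so bucket assignment is stable across requests and servers.
    digest = hashlib.sha256(f"{salt}:{merchant_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

Because each merchant always lands in the same bucket, raising `percent` from 5 to 25 only adds merchants and never removes anyone who already had access, which keeps the experience stable while infrastructure scales up.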
The integration point—embedding the copilot directly in the admin interface rather than offering it as a separate tool—represents a strategic UX decision that likely impacts adoption. By making the assistant contextually available where merchants already work, the barrier to usage is lower than requiring them to navigate to a separate interface.
While the announcement of Lumi represents an interesting application of agentic LLM systems in e-commerce, the available information is quite limited and comes from what appears to be a promotional context (a social media post about the product). Several important questions remain unanswered:
What specific business outcomes have been achieved? Metrics like time saved per merchant, increase in catalog completeness, or user satisfaction scores would provide concrete evidence of value. Without these, the case study is primarily descriptive rather than evaluative.
What was the specific scope of Lumi’s capabilities at launch? “Edits the catalog” could range from simple description improvements to complex multi-product operations. Understanding the breadth and depth of available actions is crucial for assessing the technical complexity of the implementation.
How do they handle the inherent unpredictability of LLM outputs in a business-critical context? What percentage of interactions require human intervention or correction? These reliability metrics are central to evaluating whether the system truly provides production-grade assistance.
What was the development investment required? Building, testing, and deploying such a system involves significant engineering resources, and understanding the cost-benefit tradeoff would be valuable for others considering similar implementations.
The choice of LangChain, while providing useful abstractions, also introduces dependencies and potential limitations. How does the team manage version updates, handle deprecations in the framework, and optimize performance when working through these abstraction layers?
Despite the limited technical detail available, this case represents an interesting data point in the evolution of LLMOps in e-commerce. The decision to embed AI assistance directly into merchant workflows, rather than treating it as an add-on feature, suggests thoughtful product design. The scale of deployment (180,000+ merchants) indicates this is not a small pilot but a production system that likely required serious infrastructure investment.
For the e-commerce industry, agentic assistants that can help merchants manage increasingly complex catalogs, pricing strategies, and inventory across multiple channels represent a compelling use case for LLM technology. The administrative burden on small and medium-sized merchants is substantial, and tools that can streamline routine tasks through natural language interaction have clear potential value.
From an LLMOps maturity perspective, successfully deploying an agentic system at this scale requires orchestrating multiple complex components: prompt engineering for reliable task completion, robust API integration with existing systems, conversation state management, safety and validation guardrails, monitoring and observability, and cost-effective infrastructure for serving many concurrent users. While we don’t have visibility into Tiendanube/Nuvemshop’s specific implementations of these components, the fact that they’ve reached production deployment suggests at least functional solutions to these challenges.
The case also highlights the growing trend of domain-specific AI assistants embedded in vertical software. Rather than general-purpose chatbots, we’re seeing more applications of LLMs fine-tuned or prompted for specific business contexts with deep integration into existing workflows and data systems. This represents a maturing understanding of how to create practical value with LLM technology beyond basic question-answering.
In conclusion, while the available information about Lumi is limited and primarily promotional in nature, the case study provides a glimpse into how a large regional e-commerce platform is applying modern LLMOps practices to deploy an agentic system at scale. The use of an established framework like LangChain, integration directly into existing admin workflows, and deployment to a substantial merchant base all suggest a thoughtful approach to bringing LLM capabilities to production. However, without detailed technical specifications, performance metrics, or candid discussion of challenges encountered, this remains more of an announcement of capability than a deep technical case study. Future detailed write-ups from the Tiendanube/Nuvemshop team covering their architectural decisions, evaluation strategies, and lessons learned would provide valuable insights for the broader LLMOps community.