Company
Mercado Libre
Title
AI-Powered Accessibility Automation for E-commerce Platform
Industry
E-commerce
Year
2025
Summary (short)
Mercado Libre's accessibility team implemented multiple AI-driven initiatives to scale their support for hundreds of designers and developers working on accessibility improvements across the platform. The team deployed four main solutions: an A11Y assistant that provides real-time support in Slack channels using RAG-based LLMs consulting internal documentation; automated enrichment of accessibility audit tickets with contextual explanations and remediation guidance; a Figma handoff assistant that analyzes UI designs and recommends accessibility annotations; and an automated ticket review system integrating Jira and GitHub to assess fix quality. These initiatives aim to multiply the effectiveness of accessibility experts by automating routine tasks, providing immediate answers, and enabling teams to become more autonomous in addressing accessibility issues, while the core team focuses on strategic challenges.
## Overview

Mercado Libre, one of Latin America's largest e-commerce platforms, has implemented a comprehensive set of AI-driven initiatives within their accessibility team to address a fundamental scaling challenge. The team is responsible for supporting hundreds of designers and developers with accessibility questions, code reviews, and continuous improvements across the platform. Rather than positioning AI as a replacement for accessibility expertise, the case study frames it as a force multiplier that enables the small accessibility team to scale their impact while maintaining quality and fostering organizational learning around inclusive design practices.

The initiatives described were published in October 2025 and represent an ongoing exploration of how LLMs can be integrated into accessibility workflows. The team explicitly acknowledges they are in an experimental phase, testing and discovering responsible uses of AI that benefit the entire organization. This case study is notable for its focus on internal tooling and developer experience rather than customer-facing AI features.

## Technical Architecture and Infrastructure

Mercado Libre leverages their internal development ecosystem, Fury, which provides the automation tools and infrastructure that underpin several of the AI initiatives. While the case study doesn't provide extensive architectural details, it's clear that the team has built custom integrations connecting LLMs with various internal systems, including Slack, Jira, GitHub, internal documentation repositories, design systems, and training materials.

The approach centers on creating specialized AI assistants and automated workflows tailored to specific accessibility tasks. Each initiative appears to be designed as a discrete system addressing a particular pain point in the accessibility workflow, though they likely share common infrastructure components through the Fury platform.

## Initiative 1: The A11Y Assistant for Everyday Support

The first major initiative is an AI assistant that operates within the team's support channel, providing real-time accessibility guidance to developers and designers. The assistant activates when mentioned in the channel and can process both text messages and screen images, making it flexible enough to handle various types of accessibility questions.

The technical implementation uses Retrieval-Augmented Generation (RAG), a critical architectural choice that addresses one of the primary concerns with production LLM systems: hallucinations and unreliable outputs. The RAG system queries multiple internal knowledge sources before generating responses, including internal documentation, training materials, historical accessibility queries, previously reported accessibility tickets, and the company's design system. By grounding responses in verified internal resources, the system significantly reduces the risk of providing incorrect or inappropriate guidance.

The case study explicitly discusses the RAG pipeline, noting that the flow from query initiation through context gathering to final response delivery is designed to keep the assistant "focused on trusted resources." This ensures responses are "not only accurate but also directly applicable and aligned with Mercado Libre's internal accessibility standards and tools." This represents a thoughtful LLMOps approach that prioritizes reliability and organizational alignment over pure capability.

From an operational perspective, this assistant serves as a first line of support, handling routine questions and freeing up the core accessibility team to focus on more complex issues. However, the team explicitly maintains their availability for situations requiring human expertise, suggesting a well-considered human-in-the-loop approach rather than full automation.
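The case study doesn't disclose the underlying models or retrieval stack, but the described flow (query, context gathering from trusted sources, grounded response) maps onto a conventional RAG loop. The sketch below is a minimal illustration of that pattern under stated assumptions, not Mercado Libre's implementation: `retrieve_context` and `llm_complete` are hypothetical placeholders standing in for the internal search index and whichever LLM provider is actually used.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """A retrieved piece of internal knowledge (docs, past tickets, design system)."""
    source: str  # e.g. "design-system", "training-material", "ticket-history"
    text: str


def retrieve_context(question: str, k: int = 5) -> list[Snippet]:
    """Placeholder retrieval step: in a real system this would query a vector
    store or search index over internal documentation and ticket history."""
    return [Snippet("design-system", "Interactive elements must have an accessible name ...")]


def build_grounded_prompt(question: str, snippets: list[Snippet]) -> str:
    """Keep the assistant focused on trusted resources by injecting only
    retrieved internal content into the prompt."""
    context = "\n\n".join(f"[{s.source}] {s.text}" for s in snippets)
    return (
        "You are an internal accessibility assistant. Answer ONLY from the context "
        "below; if the answer is not covered, say so and refer the user to the "
        "accessibility team.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer_accessibility_question(question: str) -> str:
    """End-to-end flow: retrieve, ground, generate."""
    snippets = retrieve_context(question)
    prompt = build_grounded_prompt(question, snippets)
    return llm_complete(prompt)


def llm_complete(prompt: str) -> str:
    """Placeholder for the model call; the provider is not named in the case study."""
    raise NotImplementedError("Wire this to your LLM provider's SDK")
```

A Slack integration would sit in front of `answer_accessibility_question`, triggering on mentions and optionally attaching screenshots as additional model input.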
## Initiative 2: Automated Enrichment of Accessibility Tickets

The second initiative addresses a key insight from the team's experience: technical accessibility reports, while comprehensive, can be overwhelming for developers who aren't deeply familiar with accessibility concepts. Manual accessibility audits typically include detailed information such as the assistive technologies used, reproduction steps, and affected WCAG criteria, but this technical detail doesn't always translate into clear action items for developers.

To bridge this gap, the team implemented an automated system that augments every generated accessibility ticket with AI-generated contextual notes covering three aspects. First, the system provides a technical impact explanation framed in user experience terms, moving beyond simply stating "this is wrong" to explaining why it matters from an end-user perspective. Second, it offers concrete suggestions and recommendations, including code examples or references drawn from the company's knowledge base and design patterns. Third, it provides verification guidance, explaining exactly how developers can test their fixes using specific tools and steps.

The case study provides a concrete example: for a ticket about a missing "Skip to results" link in search functionality, the AI explained the impact on keyboard users, suggested correct code based on internal guidelines, and listed verification steps. This level of contextual enrichment significantly reduces the learning curve for developers while scaling the accessibility team's ability to provide guidance without direct involvement in every ticket.

The initiative extends beyond manual audits to automated accessibility testing. The team uses Axe, a popular automated accessibility testing tool that detects errors through predefined rules. The AI system enriches these automatically detected issues with additional context, helping developers better understand problems directly in the HTML where they appear. This represents an interesting layering of automation tools, where traditional rule-based testing is enhanced with LLM-generated explanations tailored to the specific context.
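No implementation details are shared, but the three-part enrichment note (impact, recommendation, verification) lends itself to a straightforward prompt-plus-retrieval step applied to each new finding, whether it comes from a manual audit or an Axe scan. The sketch below illustrates one possible shape of that hook; the ticket fields, prompt wording, and helper functions are assumptions for illustration only.

```python
ENRICHMENT_PROMPT = """You are enriching an accessibility audit ticket for a developer.

Finding: {summary}
Affected WCAG criteria: {wcag}
Relevant internal guidance: {guidance}

Write three short sections:
1. Impact - explain the problem in terms of the end user's experience.
2. Recommendation - a concrete fix, with a code example where appropriate.
3. Verification - the exact tools and steps to confirm the fix works.
"""


def enrich_ticket(ticket: dict) -> str:
    """Generate the contextual note appended to an audit or Axe-generated ticket.

    `ticket` is assumed to carry the fields an audit would produce
    (summary, WCAG criteria, affected HTML, etc.); field names are illustrative.
    """
    guidance = retrieve_internal_guidance(ticket["summary"])
    prompt = ENRICHMENT_PROMPT.format(
        summary=ticket["summary"],
        wcag=", ".join(ticket.get("wcag_criteria", [])),
        guidance=guidance,
    )
    return llm_complete(prompt)


def retrieve_internal_guidance(summary: str) -> str:
    """Placeholder: look up matching design-system patterns and internal guidelines."""
    return "Provide a 'Skip to results' link as the first focusable element on the page."


def llm_complete(prompt: str) -> str:
    """Placeholder for the model call; the provider is not named in the case study."""
    raise NotImplementedError


# Example input mirroring the 'Skip to results' ticket described in the case study.
ticket = {
    "summary": "Search results page is missing a 'Skip to results' link",
    "wcag_criteria": ["2.4.1 Bypass Blocks"],
}
# note = enrich_ticket(ticket)  # the note would then be posted back to the ticket
```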
## Initiative 3: Accessibility Notes Assistant for Design Handoffs

Recognizing that accessibility must be integrated early in the design phase, the team created an assistant specifically for UX teams working in Figma. The tool analyzes screen images from either desktop web or native mobile platforms and generates recommendations for accessibility annotations during the design handoff process.

The technical approach involves the assistant analyzing visual context and platform type to create a "descriptive visual map" of screen elements. It then recommends annotations based on the company's established accessibility notes for Figma, focusing on semantic specifications such as headers, links, buttons, groups, and dynamic areas rather than technical implementation details. The prompt engineering for this assistant emphasizes simplicity and clarity, helping designers understand how to annotate components without requiring deep technical knowledge.

Currently, the system generates text-based responses rather than image annotations, a deliberate choice based on accuracy considerations: the team notes that text responses are "far more accurate than image-based annotations," suggesting they experimented with both approaches and made a pragmatic decision based on performance.

This initiative represents a "shift left" approach to accessibility, catching potential issues during design rather than after implementation. By empowering UX teams to incorporate accessibility specifications independently, the tool reduces dependencies on the core accessibility team while improving the completeness of design deliveries.
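The case study doesn't say how the Figma assistant is wired up, but the described behavior (a screen image plus platform type in, plain-text annotation recommendations out) suggests a single multimodal prompt. The following sketch is a hypothetical rendering of that shape; the prompt wording and the `multimodal_llm_complete` placeholder are illustrative, not the team's actual code.

```python
import base64
from pathlib import Path

ANNOTATION_PROMPT = """You are helping a designer annotate a {platform} screen for
accessibility handoff. First build a descriptive visual map of the screen elements,
then recommend annotations using our internal Figma note types only: headers, links,
buttons, groups, and dynamic areas. Explain each recommendation in simple, clear,
non-technical terms. Respond as plain text, not as an annotated image.
"""


def recommend_annotations(screenshot_path: str, platform: str) -> str:
    """Return text-based annotation recommendations for a design handoff.

    `platform` is 'desktop web' or 'native mobile', mirroring the two cases the
    team describes. The multimodal model call itself is a placeholder.
    """
    image_b64 = base64.b64encode(Path(screenshot_path).read_bytes()).decode()
    prompt = ANNOTATION_PROMPT.format(platform=platform)
    return multimodal_llm_complete(prompt, image_b64)


def multimodal_llm_complete(prompt: str, image_b64: str) -> str:
    """Placeholder for a vision-capable model call (provider not specified)."""
    raise NotImplementedError


# Usage: recommend_annotations("checkout_screen.png", "native mobile")
```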
## Initiative 4: Automated Review of Resolved Accessibility Tickets

The fourth major initiative tackles the operational challenge of validating accessibility fixes at scale. The accessibility team conducts weekly reviews of resolved tickets to verify that fixes are implemented correctly and to learn from applied solutions. However, manually reviewing every resolved ticket is time-intensive and doesn't scale well.

To address this, the team developed an automation workflow that integrates Jira, GitHub, and an AI agent. The system pulls all accessibility tickets resolved each day, breaks down each ticket into key information, and instructs the AI agent to perform a comprehensive analysis. The agent examines ticket comments and solution evidence, identifies and reviews linked GitHub pull requests including their technical content, and assesses the clarity, relevance, and documentation quality of the solution.

Based on this analysis, the system classifies each ticket using a traffic-light system (green, yellow, or red emojis) indicating fix quality, and stores the results in a shared spreadsheet for team review. This classification allows the human accessibility team to prioritize which tickets need manual review, focusing their expertise where it's most needed.

The workflow represents a sophisticated LLMOps implementation, requiring the AI agent to understand both the accessibility domain context from Jira tickets and the technical implementation details from GitHub code reviews. The system must correlate information across multiple systems and make quality assessments that inform human decision-making. The case study notes this enables the team to "automatically flag tickets lacking sufficient technical evidence" and "objectively classify fix reliability," though the term "objectively" should be interpreted cautiously: the system's classifications are ultimately based on patterns learned by the LLM and the logic encoded in the prompts.

The automated review system also enables the team to request targeted information from development teams when evidence is insufficient, and it centralizes analysis in a dashboard format that supports team coordination and knowledge sharing.
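A daily review loop of this kind can be approximated with a scheduled job that joins Jira and GitHub data and asks the model for a constrained verdict. The sketch below is a hypothetical rendering of that workflow under stated assumptions; the prompt, field names, and integration helpers (`fetch_linked_pr_diff`, `append_to_shared_sheet`) are placeholders rather than Mercado Libre's actual tooling.

```python
REVIEW_PROMPT = """You are reviewing a resolved accessibility ticket.

Ticket comments and solution evidence:
{evidence}

Linked pull request diff:
{diff}

Assess whether the fix is correct, clearly documented, and supported by evidence.
Reply with exactly one word - GREEN, YELLOW, or RED - followed by a one-sentence
justification.
"""

EMOJI = {"GREEN": "🟢", "YELLOW": "🟡", "RED": "🔴"}


def review_resolved_tickets(resolved_today: list[dict]) -> list[dict]:
    """Classify each resolved ticket so humans can prioritize manual review."""
    rows = []
    for ticket in resolved_today:
        diff = fetch_linked_pr_diff(ticket)  # GitHub lookup (placeholder)
        verdict = llm_complete(REVIEW_PROMPT.format(
            evidence=ticket.get("comments", ""),
            diff=diff,
        ))
        label = verdict.split()[0].upper() if verdict else "YELLOW"
        rows.append({
            "ticket": ticket["key"],
            "status": EMOJI.get(label, "🟡"),
            "notes": verdict,
        })
    append_to_shared_sheet(rows)  # spreadsheet export (placeholder)
    return rows


def fetch_linked_pr_diff(ticket: dict) -> str:
    """Placeholder: resolve the pull request linked from the ticket and return its diff."""
    return ""


def append_to_shared_sheet(rows: list[dict]) -> None:
    """Placeholder: write classification results where the team reviews them."""


def llm_complete(prompt: str) -> str:
    """Placeholder for the model call; the provider is not named in the case study."""
    raise NotImplementedError
```

Constraining the verdict to a single token plus a short justification keeps the output easy to parse and audit, and the stored rows give humans a natural starting point for spot-checking the model's classifications.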
## LLMOps Considerations and Operational Maturity

Throughout the case study, several LLMOps best practices and considerations emerge, though some are implicit rather than explicitly discussed.

**RAG as a Hallucination Mitigation Strategy**: The explicit use of RAG in the A11Y assistant demonstrates awareness of LLM limitations in production environments. By grounding responses in verified internal documentation and historical data, the team reduces the risk of the system providing incorrect guidance that could lead to accessibility compliance issues or poor user experiences. This is particularly important in this domain: accessibility errors can have significant legal and ethical implications.

**Context-Specific Prompt Engineering**: Each assistant appears to use carefully crafted prompts tailored to specific tasks. The design handoff assistant's prompt focuses on "explaining, in simple and clear terms" how to annotate components, suggesting deliberate prompt engineering to match the audience's expertise level. The automated review system's prompts must balance technical code analysis with accessibility domain knowledge.

**Integration Architecture**: The initiatives demonstrate mature integration patterns, connecting LLMs with existing organizational tools (Slack, Jira, GitHub, Figma) rather than requiring users to adopt new platforms. This integration-first approach reduces adoption friction and embeds AI capabilities directly into existing workflows.

**Human-in-the-Loop Design**: Despite the automation, the case study consistently emphasizes that the accessibility team remains available and involved. The A11Y assistant serves as first-line support but doesn't replace human experts. The ticket review automation filters and prioritizes for human review rather than making final decisions autonomously. This represents a mature understanding of where AI adds value and where human judgment remains essential.

**Quality and Accuracy Trade-offs**: The decision to use text-based responses rather than image annotations in the design assistant shows pragmatic evaluation of model capabilities and a willingness to constrain functionality based on accuracy requirements. This suggests the team is actively testing and evaluating outputs rather than assuming capability.

**Evaluation and Monitoring**: While not extensively detailed, the case study mentions that responses are "far more accurate" in certain formats, implying the team has mechanisms for assessing accuracy. The traffic-light classification system for ticket reviews provides a structured output that could be validated against human assessments, though the case study doesn't describe formal evaluation metrics or monitoring dashboards.

## Organizational and Cultural Aspects

Beyond the technical implementation, the case study reveals important organizational aspects of deploying LLMs in production. The team frames their work as "exploring" and "experimenting" with AI, acknowledging they're in a discovery phase rather than claiming to have solved all problems. This humility is appropriate given the relative newness of production LLM applications.

The explicit goal of fostering "collective learning" and enabling teams to become "more autonomous in solving accessibility issues" suggests the AI initiatives are designed not just for efficiency but for capability building across the organization. Rather than centralizing accessibility knowledge solely within the specialist team, the tools democratize access to that knowledge while maintaining quality through RAG-based grounding in verified resources.

The team acknowledges specific contributors leading AI and accessibility explorations at Mercado Libre, indicating this is a collaborative effort rather than a single-person initiative. The call for others in tech to "experiment with AI and accessibility in your processes" and "share your learnings" reflects a community-oriented mindset about advancing the field.

## Limitations and Considerations

While the case study presents these initiatives positively, several considerations and limitations warrant mention from a balanced LLMOps perspective.

**Limited Technical Detail**: The case study is relatively light on technical specifics such as which LLM models are used, how RAG retrieval is implemented, what vector databases or search technologies power the knowledge retrieval, how prompts are versioned and managed, or what evaluation metrics are tracked. This makes it difficult to assess the full maturity of the LLMOps practices or to replicate the approach.

**Evaluation Methodology**: While accuracy is mentioned, the case study doesn't describe formal evaluation processes, benchmarks, or how the team measures whether AI-generated guidance actually improves accessibility outcomes or developer productivity. The traffic-light system for ticket classification provides structure but doesn't indicate how classification accuracy is validated.

**Scalability and Cost**: No information is provided about the operational costs of running these AI systems, latency considerations, or how the systems scale with increasing usage. For a platform supporting hundreds of developers, query volume could be substantial, and understanding cost-performance trade-offs would be valuable.

**Potential Risks**: The case study doesn't discuss potential failure modes, such as what happens when the RAG system retrieves irrelevant context, how the team handles cases where AI provides subtly incorrect guidance that passes initial review, or how they prevent the system from perpetuating biases or outdated practices that might exist in historical data.

**Change Management**: While the tools are described, there's limited discussion of adoption challenges, user training requirements, or resistance from teams who might be skeptical of AI-generated accessibility guidance. The human factors of deploying these systems aren't extensively covered.

**Dependency Risks**: Building multiple systems on top of external LLM providers (the provider isn't specified) creates risks around API changes, pricing changes, and service availability that the case study doesn't address.
## Strategic Direction and Future Work

The case study concludes by noting the team's goal to shift accessibility "left" in the development lifecycle, preventing issues before they arise and integrating accessibility more seamlessly into design and development workflows. The mention of "emerging paradigms like Vibe Coding" (though not explained) suggests the team is exploring cutting-edge development approaches.

The emphasis on the Fury platform "playing a key role in scaling, securing, and centralizing these AI agents" indicates infrastructure investment in supporting multiple AI initiatives across the organization, not just in accessibility. This suggests Mercado Libre may be developing broader LLMOps capabilities that the accessibility team is leveraging.

## Conclusion and Assessment

This case study represents a thoughtful exploration of LLMs in production for a specific organizational challenge: scaling accessibility expertise across a large e-commerce platform. The initiatives demonstrate several LLMOps strengths, including RAG-based grounding to reduce hallucinations, integration with existing tools to minimize friction, human-in-the-loop design to maintain quality, and task-specific prompt engineering to match user needs.

However, the case study should be read as a progress report on ongoing experimentation rather than a proven, fully mature LLMOps implementation. The limited technical detail, absence of quantitative results or evaluation metrics, and lack of discussion around challenges or failures suggest this is promotional content meant to showcase innovation rather than a rigorous technical analysis.

From a balanced perspective, the initiatives appear promising and demonstrate good instincts around where LLMs can add value (automating routine guidance, enriching technical information with context, supporting early-stage design decisions, filtering review queues). The emphasis on augmenting rather than replacing human expertise is appropriate for a domain like accessibility, where errors can have significant real-world consequences. The use of RAG to ground responses in verified internal resources addresses a critical concern for production LLM systems.

For organizations considering similar initiatives, this case study offers useful patterns around task-specific AI assistants, ticket enrichment workflows, and integration approaches, but it should be supplemented with more rigorous technical research on RAG implementation, evaluation methodologies, and operational considerations for production LLM systems. The accessibility domain context is particularly interesting because it demonstrates LLM applications beyond the common use cases of customer service or content generation, showing how AI can support internal operational excellence and capability building.
