Faire: AI-Powered Developer Productivity and Product Discovery at Wholesale Marketplace

Company

Faire

Title

AI-Powered Developer Productivity and Product Discovery at Wholesale Marketplace

Industry

E-commerce

Link

https://craft.faire.com/transforming-wholesale-with-ai-the-sequel-now-with-more-agents-9542f257dd45

Year

2025

Summary (short)

Faire, a wholesale marketplace connecting brands and retailers, implemented multiple AI initiatives across their engineering organization to enhance both internal developer productivity and external customer-facing features. The company deployed agentic development workflows using GitHub Copilot and custom orchestration systems to automate repetitive coding tasks, introduced natural-language and image-based search capabilities for retailers seeking products, and built a hybrid Python-Kotlin architecture to support multi-step AI agents that compose purchasing recommendations. These efforts aimed to reduce manual workflows, accelerate product discovery, and deliver more personalized experiences for their wholesale marketplace customers.

## Overview Faire operates a wholesale marketplace platform that connects independent retailers with brands and manufacturers. Their mission centers on helping retailers discover unique products that fit their store's identity while enabling brands to reach appropriate retail partners. In September 2025, Faire presented a comprehensive overview of their AI initiatives during Waterloo Tech Week, showcasing how they've integrated LLMs into both their internal engineering workflows and customer-facing product features. The company emphasizes using AI as "one tool in the toolkit" rather than AI for its own sake, focusing on solving tractable customer problems. The presentation covered two main thrust areas: internal developer productivity enhancements through agentic coding assistants, and external product improvements centered on search and discovery for their retail customers. What's notable about Faire's approach is their willingness to experiment with multiple architectural patterns simultaneously—from programmatic invocation of existing tools like GitHub Copilot to building custom agentic orchestration systems from scratch. ## Internal Developer Productivity: Agentic Development Systems Faire has invested significantly in AI-augmented engineering workflows, with the core philosophy that AI augments rather than replaces engineers. Their approach involves delegating well-scoped, repetitive tasks to coding agents while keeping engineers focused on complex problem-solving and architectural decisions. ### GitHub Copilot Integration and Programmatic Invocation Luke Bjerring demonstrated how Faire programmatically invokes GitHub Copilot to handle background coding tasks. Rather than simply using Copilot as an inline code completion tool, they've built workflows where agents receive specific instructions and context, then autonomously create pull requests, write tests, and perform refactoring operations. The system integrates with MCP (Model Context Protocol) servers to connect agents with internal Faire systems, providing them with company-specific context and capabilities. One concrete example showcased was an automated "settings cleanup" workflow. A Slack bot triggers agents to identify expired feature flags in the codebase, verify that these flags are safe to remove, and automatically open GitHub issues documenting the cleanup work needed. Notifications flow back through Slack to keep engineers informed. This type of maintenance work—essential but tedious—represents an ideal use case for agentic automation, as it follows predictable patterns and has clear success criteria. The approach reflects a pragmatic LLMOps philosophy: start with existing, well-supported tools (GitHub Copilot) but extend them programmatically to fit into company-specific workflows. This reduces the burden of building and maintaining entirely custom LLM infrastructure while still achieving significant automation benefits. ### Project Cyberpunk: Custom Agentic Development Stack Beyond adapting existing tools, Faire built "Project Cyberpunk," an in-house agentic development platform designed specifically for their engineering environment. George Jacob presented the architecture, which consists of several key components working together. The system features pre-warmed development workspaces that reduce cold-start latency when agents begin work. A lightweight orchestrator sits at the center, breaking down complex engineering tasks into smaller, more reliable prompts that can be distributed to specialized sub-agents. This decomposition strategy addresses a common challenge in agentic systems: LLMs perform better on focused, well-defined subtasks than on large, open-ended objectives. Faire maintains a library of focused sub-agents, each designed for specific types of work. Examples include agents specialized in settings cleanup (removing obsolete configuration), test authoring (generating unit and integration tests), and various other code-quality tasks. This modular approach allows the system to evolve incrementally—new agent capabilities can be added without restructuring the entire platform. Early adoption metrics showed steady usage across the engineering organization, with meaningful offload of repetitive work from human engineers to automated agents. The presentation noted that engineers are finding value in delegating grunt work while maintaining control over architectural decisions and complex logic. From an LLMOps perspective, Project Cyberpunk represents a more ambitious investment than the GitHub Copilot integration. It requires ongoing maintenance of orchestration logic, prompt engineering for each sub-agent, workspace management infrastructure, and careful monitoring to ensure agent outputs meet quality standards. However, it also provides greater flexibility to customize agent behavior for Faire's specific codebase, conventions, and workflows. ### Mobile Development and Prompt Engineering Practices Megan Bosiljevac shared practical examples of AI assistance in Android development, illustrating both quick wins and more substantial productivity gains. One case involved using an AI agent to fix color accessibility issues in the UI—a straightforward but time-consuming task that the agent handled effectively with appropriate prompting. A more impressive example involved scaffolding an entire Server-Driven UI screen. The agent generated approximately 800 lines of code in a matter of hours, a task that would typically take a human engineer multiple days. The presentation emphasized that this wasn't fully automated—the engineer provided specifications, reviewed the generated code, and made adjustments—but the time savings were substantial. Bosiljevac stressed that "prompting is a skill" and that effective use of AI coding assistants requires practice and iteration. Engineers need to learn how to break down tasks, provide sufficient context, and evaluate agent outputs critically. The presentation positioned AI-augmented development as a skill multiplier rather than a replacement, with humans remaining accountable for code quality and design decisions. This aligns with mature LLMOps thinking: acknowledging that LLMs require careful human oversight, that prompt engineering is a learned competency, and that successful deployments combine AI capabilities with human judgment rather than attempting full automation. ## Customer-Facing AI: Search and Discovery On the product side, Faire has deployed AI to transform how retailers discover products on their marketplace. The wholesale buying process traditionally involves significant friction—buyers must guess appropriate keywords, navigate complex filter taxonomies, and manually browse through large catalogs to find items that match their store's aesthetic and business model. ### Natural-Language Search Tom Ma presented Faire's natural-language search capability, which allows retailers to express their needs conversationally rather than constructing precise keyword queries. A buyer can input something like "Find me dresses made in Paris, under $100, and not sold on Amazon," and the system parses this phrase into structured search filters behind the scenes. This represents a classic application of LLMs for intent understanding and query structuring. The system presumably uses an LLM to extract entities (product type: dresses), locations (made in Paris), price constraints (under $100), and exclusion criteria (not on Amazon), then maps these to the underlying search service's filter parameters. From the user's perspective, they simply type what they want; from the system's perspective, it's executing a structured database query. The LLMOps considerations here include ensuring the intent parser reliably extracts the correct filters, handling ambiguous or incomplete queries gracefully, and monitoring for cases where the natural-language interpretation diverges from user intent. Additionally, the system needs to handle edge cases where users request filters that don't exist in the platform's data model. The presentation didn't detail the specific LLM implementation (whether using a general-purpose model with prompting or a fine-tuned classifier), but noted that this capability reduces the cognitive load on retailers and improves discovery efficiency. ### Image-Based Search Faire also implemented image-to-text product search, allowing retailers to upload a photo and find visually similar products in the marketplace. The example shown featured uploading an image of Snorlax (a Pokémon character) and retrieving relevant merchandise from the catalog. The architecture diagram reveals a straightforward pipeline: the client sends an image, an AI service converts the image to a text query (likely using a vision-language model to generate descriptive text or tags), this text query gets forwarded to Faire's existing text-based search infrastructure, and results return to the client. This design cleverly leverages existing search technology while adding a new input modality, rather than requiring an entirely separate image search system. From an LLMOps standpoint, this approach has several advantages. By converting images to text queries and then using the existing search stack, Faire avoids maintaining parallel search infrastructures. The vision model serves as an adapter layer, translating visual input into a format the rest of the system already understands. This is particularly valuable on mobile devices, where retailers might snap a photo of a competitor's product or a customer's request and immediately search for wholesale suppliers. The challenge lies in ensuring the image-to-text conversion produces queries that match user intent. A photo of a blue ceramic mug might generate the query "blue mug," but if the retailer actually cares about the specific glaze technique or artisan style, the simplified text query might not capture those nuances. Effective deployment requires iteration on the vision model's prompting or fine-tuning, as well as feedback mechanisms to understand when image search is failing to meet user needs. ## Architectural Evolution: Project Burrito Perhaps the most technically interesting revelation was "Project Burrito," Faire's hybrid architecture that pairs Python-based AI agents with their existing Kotlin macroservice infrastructure. Luan Nico presented this as a solution to a common challenge: how to integrate modern AI agent frameworks (predominantly Python-based) with established production systems built in other languages. The architecture places a Python sidecar alongside Kotlin services, with nginx routing traffic appropriately. The Python component handles agent orchestration and LLM interactions, while the Kotlin service provides access to existing business logic, data access layers, and internal APIs. This hybrid approach allows Faire to leverage the rich Python AI ecosystem (LangChain, LlamaIndex, and similar frameworks) without rewriting their core platform. One application described involves agents composing multi-step buying flows for account managers. Rather than performing a single product search, an agent can understand a retailer's profile (location, store type, past purchases, customer demographics) and assemble a coherent, purchasable assortment of products that fits their needs. The agent then iterates based on feedback or additional constraints. This represents a more sophisticated use of LLMs than simple query understanding—it's orchestrating multiple operations (profile analysis, product selection, compatibility checking, pricing verification) and maintaining state across conversation turns. The agent needs access to various internal services (inventory, pricing, recommendation engines) and must coordinate calls across them while keeping track of the buyer's evolving requirements. From an LLMOps perspective, Project Burrito addresses several critical production concerns: **Language and Framework Flexibility**: By not forcing all AI work into Kotlin, Faire can use the most appropriate tools for agent development while preserving their existing service architecture. This pragmatic approach acknowledges that different languages have different strengths. **Integration with Existing Systems**: The nginx routing layer and well-defined service boundaries allow the Python agents to invoke existing Kotlin services as needed. This avoids duplicating business logic in Python and ensures agents operate on the same data and rules as the rest of the platform. **Gradual Migration Path**: Rather than a big-bang rewrite, the hybrid architecture allows Faire to incrementally add agent capabilities. New agentic flows can be built in Python while existing functionality remains in Kotlin, reducing risk and allowing for experimentation. **Operational Complexity**: Of course, this architecture also introduces challenges. Operating services in multiple languages complicates deployment, monitoring, and debugging. The team needs expertise in both ecosystems, and failure modes can span language boundaries, making troubleshooting more difficult. The presentation didn't detail specific monitoring or observability strategies for the hybrid architecture, but effective LLMOps would require careful instrumentation of the Python agents (tracking LLM calls, latencies, token usage, reasoning traces) and correlation with Kotlin service metrics to understand end-to-end performance and failure patterns. ## Production AI Philosophy and Guardrails During the fireside chat, Co-Founder and Chief Architect Marcelo Cortes and Head of Engineering for Discovery Yvonne Luo discussed Faire's broader philosophy on AI in production. Several themes emerged that reflect mature LLMOps thinking: **AI as an Abstraction Layer**: Cortes positioned AI as similar to previous platform shifts like cloud computing and mobile. It changes how teams think about interfaces and capabilities—systems can now reason and adapt rather than just execute predefined logic. However, it doesn't eliminate the need for thoughtful engineering, robust infrastructure, or careful product design. **Humans-in-the-Loop by Design**: The speakers emphasized building guardrails, observability, and explicit failure modes into AI systems, particularly those affecting customers. While the presentation didn't provide specific examples of these guardrails, the stated philosophy recognizes that LLMs can fail unpredictably and that production systems need mechanisms to detect and mitigate these failures. This principle is especially critical for customer-facing agents. An agent recommending products to a buyer needs constraints on what it can suggest (only in-stock items, only from verified brands, respecting retailer preferences), validation of its outputs (are the recommended products actually coherent together?), and fallback behavior when it can't complete a task (gracefully handing off to a human account manager). **Real Customer Problems as the North Star**: Faire explicitly frames AI as "one tool in the toolkit" rather than an end in itself. The focus remains on solving specific pain points: helping retailers discover differentiated products, providing personalized recommendations, and simplifying wholesale workflows. This problem-first rather than technology-first approach helps ensure AI investments generate actual business value. ## Evaluation and Quality Assurance While the presentation didn't extensively detail testing and evaluation practices for their AI systems, several implicit quality assurance approaches can be inferred from the described implementations: For the agentic development tools, the most direct evaluation metric is engineer adoption and satisfaction. If agents consistently produce code that needs extensive manual correction, engineers will stop using them. The presentation's mention of "steady adoption and meaningful offload" suggests the agents are meeting quality thresholds, though specific metrics (pull request acceptance rates, code quality scores, time savings) weren't provided. For natural-language search, quality can be measured through traditional search metrics: click-through rates, successful session rates, conversion from search to order, and time-to-find. Additionally, the system needs to handle cases where natural language parsing fails or produces ambiguous results. Presumably there's a fallback to traditional keyword search when intent understanding confidence is low, though this wasn't explicitly stated. Image search evaluation likely involves both automated metrics (retrieval precision/recall using labeled test sets) and human review. The challenge is defining "correct" results for visual similarity—different retailers might interpret the same image differently depending on what aspect they're focused on (shape, color, style, use case). For the multi-step agent flows described in Project Burrito, evaluation becomes more complex. Success might be measured by account manager satisfaction, conversion rates on agent-composed assortments, or retailer feedback. However, these are lagging indicators. More immediate evaluation likely involves agent reasoning trace review, validation that selected products meet specified constraints, and perhaps human review of recommendations before they reach customers. ## Gaps and Balanced Assessment While the presentation provides an engaging overview of Faire's AI initiatives, certain critical LLMOps details remain unclear: **Monitoring and Observability**: How does Faire track LLM performance in production? What metrics do they monitor for drift or degradation? How quickly can they detect when an agent or search feature is misbehaving? These operational concerns are fundamental to production AI but weren't addressed in the presentation. **Cost Management**: LLM API calls can become expensive at scale, particularly for agentic workflows that involve multiple inference requests per user action. Faire didn't discuss cost optimization strategies, token usage monitoring, or trade-offs between model capability and operational expense. **Model Selection and Versioning**: The presentation didn't specify which LLM providers or models Faire uses for different features. Are they using OpenAI, Anthropic, open-source models, or a mix? How do they handle model updates and versioning? What's their strategy if a provider deprecates an API or significantly changes behavior? **Failure Modes and Degradation Handling**: While the fireside chat mentioned building guardrails and explicit failure modes, concrete examples weren't provided. How does natural-language search behave when it can't parse a query? What happens if an agent in Project Burrito gets stuck in a loop or produces nonsensical recommendations? These edge cases are where production AI systems often struggle. **Security and Data Privacy**: Especially for the developer productivity tools that interact with Faire's codebase, what controls are in place to prevent agents from exposing sensitive code or data? How do they ensure prompts and agent interactions don't leak proprietary information to external LLM providers? **Evaluation Infrastructure**: Beyond adoption metrics and user satisfaction, what systematic evaluation does Faire perform? Do they maintain benchmark datasets? Do they run regular regression tests on agent capabilities? How do they validate improvements before deploying changes? It's worth noting that this was a public tech talk aimed at recruiting and community engagement, not a deep technical whitepaper. The absence of these details doesn't necessarily indicate gaps in Faire's practices—they may simply be keeping operational details internal. However, from an LLMOps assessment perspective, these areas represent critical components of production AI systems that aren't visible in the presentation. ## Conclusions and Takeaways Faire's approach to AI in production demonstrates several principles worth highlighting: **Pragmatic Tool Selection**: By programmatically extending GitHub Copilot rather than building everything from scratch, and by using hybrid architectures that preserve existing investments, Faire shows a willingness to use the best tool for each job rather than forcing everything into a single framework or language. **Dual Focus on Internal and External Applications**: Investing in developer productivity alongside customer-facing features creates a flywheel effect—more productive engineers can ship customer features faster, and learnings from internal tools inform customer-facing implementations. **Incremental Rollout**: The hybrid architecture of Project Burrito and the gradual introduction of agent capabilities reflect an understanding that production AI benefits from experimentation and iteration rather than big-bang deployments. **Maintaining Human Oversight**: The consistent emphasis on humans-in-the-loop, AI as augmentation rather than replacement, and engineers remaining accountable for code quality demonstrates appropriate caution about LLM limitations. That said, the presentation's promotional nature means claims about effectiveness, adoption, and impact should be viewed with appropriate skepticism until validated by detailed metrics or independent assessment. The examples shown are compelling, but real-world production AI often encounters edge cases, scaling challenges, and maintenance burdens that polished tech talks don't fully capture. Overall, Faire appears to be taking a thoughtful, multi-faceted approach to LLMOps, investing in both the infrastructure to support AI agents and the specific features that address customer needs. Their willingness to share architectural details and specific use cases contributes valuable knowledge to the broader LLMOps community, even as questions about operational details, evaluation practices, and long-term sustainability remain to be fully answered.

Start deploying reproducible AI workflows today