ZenML

Building Goal-Oriented Retrieval Agents for Low-Latency Recommendations at Scale

Faber Labs 2024

Faber Labs developed GORA (Goal-Oriented Retrieval Agents), a system for subjective relevance ranking that optimizes directly for business KPIs: conversion rates and average order value in e-commerce, or minimizing surgical engagements in healthcare. The system combines real-time user feedback processing, unified goal optimization, and high-performance infrastructure built with Rust, and the company reports consistent 200%+ improvements in key metrics while maintaining sub-second latency.

Industry

E-commerce

Overview

Faber Labs, a startup founded by Zoe Zoe and her co-founders, has developed GORA (Goal-Oriented Retrieval Agents), which they describe as the first specialized agents designed to autonomously maximize specific business KPIs through subjective relevance ranking. The company positions itself as providing an “embedded KPI optimization layer” for consumer-facing businesses, essentially attempting to replicate the recommendation engine capabilities that power companies like Amazon (reportedly contributing 35% of Amazon’s revenue) and make them available to a broader range of businesses.

The presentation was delivered at what appears to be a technical AI/ML conference, with the speaker walking through both the architectural decisions and business outcomes of their system. While the claims about 200%+ improvements are significant and should be viewed with appropriate skepticism given the promotional nature of the talk, the technical architecture decisions discussed provide valuable insights into building production agent systems.

Use Cases and Applications

GORA has been applied across multiple industries, each with a different optimization target: conversion rate and average order value in e-commerce, and minimizing surgical engagements in healthcare.

The key insight from their approach is that different clients have fundamentally different and sometimes conflicting optimization goals. For example, maximizing conversion rate doesn’t necessarily maximize gross merchandise value because users might convert more frequently but purchase cheaper items. Their system aims to jointly optimize these potentially conflicting metrics.
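To make the tension concrete, here is a small illustration with hypothetical numbers (item names, prices, and probabilities are all invented): ranking purely by conversion probability surfaces cheap items, while a joint score that weighs expected order value picks differently.

```python
# Hypothetical illustration: ranking by conversion probability alone vs. a
# joint score that also weighs expected order value (a GMV proxy).
items = [
    {"name": "budget_case", "p_convert": 0.30, "price": 15.0},
    {"name": "mid_headphones", "p_convert": 0.12, "price": 120.0},
    {"name": "premium_laptop", "p_convert": 0.04, "price": 900.0},
]

def expected_value(item):
    # Expected revenue per impression: conversion probability * price.
    return item["p_convert"] * item["price"]

by_conversion = max(items, key=lambda i: i["p_convert"])
by_joint = max(items, key=expected_value)

print(by_conversion["name"])  # budget_case: converts most often
print(by_joint["name"])       # premium_laptop: 0.04 * 900 = 36 expected revenue
```

The two rankings disagree, which is exactly why optimizing conversion rate alone can depress gross merchandise value.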

Technical Architecture and LLMOps Considerations

Three Core Pillars

The system is built around three foundational concepts: Large Event Models, LLM integration, and end-to-end reinforcement learning.

Large Event Models

One of the more novel technical contributions discussed is their development of “Large Event Models” (LEMs). These are custom models trained from scratch to generalize to user event data, analogous to how LLMs generalize to unseen text. The key innovation here is that these models can understand event sequences they haven’t explicitly seen before, enabling transfer learning across different client contexts.
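The analogy to text can be sketched in a few lines. This is our illustration, not Faber Labs' implementation: user events are treated as tokens over an open vocabulary, so an unseen event type degrades gracefully to an unknown token rather than breaking the sequence, mirroring how text tokenizers handle novel words.

```python
# Minimal sketch (not the GORA implementation): treating user events as
# "tokens" so a sequence model can consume them the way an LLM consumes text.
# Unseen event types fall back to an <unk> id, giving an open vocabulary.
class EventTokenizer:
    def __init__(self, known_events):
        self.vocab = {"<pad>": 0, "<unk>": 1}
        for ev in known_events:
            self.vocab.setdefault(ev, len(self.vocab))

    def encode(self, events):
        # Map each event to its id; unknown events map to <unk>.
        return [self.vocab.get(ev, self.vocab["<unk>"]) for ev in events]

tok = EventTokenizer(["view_item", "add_to_cart", "purchase"])
print(tok.encode(["view_item", "add_to_cart", "purchase"]))  # [2, 3, 4]
print(tok.encode(["view_item", "wishlist_add"]))             # [2, 1] (<unk>)
```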

The company trains these models using client data, and they’ve specifically designed their data pipeline to handle “messy” client data without extensive preprocessing requirements. This is a practical consideration for any production ML system—real-world data is rarely clean, and building systems that can tolerate data quality issues is essential for scalability.
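One common pattern for this kind of tolerance (an assumed illustration, with invented field names, not their pipeline) is to coerce each raw row into a canonical event and drop only rows that are truly unrecoverable, instead of failing an entire batch on one bad record:

```python
# Tolerant ingestion sketch: normalize messy client rows, salvaging what
# can be salvaged and dropping only unrecoverable records.
def normalize_event(raw):
    event = str(raw.get("event", "")).strip().lower().replace(" ", "_")
    if not event:
        return None  # unrecoverable: no event type at all
    try:
        ts = float(raw.get("timestamp"))
    except (TypeError, ValueError):
        ts = None  # keep the event; the timestamp can be imputed downstream
    return {"event": event, "timestamp": ts, "user": raw.get("user_id")}

rows = [
    {"event": " Add To Cart ", "timestamp": "1699999999", "user_id": "u1"},
    {"event": "purchase", "timestamp": "not-a-number", "user_id": "u2"},
    {"timestamp": "123"},  # no event type: dropped
]
cleaned = [e for r in rows if (e := normalize_event(r)) is not None]
print(len(cleaned))  # 2
```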

Importantly, they claim to leverage network effects across clients for the reward/alignment layer without leaking private information between clients. This suggests some form of federated learning or privacy-preserving technique, though the specifics weren’t detailed.
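Since the specifics were not disclosed, the following is only a generic example of the family of techniques being alluded to: in federated averaging, clients share model weight updates rather than raw user events, so the shared layer improves without private data leaving any one client.

```python
# Generic federated-averaging sketch. This illustrates the *family* of
# privacy-preserving techniques hinted at; GORA's actual mechanism was
# not disclosed in the talk.
def federated_average(client_weights):
    # Average each parameter across clients (equal weighting for simplicity).
    n = len(client_weights)
    dims = len(client_weights[0])
    return [sum(w[i] for w in client_weights) / n for i in range(dims)]

# Each client trains locally on its own private events and shares only
# the resulting weights:
client_a = [0.2, 0.8, -0.1]
client_b = [0.4, 0.6, 0.1]
print([round(v, 6) for v in federated_average([client_a, client_b])])
# [0.3, 0.7, 0.0]
```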

LLM Integration

While LEMs handle the core ranking logic, the company does use open-source LLMs for specific components—particularly for “gluing everything together” and presenting results to users. This hybrid approach is notable: rather than relying on LLMs for the computationally intensive ranking operations, they use specialized models for the core task and leverage LLMs where their language capabilities are most valuable.
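The division of labor can be sketched as follows (function names, the scoring rule, and the stubbed LLM call are all illustrative assumptions): the specialized ranker sits on the latency-critical path, and the LLM only phrases the result.

```python
# Hypothetical sketch of the hybrid split: a fast specialized ranker scores
# candidates; an LLM (stubbed here as a template) only presents the result.
def rank_candidates(candidates, score_fn, k=3):
    # Specialized-model path: cheap and latency-critical.
    return sorted(candidates, key=score_fn, reverse=True)[:k]

def present_with_llm(user_query, top_items):
    # LLM path: language generation only, never the heavy ranking work.
    # In production this would call an open-source LLM.
    names = ", ".join(item["name"] for item in top_items)
    return f"For '{user_query}', you might like: {names}"

candidates = [{"name": "a", "score": 0.2}, {"name": "b", "score": 0.9},
              {"name": "c", "score": 0.5}]
top = rank_candidates(candidates, lambda i: i["score"], k=2)
print(present_with_llm("running shoes", top))
# For 'running shoes', you might like: b, c
```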

End-to-End Reinforcement Learning

The architecture employs end-to-end reinforcement learning with policy models to jointly optimize multiple model components.

This holistic optimization approach is designed to avoid the common pitfall of “stacked models” where individual components are optimized for different objectives that may conflict with each other. The unified goal system ensures all components work toward the same business outcome.
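A minimal sketch of such a unified signal (our assumption, not the GORA internals): scalarize the competing objectives into one reward, so every component trained against it pulls toward the same trade-off rather than each optimizing its own metric.

```python
# Unified-reward sketch: jointly credit conversion and order value so a
# cheap conversion and an expensive one are no longer interchangeable.
def joint_reward(converted, order_value, w_conv=1.0, w_gmv=0.01):
    # Scalarize the two objectives; the weights encode the business trade-off.
    return w_conv * float(converted) + w_gmv * (order_value if converted else 0.0)

print(round(joint_reward(True, 15.0), 2))   # 1.15
print(round(joint_reward(True, 900.0), 2))  # 10.0
print(round(joint_reward(False, 0.0), 2))   # 0.0
```

Every component trained against `joint_reward` sees the same gradient pressure, which is the point of avoiding separately optimized stacked models.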

Backend Infrastructure: The Rust Decision

One of the most emphasized architectural decisions was migrating the backend to Rust. The speaker acknowledged this was controversial, especially for a team with Python and data science backgrounds, but described it as one of their best decisions, citing performance and a lightweight deployment footprint as the main benefits.

The speaker specifically mentioned that Discord’s migration to Rust was an inspiration for this decision, and Rust now runs throughout their backend services.

The transition was described as “super painful,” with many Rust concepts being counterintuitive for developers coming from Python. However, the investment paid off in enabling their on-premise deployment option, which would have been much more difficult with a heavier technology stack.

Latency Management

Latency is a critical concern for the system, with the speaker citing research that 53% of mobile users abandon sites that take longer than 3 seconds to load. The system employs several techniques to keep response times low.

The speaker emphasized that their latency numbers should be evaluated in context—this is a conversation-aware, context-aware system with agent capabilities, not a simple single-pass query-embedding retrieval system. Compared to other conversational and adaptive agent-based systems, their response times are competitive.
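One widely used pattern for holding a hard latency target (an illustrative sketch with invented names, not their code) is to run the expensive step under a deadline and fall back to a cheaper result when the budget is blown:

```python
# Latency-budget sketch: attempt an expensive re-rank under a deadline,
# falling back to the cheap ordering if it cannot finish in time.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout
import time

def expensive_rerank(items):
    time.sleep(0.5)  # simulate a slow model call
    return sorted(items)

def retrieve_with_budget(items, budget_s=0.1):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(expensive_rerank, items)
        try:
            return future.result(timeout=budget_s), "reranked"
        except FuturesTimeout:
            return items, "fallback"  # serve the cheap ordering instead

result, path = retrieve_with_budget([3, 1, 2], budget_s=0.05)
print(path)  # fallback: the 0.5s rerank missed the 50ms budget
```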

Conversation and Feedback Loop Management

A key technical challenge for any production agent system is managing multi-turn conversations efficiently. As conversations grow, the context becomes increasingly large and unwieldy, so they apply several techniques to keep it compact.
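One standard approach to this problem (an assumed illustration, not their confirmed mechanism) is to keep the most recent turns verbatim and collapse older turns into a compact summary slot, bounding context size as the conversation grows:

```python
# Context-compaction sketch: keep recent turns verbatim, summarize the rest.
def compact_context(turns, keep_recent=3):
    if len(turns) <= keep_recent:
        return turns
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    # In production the summary would come from a model; stubbed here.
    summary = f"[summary of {len(older)} earlier turns]"
    return [summary] + recent

turns = [f"turn {i}" for i in range(1, 8)]
print(compact_context(turns))
# ['[summary of 4 earlier turns]', 'turn 5', 'turn 6', 'turn 7']
```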

Privacy and Deployment Considerations

Privacy emerged as a significant concern, particularly for healthcare and financial services clients, and a central part of their answer is an on-premise deployment option.

The speaker noted that having an on-premise solution is an additional engineering challenge that can be a “killer” for early-stage businesses, but their Rust infrastructure made it manageable.

Reported Results

The claimed results are substantial, headlined by consistent 200%+ improvements in key metrics, though they should be evaluated with appropriate caution given the promotional context.

The speaker emphasized that these gains come from joint optimization of metrics that are often at odds with each other, not just optimizing one metric at the expense of others.

Lessons for Practitioners

Several practical takeaways emerge from this case study.

The speaker’s honest acknowledgment of challenges—the painful Rust transition, the difficulty of on-premise solutions, the need to handle messy client data—adds credibility to the technical discussion and provides realistic expectations for teams considering similar approaches.
