## Company Overview and Use Case
11x is a company building digital workers, with their flagship product Alice serving as an AI Sales Development Representative (SDR) and a secondary product Julian functioning as an AI voice agent. The company recently completed funding rounds and relocated from London to San Francisco while simultaneously undertaking a complete rebuild of their core product. This case study presents a fascinating example of LLMOps in action, showcasing how a company evolved from a basic AI-powered tool to a sophisticated multi-agent system capable of autonomous sales operations.
The original Alice (referred to as Alice 1) was a relatively simple AI-powered outreach campaign tool that required significant manual input and configuration. Users would define their audience, describe their offer, construct email sequences, tweak AI-generated messaging, and launch campaigns. While successful by various metrics, the team recognized that Alice 1 fell short of being a true "digital worker" due to excessive manual intervention requirements, basic lead research capabilities, uninspiring email personalization, inability to handle replies automatically, and lack of self-learning capabilities.
## Technical Evolution and Architecture Decisions
The decision to rebuild Alice from scratch was driven by significant advances in the AI landscape between 2023 and 2024. Key technological milestones included the release of GPT-4, the first Claude models, initial agent frameworks, Claude 2, function calling in the OpenAI API, LangGraph as a production-ready agent framework, Claude 3, GPT-4o, and the Replit agent, which served as inspiration for what agentic software products could achieve.
The rebuild represented an aggressive engineering effort - just two engineers initially, later joined by a project manager - with the entire migration from Alice 1 to Alice 2 completed in three months while continuing to serve approximately 300 customers amid growing demand. The team made several strategic decisions: starting completely from scratch with new repositories and infrastructure, using familiar technologies to minimize risk while adopting unfamiliar agent technologies, and leveraging vendor solutions extensively to accelerate development.
## Technology Stack and Vendor Partnerships
11x chose a deliberately vanilla technology stack to minimize risk while experimenting with cutting-edge agent technologies. Their partnership with LangChain proved crucial, providing not just the technical infrastructure but also extensive support and guidance. The team utilized the entire LangChain suite, including LangGraph for agent orchestration, cloud hosting, and observability tools. This partnership exemplifies how LLMOps often requires strong vendor relationships and external expertise, particularly when adopting emerging technologies.
The company's decision to leverage multiple vendors rather than building everything in-house reflects a practical approach to LLMOps where speed to market and reliability often outweigh the desire for complete control over every component. This strategy allowed them to focus engineering resources on their core differentiator - the agent architecture and business logic - while relying on proven infrastructure for supporting services.
## Agent Architecture Evolution
The most technically interesting aspect of this case study is the systematic exploration of three different agent architectures, each revealing important insights about LLMOps in practice.
### ReAct Architecture Implementation
The team's first approach used the ReAct (Reason and Act) pattern, introduced by researchers at Google and Princeton in 2022. This architecture implements a simple execution loop in which the agent reasons about what to do, takes action through tool calls, and observes the results. Their implementation consisted of a single assistant node with 10-20 tools covering various campaign creation functions like fetching leads, inserting database entities, and drafting emails.
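As a rough sketch of that single-node design, LangGraph's prebuilt ReAct executor can wire one assistant model to a flat list of tools. The tool names, model choice, and stub bodies below are illustrative assumptions, not 11x's actual code:

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def fetch_leads(audience: str) -> str:
    """Fetch leads matching an audience description."""
    return "[lead list]"  # would call the lead-sourcing service

@tool
def draft_email(lead: str, offer: str) -> str:
    """Draft a personalized outreach email for one lead."""
    return "[email draft]"  # would call a specialized prompt

# A single assistant node with every tool attached; the prebuilt
# executor runs the reason -> act -> observe loop until the model
# stops calling tools.
agent = create_react_agent(
    ChatAnthropic(model="claude-3-5-sonnet-latest"),
    [fetch_leads, draft_email],  # 11x attached 10-20 such tools
)

# recursion_limit caps the loop; the "agent stack overflow" failures
# described below surface as a GraphRecursionError when it is hit.
result = agent.invoke(
    {"messages": [("user", "Build a campaign targeting fintech CFOs")]},
    config={"recursion_limit": 25},
)
```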
The ReAct architecture offered significant advantages in its simplicity and flexibility. The single-node design never required structural revisions, and the system handled arbitrary user inputs across multiple conversation turns effectively since the agent ran to completion for each turn. This robustness to non-linear user interactions is a crucial consideration for production LLMOps systems.
However, the ReAct approach revealed important limitations when scaling tool usage. With many tools attached, the agent struggled with tool selection and sequencing, leading to infinite loops and recursion limit errors - essentially the agent equivalent of stack overflows. Additionally, the outputs were mediocre because a single agent and prompt set couldn't specialize effectively across the entire campaign creation process.
### Workflow-Based Architecture
To address the ReAct architecture's limitations, the team implemented a workflow approach based on Anthropic's definition of systems where LLMs and tools are orchestrated through predefined code paths. This resulted in a much more complex graph with 15 nodes across five stages corresponding to campaign creation steps.
Unlike the ReAct agent, this workflow didn't run to completion for every turn but rather executed once for the entire campaign creation process, using LangGraph's node interrupts feature to collect user feedback at specific points. This approach eliminated tool selection issues by replacing tools with specialized nodes and provided a clearly defined execution flow that prevented infinite loops.
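A simplified sketch of this pattern, assuming illustrative node names and a two-stage path rather than the full 15-node graph, shows how LangGraph's interrupt mechanism pauses a predefined flow for user feedback:

```python
from typing import TypedDict
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END

class CampaignState(TypedDict):
    audience: str
    emails: list[str]

def research_audience(state: CampaignState) -> dict:
    return {"audience": "refined audience description"}  # stubbed LLM step

def write_emails(state: CampaignState) -> dict:
    return {"emails": ["draft 1", "draft 2"]}  # stubbed LLM step

builder = StateGraph(CampaignState)
builder.add_node("research_audience", research_audience)
builder.add_node("write_emails", write_emails)
builder.add_edge(START, "research_audience")
builder.add_edge("research_audience", "write_emails")
builder.add_edge("write_emails", END)

# Pause before the email-writing stage so the user can approve the
# audience; a checkpointer is required so the run can resume later.
graph = builder.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["write_emails"],
)
```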
The workflow architecture produced significantly better outputs because it forced the agent through specific, optimized steps. However, it introduced new problems: extreme complexity, tight coupling between the front-end user experience and agent architecture, and inability to support non-linear user interactions within the campaign creation flow.
### Multi-Agent System Implementation
The final architecture drew inspiration from a LangChain blog post about customer support agents using hierarchical multi-agent systems. This approach features a supervisor agent responsible for user interaction and task routing, with specialized sub-agents handling specific functions.
11x's implementation includes a supervisor node and four specialist sub-agents: a researcher, a positioning report generator, a LinkedIn message writer, and an email writer. This architecture achieved the flexibility of the ReAct approach while maintaining the performance benefits of the workflow system.
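A sketch of the supervisor pattern under stated assumptions: the four specialist names come from the talk, but the routing stub and empty node bodies below are placeholders, not 11x's implementation:

```python
from langgraph.graph import MessagesState, StateGraph, START, END

SPECIALISTS = ["researcher", "positioning_report", "linkedin_writer", "email_writer"]

def supervisor(state: MessagesState) -> dict:
    # In production this is an LLM call that talks to the user and
    # decides which specialist should act next.
    return {"messages": []}

def route(state: MessagesState) -> str:
    # Placeholder routing: a real supervisor parses structured LLM
    # output to pick the next specialist, or END when work is done.
    return END if len(state["messages"]) > 8 else "researcher"

def make_specialist(name: str):
    def node(state: MessagesState) -> dict:
        return {"messages": []}  # each specialist is its own sub-agent
    return node

builder = StateGraph(MessagesState)
builder.add_node("supervisor", supervisor)
for name in SPECIALISTS:
    builder.add_node(name, make_specialist(name))
    builder.add_edge(name, "supervisor")  # specialists report back
builder.add_edge(START, "supervisor")
builder.add_conditional_edges("supervisor", route, SPECIALISTS + [END])
graph = builder.compile()
```

Routing every specialist back through the supervisor is what preserves the conversational flexibility of the ReAct design while each sub-agent stays narrowly specialized.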
## Production Performance and Results
Alice 2 launched in January and has demonstrated impressive production metrics that validate the multi-agent approach. The system has sourced nearly two million leads, sent approximately three million messages, and generated about 21,000 replies, with a reported 2% reply rate that is on par with human SDR performance. These metrics provide concrete evidence of successful LLMOps implementation in a demanding real-world environment.
The reply rate is particularly significant because it represents a key business metric that directly impacts customer value. Achieving parity with human performance while automating the entire process demonstrates the potential of well-architected agent systems in production environments.
## LLMOps Lessons and Insights
This case study reveals several important principles for LLMOps practitioners. First, simplicity emerges as a critical factor for long-term success. While complex structures can provide short-term performance gains, they often create technical debt that becomes counterproductive over time. The team's experience with the workflow architecture illustrates how over-engineering can lead to inflexibility and maintenance challenges.
Second, the impact of model releases on agent performance cannot be overstated. The Replit team's experience mentioned in the presentation - their agent only became effective after Sonnet 3.5's release - highlights how LLMOps success often depends on underlying model capabilities. This creates both opportunities and risks for production systems built on rapidly evolving foundation models.
Third, the mental model for agent design significantly impacts architectural decisions. 11x initially thought of their agent as a user flow or directed graph, leading to suboptimal implementations. Reconceptualizing the agent as a human coworker or team of coworkers led to the successful multi-agent architecture.
Fourth, task decomposition proves crucial for effective agent implementation. Breaking the large task of campaign creation into smaller, specialized tasks like email writing and lead research enabled better performance through specialization while maintaining system coherence.
## Tool Design and Implementation Philosophy
The case study emphasizes the principle that tools are preferable to skills when designing agent systems. Rather than trying to make agents inherently smarter, providing appropriate tools and clear usage instructions often yields better results while using fewer tokens. This approach also makes systems more maintainable and debuggable.
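One concrete way this shows up in practice: with LangChain's tool decorator, the docstring is what the model sees when deciding whether and how to call the tool, so the "clear usage instructions" live next to the code. The tool name and schema below are assumptions for illustration:

```python
from langchain_core.tools import tool

@tool
def lookup_company(domain: str) -> dict:
    """Look up firmographic data for a company domain.

    Call this before drafting any outreach so personalization is
    grounded in retrieved facts rather than guessed by the model.
    """
    # would call an enrichment API; stubbed for the sketch
    return {"domain": domain, "industry": "unknown", "headcount": 0}
```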
The team's extensive use of tools in their ReAct implementation - covering functions like database operations, email drafting, and lead research - demonstrates how production agent systems often require numerous specialized capabilities. However, their experience also shows that tool proliferation can create its own challenges, requiring careful architecture decisions to manage complexity.
## Observability and Production Monitoring
While not extensively detailed in the presentation, the team's partnership with LangChain included observability tools that proved essential for understanding agent performance in production. This highlights an often-overlooked aspect of LLMOps: the need for comprehensive monitoring and debugging capabilities when deploying autonomous agents in business-critical applications.
The ability to observe agent behavior, understand decision-making processes, and debug failures becomes particularly important in multi-agent systems where emergent behaviors can arise from interactions between specialized components.
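Although the presentation doesn't detail the setup, LangSmith tracing (the observability piece of the LangChain suite) is typically enabled through environment variables; the project name below is illustrative:

```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"  # send runs to LangSmith
os.environ["LANGCHAIN_API_KEY"] = "<your-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "alice-2"  # illustrative project name

# With tracing on, every graph.invoke() records the full tree of node
# executions, LLM calls, and tool calls for inspection and debugging.
```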
## Future Development and Scaling Considerations
11x's future plans reveal ongoing challenges and opportunities in LLMOps. Integration between Alice and Julian (their voice agent) represents the next evolution toward multi-modal agent systems. Self-learning capabilities, while mentioned as work in progress, point to the ultimate goal of autonomous improvement without human intervention.
The company's exploration of computer use, memory systems, and reinforcement learning reflects the rapidly expanding frontier of agent capabilities. However, their methodical approach to architecture evolution suggests they've learned the importance of solid foundations before adding complexity.
## Vendor Strategy and Ecosystem Dependencies
The case study illustrates how successful LLMOps often requires strategic vendor partnerships rather than building everything internally. 11x's relationship with LangChain went beyond simple tool usage to include extensive support, training, and guidance. This dependency on external expertise and infrastructure represents both an enabler and a potential risk for production systems.
The team's decision to use "pretty much the entire suite" of LangChain products demonstrates how comprehensive platforms can accelerate development but also create vendor lock-in risks. Balancing speed to market with strategic independence remains a key consideration for LLMOps teams.