11X developed Alice, an AI Sales Development Representative (SDR) that automates lead generation and email outreach at scale. The key innovation was replacing a manual product library system with an intelligent knowledge base that uses advanced RAG (Retrieval Augmented Generation) techniques to automatically ingest and understand seller information from various sources including documents, websites, and videos. This system processes multiple resource types through specialized parsing vendors, chunks content strategically, stores embeddings in Pinecone vector database, and uses deep research agents for context retrieval. The result is an AI agent that sends 50,000 personalized emails daily compared to 20-50 for human SDRs, while serving 300+ business organizations with contextually relevant outreach.
11X is a company building digital workers for go-to-market organizations, with their flagship product Alice serving as an AI Sales Development Representative (SDR). Alice represents a sophisticated production deployment of LLMs in the sales automation space, processing and generating tens of thousands of personalized sales emails daily. The company also develops Julian, a voice agent, demonstrating their broader commitment to AI-powered business process automation.
The core challenge that 11X faced was creating an AI system that could effectively represent sellers’ businesses while engaging prospects at scale. Traditional SDRs handle lead sourcing, multi-channel engagement, and meeting booking, with success measured by positive reply rates and meetings booked. Alice needed to replicate these human capabilities while operating at significantly higher volume - sending approximately 50,000 emails per day compared to the 20-50 emails a human SDR might send.
The technical foundation of Alice centers around a sophisticated knowledge base system that exemplifies modern LLMOps practices. The architecture follows a clear separation of concerns with multiple specialized components working together in production.
The system architecture begins with users uploading various resource types through a client interface. These resources are stored in S3 buckets and processed through a backend system that creates database records and initiates processing jobs based on resource type and selected vendors. The parsing vendors work asynchronously, sending webhooks upon completion that trigger ingestion processes. The parsed content is then stored in the local database while simultaneously being embedded and upserted to Pinecone vector database, enabling the AI agent to query this information during email generation.
One of the most sophisticated aspects of Alice’s knowledge base is its multi-modal parsing capability. The system handles five distinct resource categories: documents, images, websites, audio, and video content. Rather than building parsing capabilities in-house, 11X made strategic vendor selection decisions based on production requirements.
For documents and images, they selected LlamaIndex’s Llama Parse, primarily due to its extensive file type support and excellent customer support responsiveness. The vendor selection process prioritized practical deployment concerns over theoretical accuracy metrics, focusing on markdown output support, webhook capabilities, and comprehensive file format coverage.
Website parsing utilizes Firecrawl, chosen over competitors like Tavily due to familiarity from previous projects and the fact that Tavily’s crawl endpoint was still in development during their evaluation period. This decision highlights the importance of vendor maturity and feature availability in production LLMOps deployments.
For audio and video processing, 11X selected CloudGlue, a newer vendor that provided both audio transcription and visual content extraction capabilities. This choice demonstrates their willingness to work with emerging vendors when they offer superior technical capabilities, specifically the ability to extract information from video content beyond simple audio transcription.
The chunking strategy employed by Alice’s knowledge base demonstrates sophisticated thinking about semantic preservation and retrieval optimization. The system processes parsed markdown content through a waterfall chunking approach that balances semantic coherence with embedding effectiveness.
The chunking pipeline begins with markdown header-based splitting to preserve document structure and hierarchy. This is followed by sentence-level splitting when chunks exceed target token counts, and finally token-based splitting as a fallback. This multi-layered approach ensures that semantic units remain intact while maintaining appropriate chunk sizes for embedding and retrieval operations.
The preservation of markdown structure is particularly important because formatting contains semantic meaning - distinguishing between titles, paragraphs, and other content types that influence how information should be weighted and retrieved. This attention to structural semantics reflects mature LLMOps thinking about information representation and retrieval quality.
The storage layer utilizes Pinecone as the primary vector database, chosen for practical production considerations rather than purely technical specifications. The selection criteria included cloud hosting to reduce infrastructure management overhead, comprehensive SDK support, bundled embedding models to simplify the technology stack, and strong customer support for troubleshooting production issues.
The embedding strategy leverages Pinecone’s built-in embedding models, reducing the complexity of managing separate embedding model infrastructure. This decision reflects a pragmatic approach to LLMOps where reducing operational complexity can outweigh potential performance optimizations from custom embedding solutions.
Alice’s retrieval system has evolved beyond traditional RAG implementations to incorporate deep research agents, representing the cutting edge of production RAG systems. The system progression moved from traditional RAG to agentic RAG, and finally to deep research RAG using Leta, a cloud agent provider.
The deep research agent receives lead information and dynamically creates retrieval plans that may involve multiple context gathering steps. This approach allows for both broad and deep information gathering depending on the specific context requirements. The agent can evaluate whether additional retrieval steps are needed, enabling more sophisticated and contextually appropriate email generation.
The integration with Leta demonstrates the emerging pattern of leveraging specialized agent platforms rather than building agent orchestration capabilities from scratch. This vendor-assisted approach allows the core engineering team to focus on domain-specific logic while relying on specialized platforms for agent coordination and execution.
The knowledge base includes sophisticated visualization capabilities that serve both operational and customer confidence purposes. The system projects high-dimensional vectors from Pinecone into three-dimensional space, creating an interactive 3D visualization of the knowledge base contents. Users can click on nodes to view associated content chunks, providing transparency into what Alice “knows” about their business.
This visualization serves multiple purposes in the production environment: it enables sales and support teams to demonstrate Alice’s knowledge capabilities, provides debugging capabilities for understanding retrieval behavior, and builds customer confidence by making the AI’s knowledge base tangible and inspectable.
The knowledge base integrates seamlessly into the campaign creation workflow, automatically surfacing relevant information as Q&A pairs with associated source chunks. This integration demonstrates sophisticated UX thinking about how AI capabilities should be presented to end users - making the system’s intelligence visible and actionable rather than hidden behind automated processes.
The transition from the manual library system to the automated knowledge base represents a fundamental shift in user experience philosophy. Instead of requiring users to manually catalog their products and services, the system now automatically ingests and organizes information from natural business documents and resources.
The development of Alice’s knowledge base yielded several important lessons for production LLMOps implementations. The team emphasized that RAG systems are significantly more complex than initially anticipated, involving numerous micro-decisions about parsing, chunking, storage, and retrieval strategies. This complexity multiplies when supporting multiple resource types and file formats.
The recommended approach of reaching production before extensive benchmarking reflects practical LLMOps wisdom. Rather than getting paralyzed by vendor evaluations and performance comparisons, the team prioritized getting a working system into production to establish real usage patterns and performance baselines. This approach enables data-driven optimization based on actual production behavior rather than theoretical benchmarks.
The heavy reliance on specialized vendors rather than building capabilities in-house represents a mature approach to LLMOps infrastructure. By leveraging vendors for parsing, vector storage, and agent orchestration, the team could focus on core domain logic while benefiting from specialized expertise in each component area.
The roadmap for Alice’s knowledge base includes several advanced LLMOps initiatives. Hallucination tracking and mitigation represents a critical production concern for systems generating customer-facing content. The plan to evaluate parsing vendors on accuracy and completeness metrics shows a maturation toward more sophisticated vendor management and quality assurance.
The exploration of hybrid RAG approaches, potentially incorporating graph databases alongside vector storage, demonstrates continued innovation in retrieval architecture. This hybrid approach could enable more sophisticated relationship modeling and contextual understanding beyond simple semantic similarity.
Cost optimization across the entire pipeline reflects the practical concerns of scaling LLMOps systems in production. As usage grows, the accumulated costs of parsing vendors, vector storage, and API calls become significant operational considerations requiring systematic optimization efforts.
The Alice knowledge base case study represents a sophisticated production implementation of modern LLMOps practices, demonstrating how multiple specialized technologies can be orchestrated to create intelligent business automation systems. The emphasis on vendor partnerships, pragmatic technology choices, and user experience integration provides valuable insights for other organizations building production LLM systems.
DoorDash faced challenges in scaling personalization and maintaining product catalogs as they expanded beyond restaurants into new verticals like grocery, retail, and convenience stores, dealing with millions of SKUs and cold-start scenarios for new customers and products. They implemented a layered approach combining traditional machine learning with fine-tuned LLMs, RAG systems, and LLM agents to automate product knowledge graph construction, enable contextual personalization, and provide recommendations even without historical user interaction data. The solution resulted in faster, more cost-effective catalog processing, improved personalization for cold-start scenarios, and the foundation for future agentic shopping experiences that can adapt to real-time contexts like emergency situations.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
This case study examines Cursor's implementation of reinforcement learning (RL) for training coding models and agents in production environments. The team discusses the unique challenges of applying RL to code generation compared to other domains like mathematics, including handling larger action spaces, multi-step tool calling processes, and developing reward signals that capture real-world usage patterns. They explore various technical approaches including test-based rewards, process reward models, and infrastructure optimizations for handling long context windows and high-throughput inference during RL training, while working toward more human-centric evaluation metrics beyond traditional test coverage.