Cisco developed an agentic AI platform leveraging LangChain to transform their customer experience operations across a 20,000-person organization managing $26 billion in recurring revenue. The solution combines multiple specialized agents with a supervisor architecture to handle complex workflows across customer adoption, renewals, and support processes. By integrating traditional machine learning models for predictions with LLMs for language processing, they achieved 95% accuracy in risk recommendations and reduced operational time by 20% in just three weeks of limited availability deployment, while automating 60% of their 1.6-1.8 million annual support cases.
Cisco’s Customer Experience (CX) division presented a comprehensive case study on deploying agentic AI at enterprise scale. This division is responsible for managing customer lifecycle operations for a $56 billion company, with over $26 billion in recurring revenue and approximately 20,000 employees. The presentation, delivered at a LangChain conference, details how Cisco has moved beyond experimentation to production-grade multi-agent systems that power renewals, adoption, support, and delivery workflows.
The speaker emphasizes that Cisco’s AI journey didn’t start with LLMs—they have been working with machine learning and predictive AI models for over a decade. This foundation in traditional data science proved critical because, as they note, LLMs are excellent for language-based interactions but “really bad for predictions.” The current agentic AI approach combines both paradigms: LLMs for natural language understanding and generation, and traditional ML models for deterministic predictions and risk scoring.
Before diving into technical implementation, Cisco established a rigorous framework for use case selection. They recount an engagement in which a customer initially identified 412 potential AI use cases; proper evaluation cut that list to just five that actually contributed business value, roughly 1% of the original. That experience shaped their requirement that every use case fit one of three strategic buckets:
This disciplined approach prevents teams from pursuing “cool” AI projects that don’t tie back to business outcomes. The speaker is emphatic that defining use cases and metrics first is essential—before considering RAG, prompts, few-shot learning, supervisor patterns, or fine-tuning.
Cisco’s architecture supports flexible deployment models to accommodate diverse customer requirements, including those of federal customers, healthcare organizations, and European customers with strict data residency regulations. Their stack includes:
The key architectural achievement is that the same agentic framework runs unchanged whether deployed on-premises in physical data centers or 100% in the cloud. This points to a well-abstracted architecture in which the LLM provider is configurable rather than hard-coded.
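The talk does not show Cisco’s code, but LangChain’s `init_chat_model` helper illustrates one way to get this property: the provider and model become deployment configuration rather than imports. The environment variable names below are assumptions for the sketch.

```python
import os

from langchain.chat_models import init_chat_model

def get_llm():
    """Resolve the chat model from deployment config, not from code."""
    provider = os.environ.get("LLM_PROVIDER", "openai")  # e.g. a local provider on-prem
    model = os.environ.get("LLM_MODEL", "gpt-4o-mini")
    return init_chat_model(model, model_provider=provider)

llm = get_llm()
print(llm.invoke("Summarize the renewal outlook for customer XYZ.").content)
```

Switching environments then means changing two environment variables, not touching application code.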
The system employs a supervisor pattern for orchestrating multiple specialized agents. Before supervisor architectures were widely discussed in the industry, Cisco developed their own approach where a supervisor agent receives natural language queries, decomposes them, and routes them to appropriate specialized agents.
For example, a typical renewals question like “What’s the upcoming renewal status for customer XYZ and the actions required to minimize its potential risk?” requires:
A single monolithic agent couldn’t achieve the required 95%+ accuracy target. Instead, the supervisor decomposes the query and orchestrates calls across:
These agents can operate in parallel where appropriate, with context being carried back and forth through the LangChain framework. The system uses LangSmith for tracing all interactions between supervisors and agents.
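The presentation does not include source code, so the following is a minimal sketch of the routing half of that pattern in LangGraph (an assumed but natural choice given the LangChain stack). The keyword check stands in for the LLM call that would actually decompose the query, and the agent names are illustrative.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    query: str
    route: str
    answer: str

def supervisor(state: State) -> State:
    # In production this would be an LLM call that decomposes the query;
    # a keyword check stands in for it here.
    route = "renewals" if "renewal" in state["query"].lower() else "adoption"
    return {**state, "route": route}

def renewals_agent(state: State) -> State:
    return {**state, "answer": "renewal status and risk actions for the account"}

def adoption_agent(state: State) -> State:
    return {**state, "answer": "adoption telemetry summary for the account"}

builder = StateGraph(State)
builder.add_node("supervisor", supervisor)
builder.add_node("renewals", renewals_agent)
builder.add_node("adoption", adoption_agent)
builder.set_entry_point("supervisor")
builder.add_conditional_edges("supervisor", lambda s: s["route"],
                              {"renewals": "renewals", "adoption": "adoption"})
builder.add_edge("renewals", END)
builder.add_edge("adoption", END)
graph = builder.compile()

print(graph.invoke({"query": "What's the upcoming renewal status for customer XYZ?"}))
```

With the LANGCHAIN_TRACING_V2 and LANGCHAIN_API_KEY environment variables set, every supervisor-to-agent hop in a run like this is traced in LangSmith without extra instrumentation.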
A crucial architectural decision is the integration of traditional machine learning models for predictions. As the speaker notes, “LLMs are very probabilistic” while ML models are “very deterministic”—combining both achieves the accuracy levels required for enterprise use.
When processing a risk-related query, the system routes to a predictive machine learning model rather than asking an LLM to make predictions. This hybrid approach leverages each technology’s strengths: LLMs for language understanding, reasoning, and generation; ML models for numerical predictions, risk scoring, and pattern recognition based on historical data.
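A minimal sketch of that split, assuming a pre-trained scikit-learn classifier and invented file and feature names: the risk number comes from the deterministic model, exposed to the agent as a tool, and the LLM only narrates it.

```python
import joblib
from langchain_core.tools import tool

# Hypothetical artifact: a classifier trained offline on historical renewal data.
risk_model = joblib.load("renewal_risk_model.pkl")

@tool
def predict_renewal_risk(usage_score: float, support_cases: int, days_to_renewal: int) -> str:
    """Return a deterministic renewal-risk probability for an account."""
    proba = risk_model.predict_proba([[usage_score, support_cases, days_to_renewal]])[0][1]
    return f"Renewal risk: {proba:.0%}"
```

The LLM never guesses the score; it calls the tool, gets the same answer every time for the same inputs, and explains it in language.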
The speaker candidly discusses the challenges of integrating LLMs with SQL databases, quipping that “the three-letter acronym SQL and another three-letter acronym LLM don’t go on a date.” They emphasize avoiding LLM-generated SQL joins, which the speaker describes as failing “royally.”
Their solution involves using Snowflake’s Cortex for semantic context on metadata, but critically, they normalize data first before any LLM interaction. This approach separates the concerns: structured data operations happen through traditional SQL and data pipelines, while LLMs handle natural language interfaces and reasoning.
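A toy illustration of the normalize-first split (table and column names are invented, and sqlite stands in for the real warehouse): the joins and aggregation have already happened in the data pipeline, plain SQL fetches from a denormalized view, and the LLM sees only the finished rows.

```python
import sqlite3

conn = sqlite3.connect("cx.db")  # stand-in for the real warehouse
rows = conn.execute(
    """SELECT account, renewal_date, risk_score
       FROM renewal_summary          -- denormalized upstream by the data pipeline
       WHERE account = ?""",
    ("XYZ",),
).fetchall()

prompt = (
    "Summarize the renewal outlook for the account below in two sentences.\n"
    + "\n".join(f"{a}: renews {d}, risk {r:.0%}" for a, d, r in rows)
)
# llm.invoke(prompt)  <- the only point where the LLM touches this data
```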
Cisco runs three parallel workstreams with separate teams:
The speaker emphasizes the importance of this separation. Production teams have different metrics than experimentation teams—mixing them causes problems. The experimentation team needs freedom to fail fast, while the production team needs stability.
A key organizational insight is maintaining separation between development and evaluation teams. As the speaker colorfully puts it, “you don’t make the dog the custodian of the sausages.” The evaluation team maintains golden datasets and independently assesses whether systems meet performance, cost, and accuracy requirements. This prevents the natural tendency for development teams to rationalize their own results.
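In practice that can be as simple as an evaluation harness the development team cannot edit. A hedged sketch, assuming a JSONL golden set, a crude substring grading rule, and the 95% bar cited in the talk:

```python
import json

THRESHOLD = 0.95  # the accuracy bar cited in the talk

def evaluate(run_agent, golden_path="golden_renewals.jsonl"):
    """Grade an agent against the eval team's golden dataset."""
    hits = total = 0
    with open(golden_path) as f:
        for line in f:
            case = json.loads(line)            # {"query": ..., "expected": ...}
            answer = run_agent(case["query"])
            hits += case["expected"].lower() in answer.lower()
            total += 1
    accuracy = hits / total
    print(f"accuracy={accuracy:.1%} ({'PASS' if accuracy >= THRESHOLD else 'FAIL'})")
    return accuracy >= THRESHOLD
```

The grading rule here is deliberately simplistic; the point is ownership: the golden set and the pass/fail gate live with the evaluation team, not with the builders.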
The limited availability phase involves direct interaction with end users before building features. The speaker notes that renewals people ask renewals-related questions while adoption people ask adoption-related questions—this seems obvious but many teams build AI features first and only then try to understand user needs. Cisco reverses this: they go to users first, understand their questions, and then build AI to help them.
The system operates at significant scale:
For the renewals use case specifically, they consolidated over 50 different data sets and numerous tools that renewal agents previously had to navigate manually.
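The talk does not show how those consolidated sources are wired in, but one plausible shape, reusing the get_llm() and predict_renewal_risk sketches above with LangGraph’s prebuilt ReAct agent, is:

```python
from langgraph.prebuilt import create_react_agent

renewals_agent = create_react_agent(
    get_llm(),                     # provider-agnostic model from the earlier sketch
    tools=[predict_renewal_risk],  # plus contract, entitlement, telemetry tools, etc.
)
result = renewals_agent.invoke(
    {"messages": [("user", "What actions reduce renewal risk for customer XYZ?")]}
)
print(result["messages"][-1].content)
```

The interface is the point: the renewals agent sees a short, curated tool list instead of 50+ raw datasets.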
The speaker touches on forward-looking work around agent interoperability, mentioning that MCP (Model Context Protocol) is valuable but “needs to evolve” and is currently like “Swiss cheese” with gaps. Cisco and LangChain are co-championing an open-source initiative called AGNTCY (pronounced “agency”) that proposes a full architecture for agent communication.
The analogy used is DNS servers—when you access the internet, DNS resolution is fundamental infrastructure. Currently, there’s no equivalent “directory service” for agents. The Agency architecture addresses:
The speaker offers several practical recommendations:
The presentation represents a mature, production-grade implementation with clear evidence of real-world deployment at scale, moving well beyond proof-of-concept or pilot stages.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Digits, a company providing automated accounting services for startups and small businesses, implemented production-scale LLM agents to handle complex workflows including vendor hydration, client onboarding, and natural language queries about financial books. The company evolved from a simple 200-line agent implementation to a sophisticated production system incorporating LLM proxies, memory services, guardrails, observability tooling (Phoenix from Arize), and API-based tool integration using Kotlin and Golang backends. Their agents achieve a 96% acceptance rate on classification tasks with only 3% requiring human review, handling approximately 90% of requests asynchronously and 10% synchronously through a chat interface.
OpenAI's Forward Deployed Engineering (FDE) team, led by Colin Jarvis, embeds with enterprise customers to solve high-value problems using LLMs and deliver production-grade AI applications. The team focuses on problems worth tens of millions to billions in value, working with companies across industries including finance (Morgan Stanley), fintech (Klarna), manufacturing (semiconductors, automotive), telecommunications (T-Mobile), and others. By deeply understanding customer domains, building evaluation frameworks, implementing guardrails, and iterating with users over months, the FDE team achieves 20-50% efficiency improvements and high adoption rates (98% at Morgan Stanley). The approach emphasizes solving hard, novel problems from zero-to-one, extracting learnings into reusable products and frameworks (like Swarm and AgentKit), then scaling solutions across the market while maintaining strategic focus on product development over services revenue.