Company
Fastweb / Vodafone
Title
AI-Powered Customer Service and Call Center Transformation with Multi-Agent Systems
Industry
Telecommunications
Year
2025
Summary (short)
Fastweb / Vodafone, a major European telecommunications provider serving 9.5 million customers in Italy, transformed their customer service operations by building two AI agent systems to address the limitations of traditional customer support. They developed Super TOBi, a customer-facing agentic chatbot system, and Super Agent, an internal tool that empowers call center consultants with real-time diagnostics and guidance. Built on LangGraph and LangChain with Neo4j knowledge graphs and monitored through LangSmith, the solution achieved a 90% correctness rate, 82% resolution rate, 5.2/7 Customer Effort Score for Super TOBi, and over 86% One-Call Resolution rate for Super Agent, delivering faster response times and higher customer satisfaction while reducing agent workload.
## Overview

Fastweb / Vodafone, part of the Swisscom Group and one of Europe's leading telecommunications providers, undertook a comprehensive AI transformation of their customer service operations serving millions of customers across Italy. The case study presents a sophisticated production deployment of multi-agent LLM systems addressing both customer-facing and internal operations. The implementation demonstrates mature LLMOps practices including comprehensive monitoring, automated evaluation pipelines, and continuous improvement processes at enterprise scale.

The telecommunications context presents unique challenges that make this a compelling LLMOps case study: customers require immediate assistance across diverse domains (billing, service activations, roaming, technical support) with high expectations for single-interaction resolution. Traditional chatbot approaches struggled with nuanced requests requiring contextual understanding and multi-system access. For call center agents, the manual process of consulting multiple systems and knowledge bases, while functional, created opportunities for inconsistency and slower resolution times. The company needed production AI systems that could handle both autonomous customer interactions and augment human agent capabilities while maintaining high service standards.

## Architecture and Technical Implementation

The solution centers on two flagship production systems built on LangChain and LangGraph: Super TOBi (customer-facing) and Super Agent (internal call center support).

### Super TOBi: Customer-Facing Agentic System

Super TOBi represents an evolution of Fastweb / Vodafone's existing chatbot (TOBi) into a sophisticated agentic system operating at enterprise scale across multiple channels. The architecture is organized around two primary LangGraph-based agents that demonstrate advanced orchestration patterns.
The **Supervisor agent** serves as the central entry point and orchestration layer for all customer queries. Its responsibilities extend beyond simple routing: it applies comprehensive guardrails for input filtering and safety, manages special scenarios including conversation endings and operator handovers, handles simple interactions like greetings, and most notably, implements clarification dialogue when uncertain about routing decisions. This clarification capability represents a practical approach to handling ambiguous customer intent rather than forcing premature classification. The Supervisor routes validated queries to specialized use case agents, creating a clean separation of concerns in the system architecture.

The **Use Cases agents** represent specialized handlers for specific customer need categories, each implemented as a distinct LangGraph graph following the LLM Compiler pattern. This pattern enables sophisticated reasoning about API invocation sequences, coordination of multi-step processes, and generation of contextually appropriate responses. Each use case agent has access to a carefully scoped subset of customer APIs, demonstrating thoughtful security and access control design.

A particularly noteworthy production feature is the system's ability to emit structured action tags alongside natural language responses. These tags enable transactional flows to execute directly within conversational interfaces: customers can activate offers, disable services, or update payment methods through conversational interactions that seamlessly blend dialogue with backend actions. The ChatBot interface automatically executes these action tags, creating a unified conversational-transactional experience that addresses a common challenge in production customer service systems.

The LLM Compiler pattern implementation within LangGraph enables comprehensive planning for customer requests, orchestrating API calls, data retrieval, and multi-step problem resolution.
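To make the action-tag idea concrete, here is a minimal, purely illustrative sketch of how a reply that mixes natural language with structured action tags might be split and dispatched by a chat interface. The `[[action:...]]` tag format and the handler names are assumptions for illustration, not Fastweb / Vodafone's actual schema:

```python
import re
from typing import Callable

# Hypothetical tag format: [[action:name|key=value,...]] embedded in the reply text.
ACTION_RE = re.compile(r"\[\[action:(\w+)\|([^\]]*)\]\]")

def split_reply(reply: str) -> tuple[str, list[dict]]:
    """Separate the natural-language answer from structured action tags."""
    actions = []
    for name, args in ACTION_RE.findall(reply):
        params = dict(p.split("=", 1) for p in args.split(",") if "=" in p)
        actions.append({"name": name, "params": params})
    text = ACTION_RE.sub("", reply).strip()
    return text, actions

def execute_actions(actions: list[dict], handlers: dict[str, Callable]) -> list:
    """The chat interface dispatches each tag to a backend handler."""
    return [handlers[a["name"]](**a["params"]) for a in actions if a["name"] in handlers]

# Example: one reply both answers the customer and triggers an offer activation.
reply = "Your offer is now set up. [[action:activate_offer|offer_id=ROAM10]]"
text, actions = split_reply(reply)
results = execute_actions(actions, {"activate_offer": lambda offer_id: f"activated {offer_id}"})
# text    -> "Your offer is now set up."
# results -> ["activated ROAM10"]
```

The key design point is that the dialogue text and the transactional payload travel in one response, so the conversational surface and the backend action stay in sync.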
This combination of deterministic orchestration with LLM reasoning capabilities represents mature thinking about production agent architectures.

### Super Agent: Internal Call Center Augmentation

Super Agent takes a fundamentally different approach by augmenting human consultants rather than replacing them. The system never speaks directly to customers; instead, it equips call center agents with instant diagnostics, compliance-checked guidance, and source-backed explanations. This human-in-the-loop design reflects pragmatic thinking about production AI deployment in high-stakes customer service contexts.

The architecture blends LangChain's composable tools with LangGraph's orchestration capabilities, but the distinguishing feature is the use of Neo4j to store operational knowledge as a living knowledge graph. This architectural choice enables sophisticated procedural reasoning and relationship-aware retrieval that would be difficult with vector-based approaches alone.

**Automated Knowledge Graph Construction**: The system includes a sophisticated ETL pipeline that transforms business-authored troubleshooting procedures into graph structures. Business specialists write procedures using structured templates with defined steps, conditions, and actions. Upon submission, an automated LangGraph-powered pipeline with task-specific LLM agents (including ReAct agents) parses documents into JSON, extracts verification APIs, performs consistency checks, and refines step definitions. The content is decomposed into nodes (Steps, Conditions, Actions, APIs) and relationships, then stored in Neo4j. A CI/CD pipeline automates build, validation, and deployment, promoting updated graphs to production within hours without downtime. This represents a production-ready approach to maintaining dynamic knowledge bases with minimal manual intervention.
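The decomposition step of such a pipeline can be sketched in miniature. The snippet below shows one plausible way a structured procedure template could be broken into Step/Condition/Action/API nodes and relationships of the kind a graph database stores; the field names, relationship types, and procedure content are illustrative assumptions, not the production schema:

```python
# A toy business-authored procedure template: ordered steps, each with a
# condition to verify, an API that checks it, and a remedial action.
procedure = {
    "name": "slow_connection",
    "steps": [
        {"id": "s1", "check": "line_sync_ok", "api": "get_line_status",
         "on_fail": "reset_port", "next": "s2"},
        {"id": "s2", "check": "router_firmware_current", "api": "get_router_info",
         "on_fail": "push_firmware_update", "next": None},
    ],
}

def to_graph(proc: dict) -> tuple[list[dict], list[tuple]]:
    """Decompose a procedure template into graph nodes and relationships."""
    nodes, rels = [], []
    for step in proc["steps"]:
        nodes += [
            {"label": "Step", "id": step["id"]},
            {"label": "Condition", "id": step["check"]},
            {"label": "API", "id": step["api"]},
            {"label": "Action", "id": step["on_fail"]},
        ]
        rels += [
            (step["id"], "CHECKS", step["check"]),     # Step verifies a Condition
            (step["check"], "CALLS", step["api"]),     # Condition is tested via an API
            (step["id"], "ON_FAIL", step["on_fail"]),  # failing the check triggers an Action
        ]
        if step["next"]:
            rels.append((step["id"], "NEXT", step["next"]))  # ordering between Steps
    return nodes, rels

nodes, rels = to_graph(procedure)
```

In the real system the equivalent output would be written to Neo4j (e.g., via Cypher `MERGE` statements) by the CI/CD pipeline rather than held in Python lists.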
**Intent Routing and Execution Flows**: Incoming consultant requests are processed by a LangGraph Supervisor that determines whether requests match graph-based procedures (structured troubleshooting) or open-ended questions (knowledge base queries). CRM data is automatically injected to ensure user identification and context relevance, demonstrating attention to security and personalization concerns in production systems.

For **graph-based procedure execution**, the Supervisor activates a procedural sub-graph executor that retrieves the first Step node and associated Condition, Action, and API nodes from Neo4j. The system executes required APIs to validate conditions, proceeding iteratively through the procedure graph until identifying the problem and solution. This approach enables deterministic, auditable troubleshooting flows that maintain compliance with company policies while leveraging LLM capabilities for reasoning.

For **open-ended questions**, the system routes to a hybrid RAG chain combining vector store retrieval with Neo4j knowledge graph traversal. The vector store provides broad recall of potentially relevant passages, while the knowledge graph anchors answers with appropriate context, source citations, and policy compliance. This hybrid approach addresses common limitations of pure vector-based RAG in enterprise contexts where relationships and governance matter.

### Technology Stack and Design Patterns

The implementation showcases several advanced LLMOps patterns:

**Supervisor Pattern**: Maintains deterministic intent routing while allowing specialized sub-graphs to evolve independently, addressing a common challenge in multi-agent systems where changes can have cascading effects.

**Customized LLM Compiler**: The implementation extends the LLM Compiler pattern with telecommunications-specific LangGraph nodes for API orchestration, rule checking, and exception handling, demonstrating how general patterns can be adapted for domain requirements.
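The graph-based procedure execution described above can be sketched as a simple loop: fetch a Step, call its verification API, and either stop with the failing step's recommended Action or follow the NEXT relationship. An in-memory dict stands in for Neo4j here, and all step, API, and action names are illustrative assumptions:

```python
# Toy procedure graph: each Step carries its verification API, the expected
# observation, the remedial Action if the check fails, and the next Step.
GRAPH = {
    "s1": {"api": "get_line_status", "expect": "sync",    "action": "reset_port",           "next": "s2"},
    "s2": {"api": "get_router_info", "expect": "current", "action": "push_firmware_update", "next": None},
}

def run_procedure(start: str, call_api) -> dict:
    """Iteratively traverse the procedure graph until a condition fails."""
    step_id = start
    while step_id:
        step = GRAPH[step_id]             # retrieve Step + Condition/API/Action
        observed = call_api(step["api"])  # execute the verification API
        if observed != step["expect"]:    # condition failed -> surface the Action
            return {"failed_step": step_id, "recommended_action": step["action"]}
        step_id = step["next"]            # condition held -> follow NEXT
    return {"failed_step": None, "recommended_action": None}  # no fault found

# Simulated API responses: the line syncs fine, but the firmware is outdated.
fake_apis = {"get_line_status": "sync", "get_router_info": "outdated"}
result = run_procedure("s1", lambda api: fake_apis[api])
# result -> {'failed_step': 's2', 'recommended_action': 'push_firmware_update'}
```

Because each transition is determined by the graph rather than by free-form generation, this style of executor yields the deterministic, auditable troubleshooting trail the case study emphasizes.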
**Neo4j Knowledge Graph**: Stores procedural steps, validation rules, APIs, documents, and solutions as first-class graph citizens, enabling efficient graph traversals and relationship-aware reasoning that complements vector-based approaches.

**Governance by Design**: Every recommendation is validated against Rule nodes encoding company policy, embedding compliance directly in the architecture rather than treating it as an afterthought.

**Deployment Agility**: The architectural design enables integration of new capabilities without re-engineering, addressing the common production challenge of evolving requirements.

## LLMOps Practices: Monitoring and Continuous Improvement

The case study presents particularly strong LLMOps practices around monitoring, evaluation, and continuous improvement. Fastweb / Vodafone implemented LangSmith from day one, recognizing that production agentic systems require deep observability. As Pietro Capra, Chat Engineering Chapter Lead, notes: "You can't run agentic systems in production without deep observability. LangSmith gave us end-to-end visibility into how our LangGraph workflows reason, route, and act, turning what would otherwise be a black box into an operational system we can continuously improve."

The team developed sophisticated **automated evaluation processes** that run daily:

- Traces from daily interactions are collected in LangSmith datasets
- Automated evaluation using the LangSmith Evaluators SDK runs during overnight processing
- Analysis encompasses user queries, chatbot responses, context, and grading guidelines
- Structured output includes numerical scores (1-5 scale), explanations, and identification of violated guidelines

This automated evaluation system enables business stakeholders to review daily performance metrics, provide strategic input, and communicate with technical teams for prompt adjustments.
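The structured evaluation output described above might look like the following sketch, where a stubbed judge stands in for the LLM-based evaluator (in production this role is played by LangSmith evaluators). The guideline names, scoring rule, and trace fields are all illustrative assumptions:

```python
import json

# Illustrative grading guidelines a response is checked against.
GUIDELINES = ["cite_sources", "stay_on_topic", "no_unconfirmed_pricing"]

def evaluate_trace(trace: dict, judge) -> dict:
    """Grade one trace; the judge returns a JSON string with the structured verdict."""
    graded = json.loads(judge(trace, GUIDELINES))
    assert 1 <= graded["score"] <= 5  # enforce the 1-5 scale
    return {**trace, **graded}

def stub_judge(trace: dict, guidelines: list[str]) -> str:
    """Toy judge: flags a violation if the response carries no source citation."""
    violated = [] if "source:" in trace["response"] else ["cite_sources"]
    return json.dumps({
        "score": 5 - len(violated),
        "explanation": "response grounded in cited source" if not violated
                       else "no source citation found",
        "violated_guidelines": violated,
    })

trace = {
    "query": "Why is my bill higher this month?",
    "response": "A roaming add-on was billed (source: invoice API).",
    "context": "customer invoice records",
}
report = evaluate_trace(trace, stub_judge)
# report carries the original trace fields plus score, explanation,
# and violated_guidelines -- the shape business reviewers consume.
```

The point of the structured shape (score, explanation, violated guidelines, alongside the original trace) is that it is directly reviewable by business stakeholders, not just engineers.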
The goal is maintaining the 90% correctness rate target, demonstrating how quantitative targets drive continuous improvement processes. The combination of automated monitoring and human oversight ensures consistent value delivery while identifying improvement areas.

LangSmith streams chain traces, latency metrics, and evaluation scores to internal dashboards for continuous optimization. This integration of monitoring tools with business dashboards represents mature thinking about making LLM system performance visible to both technical and business stakeholders.

## Production Results and Business Impact

The production deployment demonstrates measurable business impact across both systems.

**Super TOBi** serves nearly 9.5 million customers through the Customer Companion App and voice channels, handling use cases including cost control, active offers, roaming, sales, and billing. The system achieves:

- 90% correctness rate
- 82% resolution rate
- 5.2 out of 7 Customer Effort Score (CES)
- Faster response times
- Fewer human-operator transfers
- Higher customer satisfaction

**Super Agent** drives One-Call Resolution (OCR) rates above 86%, representing significant improvement in call center efficiency. The human-in-the-loop design ensures quality while leveraging AI for augmentation rather than replacement.

These metrics represent meaningful business outcomes rather than just technical performance indicators. The Customer Effort Score and resolution rates directly impact customer satisfaction and operational costs.

## Critical Assessment and Considerations

While the case study presents impressive results, several considerations warrant balanced assessment.

**Claimed vs. Validated Performance**: The 90% correctness rate and 82% resolution rate are company-reported metrics. The case study doesn't detail how these are measured (e.g., whether correctness is human-evaluated, automated, or based on customer feedback) or what baseline they're compared against.
The automated evaluation system provides structure, but the specific evaluation criteria and potential biases aren't fully transparent.

**Complexity Trade-offs**: The architecture involves substantial complexity: multiple LangGraph agents, Neo4j knowledge graphs, vector stores, automated ETL pipelines, and custom orchestration patterns. While this enables sophisticated capabilities, it also creates operational overhead, maintenance burden, and potential failure points. The case study emphasizes deployment agility, but the initial setup and ongoing maintenance costs aren't discussed.

**Scope of Automation**: Super TOBi handles specific use cases (cost control, offers, roaming, sales, billing), but the case study doesn't clarify what percentage of total customer interactions these represent or what happens with out-of-scope queries. The 82% resolution rate suggests 18% of interactions still require escalation, and understanding these edge cases would provide valuable insight.

**Human-in-the-Loop Design Philosophy**: The choice to make Super Agent fully consultant-facing rather than customer-facing represents a conservative but pragmatic approach. This avoids risks of AI errors directly impacting customers but may limit potential efficiency gains. The case study doesn't explore whether this was driven by regulatory concerns, risk tolerance, or empirical findings.

**Knowledge Graph Maintenance**: While the automated ETL pipeline for converting business procedures to graph structures is impressive, the case study doesn't address how the system handles contradictions, outdated procedures, or version control across the knowledge base. The "living graph" concept requires ongoing curation that could represent significant operational overhead.

**Evaluation System Limitations**: The automated evaluation runs overnight on the previous day's interactions, meaning issues are detected retroactively rather than in real time.
The reliance on LLM-based evaluation (using the Evaluators SDK) introduces potential for evaluation drift and the classic "LLM judging LLM" circularity concern.

**Vendor Positioning**: This case study appears on LangChain's blog, raising questions about selection bias and potential overemphasis on LangChain-specific tools. While LangGraph and LangSmith may be appropriate choices, alternative architectures aren't discussed.

## Advanced LLMOps Insights

Several aspects of this implementation offer valuable lessons for production LLM systems.

**Clarification Dialogues**: The Supervisor's ability to ask clarification questions when uncertain about routing represents a mature approach to handling ambiguity. Many production systems force premature classification, degrading user experience. Building clarification into the core architecture acknowledges the limitations of single-turn classification.

**Hybrid Knowledge Representation**: The combination of Neo4j graphs and vector stores demonstrates sophisticated thinking about knowledge representation trade-offs. Procedural knowledge with defined steps and conditions suits graph representation, while open-ended documentation suits vector retrieval. Many production systems default entirely to vector approaches, missing opportunities for more appropriate representations.

**Action Tag Architecture**: The structured action tag system that bridges conversational and transactional flows represents practical engineering for production customer service. Many chatbot implementations struggle to move from dialogue to action execution; this architecture makes transactional capability first-class.

**Business Stakeholder Integration**: The automated evaluation system generates outputs specifically designed for business stakeholder review, with explanations and guideline violations. This demonstrates understanding that production LLM systems require business stakeholder engagement, not just technical monitoring.
**Separation of Customer-Facing and Internal Systems**: Rather than building one system for all use cases, Fastweb / Vodafone developed distinct architectures optimized for different contexts (Super TOBi vs. Super Agent). This reflects mature thinking about different risk profiles, performance requirements, and user needs.

## Conclusion

This case study presents a comprehensive production deployment of multi-agent LLM systems in telecommunications customer service, demonstrating mature LLMOps practices including sophisticated monitoring, automated evaluation, and continuous improvement processes. The architecture showcases advanced patterns (Supervisor, LLM Compiler, hybrid knowledge representation) adapted for domain-specific requirements.

The business results, serving 9.5 million customers with 90% correctness and 82% resolution rates, suggest meaningful production value, though the metrics would benefit from more transparent methodology and baseline comparisons. The human-in-the-loop design for Super Agent represents pragmatic risk management, while Super TOBi's autonomous operation demonstrates confidence in specific use cases.

The implementation's complexity reflects the genuine difficulty of production customer service AI, but also creates operational overhead that organizations considering similar approaches should carefully evaluate. The tight integration with the LangChain/LangGraph/LangSmith ecosystem delivers clear benefits but also creates vendor dependencies worth considering in strategic decisions.

Overall, this represents a substantive example of production LLM deployment with meaningful scale, sophisticated architecture, and operational rigor, offering valuable insights for organizations building similar systems while acknowledging the inherent trade-offs and complexities involved.
