ZenML

AI-Powered Network Operations Assistant with Multi-Agent RAG Architecture

Swisscom 2025

Swisscom, Switzerland's leading telecommunications provider, developed a Network Assistant using Amazon Bedrock to address the challenge of network engineers spending over 10% of their time manually gathering and analyzing data from multiple sources. The solution implements a multi-agent RAG architecture with specialized agents for documentation management and calculations, combined with an ETL pipeline using AWS services. The system is projected to reduce routine data retrieval and analysis time by 10%, saving approximately 200 hours per engineer annually while maintaining strict data security and sovereignty requirements for the telecommunications sector.

Industry

Telecommunications

Overview

Swisscom’s Network Assistant represents a comprehensive LLMOps implementation designed to transform network operations through AI-powered automation. As Switzerland’s leading telecommunications provider, Swisscom faced the challenge of network engineers spending more than 10% of their time on manual data gathering and analysis from multiple disparate sources. The solution leverages Amazon Bedrock as the foundation for a sophisticated multi-agent system that combines generative AI capabilities with robust data processing pipelines to deliver accurate and timely network insights.

Technical Architecture and Evolution

The solution architecture evolved through several iterations, demonstrating the iterative nature of LLMOps development. The initial implementation established basic RAG functionality using Amazon Bedrock Knowledge Bases, where user queries are matched with relevant knowledge base content through embedding models, context is enriched with retrieved information, and the LLM produces informed responses. However, the team discovered that this basic approach struggled with large input files containing thousands of rows with numerical values across multiple parameter columns, highlighting the complexity of implementing LLMs for technical, data-heavy use cases.
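This initial retrieval flow maps naturally onto Bedrock's retrieve-and-generate API. As a minimal sketch (assuming a provisioned knowledge base; the IDs and model ARN are placeholders, not Swisscom's actual configuration), the request could be assembled like this:

```python
def build_rag_request(query: str, kb_id: str, model_arn: str) -> dict:
    """Assemble a retrieve-and-generate request: the service embeds the
    query, retrieves matching knowledge base chunks, and has the LLM
    answer with that retrieved context."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,   # placeholder knowledge base ID
                "modelArn": model_arn,      # placeholder model ARN
            },
        },
    }

# The request would then be sent through the Bedrock agent runtime, e.g.:
#   boto3.client("bedrock-agent-runtime").retrieve_and_generate(**request)
```

Because retrieval here operates on text chunks, numerical tables embedded in large input files are flattened into prose-like fragments, which is precisely where this basic approach broke down.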

The architecture evolved to incorporate a multi-agent approach using Amazon Bedrock Agents, featuring three specialized components: a supervisor agent that interprets user queries and routes them to the appropriate specialist, a documentation management agent that handles RAG-based retrieval over network documentation, and a calculator agent that performs precise numerical analysis over structured network data.

Data Pipeline and Processing

A critical aspect of the LLMOps implementation is the sophisticated ETL pipeline that ensures data accuracy and scalability. The system uses Amazon S3 as the data lake with daily batch ingestion, AWS Glue for automated data crawling and cataloging, and Amazon Athena for SQL querying. Within this serverless architecture, the calculator agent translates natural language user prompts into SQL queries, dynamically selecting and executing the relevant query based on its analysis of the input parameters.
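One way to picture the calculator agent's behavior is as intent-to-query mapping: the LLM extracts an intent and parameters from the prompt, and a deterministic layer supplies the SQL that Athena executes. The sketch below uses illustrative table and column names, not Swisscom's actual schema:

```python
from string import Template

# Hypothetical query templates the agent selects among; names are illustrative.
QUERY_TEMPLATES = {
    "link_utilization": Template(
        "SELECT link_id, AVG(utilization_pct) AS avg_util "
        "FROM network_metrics WHERE day = '$day' "
        "GROUP BY link_id ORDER BY avg_util DESC LIMIT $limit"
    ),
    "packet_loss": Template(
        "SELECT link_id, MAX(packet_loss_pct) AS peak_loss "
        "FROM network_metrics WHERE day = '$day' GROUP BY link_id"
    ),
}

def select_query(intent: str, **params) -> str:
    """Map an LLM-extracted intent plus parameters to a concrete Athena SQL
    statement, keeping numerical work in the SQL engine rather than the LLM."""
    return QUERY_TEMPLATES[intent].substitute(**params)

# The resulting SQL would then be executed serverlessly, e.g. via
#   boto3.client("athena").start_query_execution(QueryString=sql, ...)
```

Keeping aggregation inside Athena is what lets the agent stay accurate on thousands of rows that would overwhelm a context window.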

The evolution from initial Pandas or Spark data processing to direct SQL query execution through Amazon Bedrock Agents demonstrates the importance of finding the right balance between AI model interpretation and traditional data processing approaches. This hybrid approach facilitates both accuracy in calculations and richness in contextual responses, addressing a common challenge in LLMOps where pure LLM-based solutions may struggle with precise numerical computations.

Security and Compliance Implementation

The implementation showcases sophisticated approaches to data security and compliance in LLMOps, particularly relevant for telecommunications where data sovereignty requirements are stringent. The system implements comprehensive guardrails through Amazon Bedrock, including content filters that block harmful categories such as hate, insults, violence, and prompt-based threats like SQL injection. The security framework includes specific filters for sensitive telecommunications identifiers (IMSI, IMEI, MAC addresses, GPS coordinates) through manual word filters and regex-based pattern detection.
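A minimal sketch of such regex-based detection, with illustrative patterns rather than Swisscom's actual filters, could look like the following (IMSI values, which are 14–15 digits, would need a similar pattern plus context to disambiguate them from IMEIs):

```python
import re

# Illustrative patterns for sensitive telecom identifiers; not Swisscom's filters.
PATTERNS = {
    "MAC": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
    "IMEI": re.compile(r"\b\d{15}\b"),  # 15-digit device identifier
    "GPS": re.compile(r"\b-?\d{1,2}\.\d{3,},\s*-?\d{1,3}\.\d{3,}\b"),
}

def redact(text: str) -> str:
    """Replace sensitive identifiers with typed placeholders before the
    text reaches the model or is returned to the user."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text
```

In a production guardrail these patterns would run alongside the managed content filters, so that both harmful content and identifier leakage are blocked in one pass.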

The team conducted a thorough threat model evaluation following the STRIDE methodology (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege), creating detailed data flow diagrams and establishing trust boundaries within the application. This comprehensive security approach demonstrates best practices for LLMOps implementations in regulated industries, where data protection and compliance are paramount.

Performance and Scalability Considerations

The serverless architecture choice proves particularly beneficial for LLMOps deployment, minimizing compute resource management while providing automatic scaling capabilities. The pay-per-use model of AWS services helps maintain low operational costs while ensuring high performance, addressing common concerns about LLMOps cost management. The system integrates with Swisscom’s on-premises data lake through daily batch data ingestion, demonstrating how cloud-based LLMOps solutions can effectively interface with existing enterprise infrastructure.

Evaluation and Impact Assessment

The implementation includes robust evaluation mechanisms: contextual grounding and relevance checks verify that model responses are factually accurate and appropriate. The projected benefits include a 10% reduction in time spent on routine data retrieval and analysis tasks, translating to approximately 200 hours saved per engineer annually. The financial impact shows substantial cost savings per engineer, with operational costs at less than 1% of total value generated, demonstrating the strong ROI characteristics typical of successful LLMOps implementations.
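Bedrock guardrails expose these checks as a contextual grounding policy. A minimal sketch of such a configuration, with illustrative thresholds rather than Swisscom's actual settings, might look like:

```python
def grounding_policy(grounding_threshold: float, relevance_threshold: float) -> dict:
    """Build a contextual grounding policy: GROUNDING blocks responses not
    supported by the retrieved source material, RELEVANCE blocks responses
    that do not address the user's query."""
    return {
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": grounding_threshold},
            {"type": "RELEVANCE", "threshold": relevance_threshold},
        ]
    }

# Passed as contextualGroundingPolicyConfig when creating the guardrail, e.g.:
#   boto3.client("bedrock").create_guardrail(name=..., ...,
#       contextualGroundingPolicyConfig=grounding_policy(0.75, 0.75))
```

Raising the thresholds trades coverage for precision: more responses are blocked, but those that pass are more reliably anchored in the retrieved network data.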

Deployment and Operational Considerations

The team’s adoption of infrastructure as code (IaC) principles through AWS CloudFormation demonstrates mature LLMOps practices, enabling automated and consistent deployments while providing version control of infrastructure components. This approach facilitates easier scaling and management of the Network Assistant solution as it grows, addressing common challenges in LLMOps around deployment consistency and scalability.
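As a hedged illustration of what such IaC might cover, the CloudFormation fragment below declares a data-lake bucket and a daily Glue crawler; all names and the schedule are assumptions for illustration, not Swisscom's actual template:

```yaml
Parameters:
  CrawlerRoleArn:
    Type: String            # IAM role ARN for the crawler (assumed to exist)

Resources:
  NetworkDataLake:
    Type: AWS::S3::Bucket   # data lake receiving the daily batch ingestion

  NetworkDataCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Role: !Ref CrawlerRoleArn
      DatabaseName: network_metrics            # hypothetical Glue database
      Targets:
        S3Targets:
          - Path: !Sub "s3://${NetworkDataLake}/daily/"
      Schedule:
        ScheduleExpression: "cron(0 3 * * ? *)"  # crawl once per day
```

Versioning fragments like this alongside the application code is what makes the deployments repeatable as the assistant's data sources grow.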

Future Enhancements and Lessons Learned

The roadmap includes implementing a network health tracker agent for proactive monitoring, integration with Amazon SNS for proactive alerting, and expansion of data sources and use cases. Key lessons learned include the importance of addressing data sovereignty requirements early in the design process, the need for hybrid approaches that combine AI model interpretation with traditional data processing for numerical accuracy, and the benefits of serverless architectures for LLMOps implementations.

The team’s experience highlights that complex calculations involving significant data volume management require different approaches than pure AI model interpretation, leading to their enhanced data processing pipeline that combines contextual understanding with direct database queries. This insight is particularly valuable for organizations implementing LLMOps in technical domains where precision and accuracy are critical.

Industry-Specific Considerations

The telecommunications industry context provides valuable insights into LLMOps implementation in regulated environments. The solution addresses sector-specific challenges around data classification, compliance with telecommunications regulations, and handling of sensitive network data. The threat modeling approach and comprehensive security framework serve as a model for other organizations operating in regulated industries considering LLMOps implementations.

The case study demonstrates how LLMOps can transform traditional engineering workflows while maintaining strict compliance and security requirements. The combination of Amazon Bedrock’s capabilities with careful attention to data security and accuracy shows how modern AI solutions can address real-world engineering challenges in highly regulated environments, providing a blueprint for similar implementations across the telecommunications sector and other infrastructure-intensive industries.
