At Cisco, the challenge of integrating LLMs into enterprise-scale applications required developing new DevSecOps workflows and practices. The presentation explores how Cisco approached continuous delivery, monitoring, security, and on-call support for LLM-powered applications, showcasing their end-to-end model for LLMOps in a large enterprise environment.
This case study comes from a presentation by John Rauser, Director of Engineering at Cisco, delivered at a Las Vegas conference in 2023. The talk focused on how large enterprises like Cisco are adapting their operational practices to accommodate the deployment and management of Large Language Models (LLMs) in production environments. While the source material is a conference talk abstract rather than a detailed technical document, it provides valuable insight into the enterprise perspective on LLMOps challenges and approaches.
Cisco, as a major technology company with extensive enterprise infrastructure experience, brings a unique perspective to the LLMOps discussion. The company has been launching AI-powered products and has developed internal practices for managing these new types of applications. The talk promises to walk through an end-to-end model for LLMOps, suggesting that Cisco has established a relatively mature framework for this purpose.
The core premise of this case study is that the advent of LLMs has created a fundamental shift in how software development and operations must function. Traditional DevOps and DevSecOps workflows, which have been refined over years of software engineering practice, do not translate directly to LLM-powered applications.
Several factors likely contribute to this incompatibility, though the source text does not enumerate them explicitly:
The presentation outlines Cisco’s approach to addressing these challenges through a comprehensive LLMOps framework. While specific technical details are not provided in the abstract, the talk covers several key operational areas:
Continuous delivery in the LLM context likely involves establishing pipelines that can handle not just code changes but also model updates, prompt modifications, and configuration changes. This requires new tooling and processes that can validate LLM behavior before deployment, potentially including automated evaluation suites that test model outputs against expected behaviors and safety criteria.
In enterprise settings like Cisco, this probably includes multiple staging environments where LLM applications can be tested with representative workloads before reaching production. The challenge is ensuring that testing is comprehensive enough to catch issues without being so slow that it impedes the development velocity that modern enterprises require.
Monitoring LLM-powered applications requires thinking beyond traditional metrics like uptime and response time. While these remain important, LLM applications also need monitoring for:
For a company like Cisco with enterprise customers, monitoring likely also includes audit trails and logging that meet compliance requirements for regulated industries.
Security is a particular focus of the talk, reflecting the DevSecOps framing. LLM security encompasses multiple dimensions:
In an enterprise context, these security measures must integrate with existing identity management, network security, and compliance frameworks.
The mention of “go on-call” for LLM applications is particularly interesting as it acknowledges that LLM applications require operational support just like any other production system, but with unique characteristics. On-call engineers for LLM applications need to be prepared for:
This requires new runbooks, training, and potentially new tooling to help on-call engineers diagnose and resolve LLM-specific issues.
The abstract mentions that the talk draws from “recent product launches involving AI at Cisco.” While specific products are not named, this suggests that Cisco has practical, hands-on experience deploying LLM-powered applications and has learned lessons from those experiences. This practical grounding is valuable because it means the proposed LLMOps model is not purely theoretical but has been tested against real-world constraints and challenges.
However, it should be noted that the source material is a conference talk abstract, which by its nature is promotional and highlights successes rather than failures or ongoing challenges. A balanced assessment would recognize that while Cisco has made progress in this area, LLMOps remains an evolving discipline and even large enterprises are still learning and adapting their approaches.
The talk specifically addresses “LLMOps in the large enterprise,” which brings particular considerations:
While this case study provides a useful high-level framework for thinking about LLMOps in enterprise settings, several limitations should be acknowledged:
Despite these limitations, this case study is valuable for highlighting the enterprise perspective on LLMOps and for framing the key operational challenges that organizations face when deploying LLMs at scale. The emphasis on security (the “Sec” in DevSecOps) is particularly relevant given the ongoing concerns about LLM safety and the potential for misuse.
Cisco’s approach to LLMOps, as outlined in this presentation, represents an attempt to bring the discipline and rigor of enterprise DevSecOps to the new world of LLM-powered applications. By addressing continuous delivery, monitoring, security, and on-call operations, Cisco is working to create a comprehensive framework that can support AI applications in production enterprise environments. While the specifics of their implementation are not detailed in the available source material, the framework itself provides a useful reference point for other organizations facing similar challenges.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
A comprehensive overview of how enterprises are implementing LLMOps platforms, drawing from DevOps principles and experiences. The case study explores the evolution from initial AI adoption to scaling across teams, emphasizing the importance of platform teams, enablement, and governance. It highlights the challenges of testing, model management, and developer experience while providing practical insights into building robust AI infrastructure that can support multiple teams within an organization.
Roblox has implemented a comprehensive suite of generative AI features across their gaming platform, addressing challenges in content moderation, code assistance, and creative tools. Starting with safety features using transformer models for text and voice moderation, they expanded to developer tools including AI code assistance, material generation, and specialized texture creation. The company releases new AI features weekly, emphasizing rapid iteration and public testing, while maintaining a balance between automation and creator control. Their approach combines proprietary solutions with open-source contributions, demonstrating successful large-scale deployment of AI in a production gaming environment serving 70 million daily active users.