ZenML

Rapid Development and Deployment of Enterprise LLM Features Through Centralized LLM Service Architecture

PagerDuty 2023

PagerDuty successfully developed and deployed multiple GenAI features in just two months by implementing a centralized LLM API service architecture. They created AI-powered features including runbook generation, status updates, postmortem reports, and an AI assistant, while addressing challenges of rapid development with new technology. Their solution included establishing clear processes, role definitions, and a centralized LLM service with robust security, monitoring, and evaluation frameworks.

Industry

Tech

Overview

PagerDuty, a SaaS platform focused on helping developers, DevOps teams, IT operators, and business leaders prevent and resolve incidents, embarked on an ambitious project to integrate generative AI capabilities into their product offering. Following the explosive adoption of ChatGPT in late 2022, the company set an aggressive goal: develop and deploy production-ready GenAI features within two months. Their motto was “think days, not weeks,” and this case study, presented by Irina (Staff Applied Scientist) and Saita (Senior ML Engineer), details how they achieved this through careful architectural decisions, defined processes, and cross-functional collaboration.

The GenAI Features Developed

PagerDuty released several GenAI features for Early Access, including AI-generated runbooks, status update drafting, postmortem report generation, and an AI assistant.

Key Challenges Identified

The team identified several challenges inherent in rapid LLM feature development:

New Technology Learning Curve: The LLM landscape was (and continues to be) rapidly evolving. The team was learning about new models, libraries, and providers while simultaneously building production features. The speakers noted a feeling that “we are in testing as revolution is happening”—with cutting-edge research released daily, the ground felt unstable beneath their feet.

Limited Planning Time: Moving fast meant evolving requirements, shifting priorities, and difficulty achieving alignment across multiple teams. Even in rapid development, they found that defining minimum required documentation (like design documents) was crucial.

Too Many Stakeholders: Managing multiple teams and stakeholders led to excessive meetings and documentation overhead that threatened to distract from actual development. Clear role definition became essential.

Concurrent Work Streams: Developing multiple GenAI features simultaneously risked code duplication and independent problem-solving across teams, leading to inefficiencies.

Performance Evaluation: Evaluating LLM-based features presented unique challenges around varying outputs, establishing reliable benchmarks, and ensuring consistent results both offline and online.

The Centralized LLM API Service

The architectural cornerstone of PagerDuty’s approach was a centralized LLM API service. This decision addressed multiple challenges simultaneously and became the key enabler for rapid feature deployment.

Design Requirements

The service was built with several key requirements focused on ethical and efficient LLM use:

Single Point of Access: A dedicated team with data science and ML engineering expertise manages LLM access for the entire company. This abstracts away LLM-related complexity (prompt design, model selection, hyperparameter tuning) from product teams who may lack this expertise. It also prevents redundant work where multiple teams solve the same LLM-related problems independently.

Easy Integration via API Endpoints: Product teams receive clean API endpoints, reducing their overhead so they can focus on building GenAI capabilities rather than wrestling with LLM complexity.

Flexibility and Reliability: The service supports easy switching between LLM providers and models based on requirements. Failover mechanisms ensure availability—if a primary provider goes down, the service falls back to alternatives. This was a prescient design decision given ongoing concerns about LLM provider reliability.

Security and Legal Compliance: Working with security and legal teams from the start, they identified high-risk security threats including unauthorized use, prompt injection attacks, and potential data leakage. Mandatory security controls were implemented.

Continuous Monitoring: Given the novelty of the technology, extensive visibility into production behavior was essential. Clear definitions around what, how, and where logging occurs—plus access controls—were established with security compliance in mind.
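The failover behavior described under Flexibility and Reliability can be sketched as a thin wrapper that tries providers in priority order. This is a minimal illustration, not PagerDuty's actual implementation; the provider names and stub callables are hypothetical stand-ins for real SDK clients.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-service")

def call_with_failover(prompt: str, providers: list[tuple[str, Callable[[str], str]]]) -> str:
    """Try each provider in priority order, falling back when one fails."""
    for name, call in providers:
        try:
            result = call(prompt)
            log.info("provider=%s status=ok", name)
            return result
        except Exception as exc:
            log.warning("provider=%s failed (%s); falling back", name, exc)
    raise RuntimeError("all configured LLM providers failed")

# Stub providers standing in for real SDK clients (OpenAI, Azure OpenAI, Bedrock, ...).
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary unavailable")

def stable_secondary(prompt: str) -> str:
    return f"draft for: {prompt}"

answer = call_with_failover("incident #4711 runbook",
                            [("primary", flaky_primary), ("secondary", stable_secondary)])
```

In a real deployment the fallback order, per-provider timeouts, and retry budgets would be configuration rather than code, so the on-call team can reorder providers without a redeploy.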

Technical Architecture

The service runs on Kubernetes, providing robustness and scalability while enabling rapid deployment of new endpoints without worrying about infrastructure concerns. The microservices architecture allows the choice of LLM provider and model to be decoupled from the service itself—product teams can switch between providers and models by simply passing a different prompt version in the API call.

When an API call is made to the service, the request is routed to whichever provider and model are configured for the requested prompt version.

The service supports multiple LLM providers including OpenAI, Azure OpenAI, and AWS Bedrock, with architecture designed to support future options like self-hosted models and other enterprise APIs.
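The decoupling of provider and model from the product-facing API can be illustrated with a small prompt registry. The version names, models, and templates below are invented for illustration and are not PagerDuty's actual schema.

```python
# Hypothetical prompt registry: each version pins a provider, model, and template,
# so a product team changes behavior just by sending a different prompt_version.
PROMPT_REGISTRY = {
    "runbook-v1": {
        "provider": "azure-openai",
        "model": "gpt-35-turbo",
        "template": "Generate a runbook for: {task}",
    },
    "runbook-v2": {
        "provider": "bedrock",
        "model": "anthropic.claude-v2",
        "template": "You are an SRE assistant. Draft a step-by-step runbook for: {task}",
    },
}

def resolve_request(prompt_version: str, task: str) -> dict:
    """Translate an API call into a concrete provider request."""
    entry = PROMPT_REGISTRY[prompt_version]
    return {
        "provider": entry["provider"],
        "model": entry["model"],
        "prompt": entry["template"].format(task=task),
    }

request = resolve_request("runbook-v2", "database failover")
```

Because the registry lives inside the centralized service, swapping a feature from one provider to another is a registry change rather than a change in every consuming product team's code.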

Monitoring and Observability

The monitoring strategy involves multiple tools.

The speakers acknowledged that while their current solution works well, they remain open to integrating third-party services for advanced LLM monitoring as needs evolve.
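One way to reconcile the visibility requirement with the access controls described earlier is to log call metadata while replacing raw text with a digest. This is a sketch under assumed policy; the field names and redaction rules are illustrative, not PagerDuty's actual logging schema.

```python
import hashlib
import json
import time

# Assumed policy: never persist raw prompts or completions,
# only a short digest usable for correlating records.
SENSITIVE_FIELDS = {"prompt", "completion"}

def log_record(record: dict) -> str:
    """Serialize a monitoring record, replacing sensitive text with a short hash."""
    safe = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            safe[key + "_sha256"] = hashlib.sha256(value.encode()).hexdigest()[:12]
        else:
            safe[key] = value
    safe["logged_at"] = time.time()
    return json.dumps(safe, sort_keys=True)

entry = log_record({"feature": "postmortem", "provider": "openai",
                    "latency_ms": 840, "prompt": "internal incident details"})
```

Structured records like this keep latency, provider, and feature breakdowns queryable while keeping customer text out of the logging pipeline entirely.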

Process and Role Definition

Successfully deploying GenAI features quickly required clear processes and role definitions across multiple teams: applied scientists, ML engineers, product managers, and product development engineers.

The LLM Intake to Production Process

Feasibility Evaluation: When a new LLM use case emerges (typically from product managers), an applied scientist evaluates whether an LLM is the appropriate solution. The speakers emphasized that “not everything should be solved with LLMs”—the team experienced pressure to apply LLMs to every problem, but simpler techniques are often more appropriate. Data requirements, model selection, and testing plans are defined here, along with success criteria.

Design Documentation and Security Review: If feasible and prioritized, a design document ensures alignment, and the security team reviews the use case early. Waiting until late in the process to involve security creates blocking risks.

Endpoint Development: ML engineers add new endpoints to the LLM API service, potentially implementing enhancements like support for multiple concurrent LLM calls or data preprocessing for complex use cases.

Iterative Development: Applied scientists develop endpoint logic while product engineers build user-facing features, fetch required data, and integrate with new endpoints. Teams start with the simplest approach and iterate through performance evaluation cycles until success criteria are met.

Deployment and Monitoring: All teams come together for deployment, then continue monitoring and gathering user feedback for future improvements.

The team emphasizes that this process continues to evolve as they learn, but having it in place was critical for overcoming coordination challenges.

Performance Evaluation and Risk Management

Evaluation Techniques

The team employed multiple evaluation approaches.

Risk Management Approach

Risk management followed a structured methodology.

The team acknowledged that security is an ongoing process requiring continuous assessment of new and emerging risks such as LLM bias and hallucination.

Security Controls

Specific security measures were implemented to address the threats identified during the security review, including unauthorized use, prompt injection attacks, and data leakage.

Key Takeaways

The speakers concluded with practical advice for organizations undertaking similar initiatives.

The case study represents a practical example of how a company navigated the early, chaotic period of enterprise GenAI adoption with a combination of architectural foresight (the centralized LLM API service), process discipline (defined intake-to-production workflows), and pragmatic evaluation approaches. While the speakers present their approach as successful, they also acknowledge ongoing challenges around evaluation, security, and the rapidly evolving LLM landscape—reflecting the reality that LLMOps remains an iterative, continuously improving discipline.
