Company: PricewaterhouseCoopers (PwC)
Title: AI Managed Services and Agent Operations at Enterprise Scale
Industry: Consulting
Year: 2026

Summary (short):
PricewaterhouseCoopers (PwC) addresses the challenge of deploying and maintaining AI systems in production through its managed services practice focused on data analytics and AI. The organization has developed frameworks for deploying AI agents in enterprise environments, particularly in healthcare and back-office operations, using its Agent OS framework built on Python. The approach emphasizes process standardization, human-in-the-loop validation, continuous model tuning, and comprehensive measurement through evaluations to ensure sustainable AI operations at scale. Results include successful deployments in healthcare pre-authorization processes and the establishment of specialized AI managed services teams comprising MLOps engineers and data scientists who continuously optimize production models.
## Overview

This case study explores PwC's approach to AI managed services and the operationalization of AI systems, particularly AI agents, in enterprise production environments. Ronnie, a partner at PwC leading its data analytics and AI managed services practice, provides insights into how organizations are moving from AI experimentation to production deployment and the unique challenges of maintaining AI systems at scale. The discussion reveals a significant shift in the industry over the past 12-18 months, where organizations have transitioned from piloting AI to actively seeking ways to scale, sustain, and measure ROI from production deployments.

## The Evolution of AI Managed Services

PwC's managed services practice represents a fundamental reimagining of traditional "keep the lights on" operations for AI systems. Unlike conventional managed services that focus primarily on break-fix scenarios and ensuring system uptime, AI managed services require a fundamentally different approach and skill set. The practice has evolved to include continuous model tuning, optimization, bias reduction, hallucination mitigation, and outcome measurement as core activities rather than occasional interventions.

The team composition reflects this shift: PwC is hiring MLOps engineers, data scientists, and advanced analytics professionals rather than traditional managed services personnel. These team members are responsible not just for maintaining system availability but for ensuring that AI systems continue to deliver on their intended business outcomes over time. This represents a significant departure from traditional IT operations and reflects the unique nature of AI systems, which can "fail silently" in ways that aren't captured by traditional monitoring approaches.

## AI Operations and MLOps in Practice

The organization has identified several key areas where AI and machine learning operations provide significant value in IT operations contexts. One critical application is incident detection and alert management. Modern organizations generate enormous volumes of alerts, creating significant alert fatigue for operations teams. By training models to identify which alerts represent genuine incidents requiring attention, organizations can dramatically reduce the noise and focus human attention on the most critical issues. This approach enables teams to triage more effectively and reduce mean time to resolution.

Another powerful application is proactive monitoring and predictive operations. Many infrastructure patterns are cyclical, with resource utilization following predictable patterns over time. By monitoring for these patterns, AI systems can predict when resources like CPU or memory will reach critical thresholds hours or even minutes in advance, enabling preemptive scaling or resource allocation. This shifts operations from reactive to proactive, preventing incidents before they occur rather than simply responding to them quickly.

## Agent Framework and Architecture

PwC has developed Agent OS, a proprietary framework for building and orchestrating AI agents in enterprise environments. The framework is built on Python-based technologies and designed with interoperability as a core principle. It provides APIs to connect with any ecosystem and works with the major hyperscaler cloud platforms. The framework includes an agent orchestration layer that enables organizations to manage multiple agents working in concert rather than operating in isolation.
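Agent OS itself is proprietary and its APIs are not described in the source, but the orchestration-layer idea can be illustrated with a minimal, hypothetical Python sketch: several narrowly scoped agents registered with a coordinator that routes a piece of work through them and keeps an audit trail for later evaluation. All class and function names below are invented for illustration and are not the Agent OS API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types: an illustration of an orchestration layer coordinating
# multiple narrowly scoped agents, not PwC's actual framework.

@dataclass
class Task:
    payload: dict
    history: list = field(default_factory=list)  # audit trail for later evaluation

Agent = Callable[[Task], Task]

class Orchestrator:
    """Registers agents and runs a task through them in a defined order."""

    def __init__(self) -> None:
        self._agents: dict[str, Agent] = {}

    def register(self, name: str, agent: Agent) -> None:
        self._agents[name] = agent

    def run(self, task: Task, pipeline: list[str]) -> Task:
        for name in pipeline:
            task = self._agents[name](task)
            task.history.append(name)  # record which agent handled each step
        return task

# Example: two toy agents for a back-office intake process.
def extract_fields(task: Task) -> Task:
    task.payload["member_id"] = task.payload.get("raw", "").split(":")[-1].strip()
    return task

def validate_fields(task: Task) -> Task:
    task.payload["valid"] = bool(task.payload.get("member_id"))
    return task

orchestrator = Orchestrator()
orchestrator.register("extract", extract_fields)
orchestrator.register("validate", validate_fields)
result = orchestrator.run(Task(payload={"raw": "member: 12345"}), ["extract", "validate"])
print(result.payload, result.history)
```

The point of an orchestration layer of this kind is that governance concerns such as logging, approval routing, and evaluation hooks live in one place rather than being reimplemented inside every agent.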
The strategic rationale behind providing a framework rather than building from scratch is to enable clients to focus on business-specific outcomes rather than underlying technology implementation. This approach accelerates time to value and leverages existing best practices rather than requiring each organization to solve foundational problems independently. The framework approach also facilitates governance and standardization across agent deployments within an organization.

## Agent Development and Deployment Lifecycle

The organization has identified a distinctive development lifecycle for AI agents that differs significantly from traditional software development. The process begins with significant time investment in design, focusing on understanding the process to be automated and defining the desired outcomes. The actual coding phase represents the smallest time investment, particularly given the availability of low-code and no-code development environments. The majority of time after coding is spent on evaluations and measurement, ensuring the agent performs as intended and doesn't exhibit undesired behaviors like hallucinations or biases.

This lifecycle reflects a fundamental shift in where technical effort is required. Traditional software development placed heavy emphasis on coding expertise, while agent development emphasizes design thinking and analytical capabilities. The democratization of coding through natural language interfaces and development environments means that product owners and business stakeholders can increasingly participate directly in development rather than relying solely on engineering teams.

## Scaling Strategies and Governance

PwC advocates for a specific approach to scaling agent deployments that balances centralization with parallel development. The organization recommends centralizing agent management and governance to prevent fragmentation, where different teams use different technologies and approaches without coordination. This centralization enables consistent application of responsible AI principles and ethical considerations across all agent deployments.

However, within this centralized governance framework, PwC supports parallel development of multiple agents targeting different processes simultaneously. The rationale is that modern development tools make agent creation relatively quick, so organizations shouldn't artificially constrain themselves to sequential development when they've identified multiple high-value opportunities. The key is identifying processes that are used frequently (the example given is processes used 75% of the time) to maximize impact from automation efforts.

Process standardization emerges as a critical prerequisite for scaling. If different teams or departments execute the same logical process in different ways, it becomes extremely difficult to create an agent that works across all variations. Organizations must invest in standardizing processes before they can effectively scale agent deployments across business units or geographies.

## Human-in-the-Loop Models

The case study introduces a taxonomy of human involvement in agent operations that reflects different levels of automation maturity and risk tolerance. Human-in-the-loop represents the most conservative approach, where humans must approve all agent outputs before they take effect. Human-on-the-loop involves humans approving only exceptions while the agent handles routine cases autonomously. Human-out-of-the-loop represents fully autonomous operation without human intervention.
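These three oversight modes map naturally onto a routing decision in the agent's output path. The sketch below is a minimal illustration of that idea, with invented names and thresholds (nothing here comes from PwC's framework): route every output to a reviewer in human-in-the-loop mode, route only low-confidence or sensitive outputs in human-on-the-loop mode, and auto-apply everything in human-out-of-the-loop mode.

```python
from enum import Enum

class OversightMode(Enum):
    HUMAN_IN_THE_LOOP = "in"       # every output needs approval
    HUMAN_ON_THE_LOOP = "on"       # only exceptions need approval
    HUMAN_OUT_OF_THE_LOOP = "out"  # fully autonomous

# Hypothetical policy values: in practice the threshold and the list of
# sensitive actions would come from the use case's risk assessment.
CONFIDENCE_THRESHOLD = 0.9
SENSITIVE_ACTIONS = {"deny_coverage", "issue_refund"}

def needs_human_review(action: str, confidence: float, mode: OversightMode) -> bool:
    if mode is OversightMode.HUMAN_IN_THE_LOOP:
        return True
    if mode is OversightMode.HUMAN_OUT_OF_THE_LOOP:
        return False
    # Human-on-the-loop: escalate only exceptions.
    return confidence < CONFIDENCE_THRESHOLD or action in SENSITIVE_ACTIONS

# A pre-authorization agent running on-the-loop would auto-approve routine
# cases but always queue denials for a human reviewer.
print(needs_human_review("approve_coverage", 0.97, OversightMode.HUMAN_ON_THE_LOOP))  # False
print(needs_human_review("deny_coverage", 0.97, OversightMode.HUMAN_ON_THE_LOOP))     # True
```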
The choice between these models varies significantly by industry and use case. Highly regulated industries, particularly healthcare and financial services, tend toward human-in-the-loop models even for mature agents. This ensures that critical decisions affecting health outcomes or financial impacts always receive human review. Less regulated domains or lower-stakes processes may progress more quickly to human-on-the-loop or fully autonomous operation.

The case study acknowledges a fundamental tension in these models: if humans remain fully in the loop, the productivity gains from automation may be limited. This creates pressure to move toward more autonomous models while balancing risk considerations. Organizations must carefully evaluate where on this spectrum each use case falls based on regulatory requirements, risk tolerance, and process maturity.

## Healthcare Applications

Healthcare provides several concrete examples of AI agents in production. The pre-authorization process for medical procedures represents a particularly relevant use case. This process involves verifying insurance coverage before services are rendered and can be highly manual and time-intensive. AI agents can automate significant portions of this workflow, but human oversight remains critical, particularly for denials. If an AI system denies coverage, human review ensures the decision is appropriate and defensible.

The Epic Systems electronic medical record platform has recently introduced AI capabilities that many large healthcare providers are adopting. One example is automated drafting of patient communications. Physicians spend considerable time drafting emails to patients, and AI can generate initial drafts that physicians then refine for tone and content. This represents a practical application that saves time while maintaining physician control over final communications.

These healthcare examples illustrate an important principle: successful agent deployments require deep business context, not just technical implementation. An engineer looking at data flows might miss critical medical logic that a clinical professional would immediately recognize. This reinforces the importance of involving business stakeholders throughout the agent development and validation process.

## Evaluation and Measurement

Continuous evaluation emerges as perhaps the most critical aspect of maintaining AI systems in production. PwC emphasizes "measure, measure, measure" as a core principle. Evaluations serve multiple purposes: validating that agents produce desired outputs, identifying biases or hallucinations, and establishing baselines for measuring improvement over time.

The case study acknowledges that while various evaluation tools exist, the fundamental bottleneck is often human data labeling. Business stakeholders must review agent interactions and label them as correct or incorrect, appropriate or inappropriate. This human labeling provides ground truth for automated evaluation systems and helps identify edge cases or failure modes that weren't anticipated during development. The process is time-intensive but essential for ensuring agent quality.

Establishing clear baselines before agent deployment is critical for measuring impact. Organizations must understand current process performance, whether measured in time, cost, quality, or outcomes, to quantify improvement after agent deployment. Without these baselines, claims about productivity gains or ROI become difficult to substantiate.
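As a concrete illustration of this labeling-driven evaluation loop, the sketch below scores a batch of agent outputs against human-provided labels and reports simple quality metrics. The record structure and metric choices are assumptions made for illustration, not PwC's evaluation tooling.

```python
from dataclasses import dataclass

@dataclass
class LabeledInteraction:
    """One agent output reviewed and labeled by a business stakeholder."""
    case_id: str
    agent_decision: str                # what the agent produced
    human_label: str                   # ground truth assigned during review
    flagged_issue: str | None = None   # e.g. "hallucination", "bias", or None

def evaluate(batch: list[LabeledInteraction]) -> dict:
    total = len(batch)
    correct = sum(1 for r in batch if r.agent_decision == r.human_label)
    issues: dict[str, int] = {}
    for r in batch:
        if r.flagged_issue:
            issues[r.flagged_issue] = issues.get(r.flagged_issue, 0) + 1
    return {
        "n": total,
        "accuracy": correct / total if total else 0.0,
        "issues": issues,
    }

# Tiny labeled sample; in practice this comes from stakeholder review queues.
batch = [
    LabeledInteraction("PA-001", "approve", "approve"),
    LabeledInteraction("PA-002", "deny", "approve", flagged_issue="missing clinical context"),
    LabeledInteraction("PA-003", "approve", "approve"),
]
print(evaluate(batch))  # baseline metrics to compare against after each model or prompt change
```

Scores like these, captured before deployment and after every change, are what make the baseline comparisons described above possible.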
## ROI and Economic Sustainability

The case study provides a nuanced perspective on measuring ROI for AI systems that goes beyond simplistic cost displacement calculations. A naive approach might calculate ROI as the cost of human workers replaced by agents minus the cost of the agents. However, the true calculation is far more complex and must include several factors that are often overlooked. First, there are costs associated with developing and deploying the agents themselves, including infrastructure, tooling, and development time. Second, human-in-the-loop review has ongoing costs that must be factored in: the agents don't eliminate human involvement, they shift it to different activities. Third, the infrastructure costs for model inference, retrieval systems, and data storage represent ongoing operational expenses that didn't exist in purely manual processes. Finally, the cost of continuous evaluation and model tuning represents a new category of expense.

Organizations must also make deliberate choices about model size and efficiency. When a smaller model can achieve acceptable performance, running it instead of a larger model can significantly reduce operational costs. These optimization decisions become increasingly important as systems scale and handle higher volumes of requests.

## Environmental and Social Sustainability

PwC frames sustainability along three dimensions that extend beyond typical environmental concerns. The environmental dimension includes energy consumption, carbon emissions, water usage (particularly for data center cooling), and hardware lifecycle management, including e-waste. Organizations should deliberately select data centers that use recycled water and renewable energy where possible.

The economic dimension encompasses the cost factors discussed above but also includes efficiency considerations: running appropriately sized models, optimizing retrieval and storage, and making conscious tradeoffs between performance and cost. Not every application requires maximum accuracy or the most sophisticated model, and organizations should align technical choices with business requirements.

The social dimension focuses on responsible AI, including bias mitigation, transparency, explainability, accessibility, and localization. This dimension also encompasses workforce considerations: how organizations upskill rather than replace employees, how they manage the cultural transition to AI-enabled work, and how they ensure that AI systems are accessible to diverse user populations.

## Data Modernization and Foundation

Throughout the discussion, data quality and modernization emerge as critical prerequisites for successful AI deployment. No organization claims to have perfect data quality, and addressing data issues is essential before expecting AI systems to perform reliably. PwC sees significant data modernization and application modernization initiatives as organizations prepare their foundations for AI deployment.

Context engineering and retrieval systems represent significant data engineering challenges in agent deployments. Agents require appropriate context to make good decisions, and providing that context requires moving and transforming data across systems. Data engineers play a critical role in building these context pipelines, even as other aspects of development become more accessible to non-technical users.
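A minimal sketch of what such a context pipeline might look like: retrieve the most relevant reference documents for an incoming case and assemble them into a prompt-ready context block. The TF-IDF retrieval here is a stand-in for whatever retrieval stack an organization actually runs, and the document names and content are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative document store: in practice this would be policy documents,
# payer rules, or EMR extracts maintained by data engineering pipelines.
documents = {
    "policy_imaging": "MRI and CT imaging require pre-authorization for outpatient settings.",
    "policy_referrals": "Specialist referrals are covered without prior approval for in-network providers.",
    "policy_pharmacy": "Specialty pharmacy requests require clinical documentation and prior approval.",
}

def build_context(query: str, top_k: int = 2) -> str:
    """Return the top_k most relevant documents, concatenated for the agent prompt."""
    names = list(documents)
    corpus = [documents[n] for n in names]
    matrix = TfidfVectorizer().fit_transform(corpus + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)[:top_k]
    return "\n\n".join(f"[{name}]\n{documents[name]}" for name, _ in ranked)

print(build_context("Does an outpatient MRI need pre-authorization?"))
```

The data engineering work the case study describes is precisely what keeps a store like this current, complete, and trustworthy enough for an agent to rely on.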
## Skills Evolution and Organizational Change

The shift to AI operations is fundamentally changing organizational skill requirements and career paths. Traditional software development skills remain valuable but are no longer sufficient. Design thinking becomes more important as the design phase consumes more time relative to coding. Analytical skills for evaluation and measurement become critical. Business domain knowledge increases in value as organizations recognize that technical expertise alone cannot ensure successful AI deployments.

This skills evolution extends to organizational structure. The emergence of Chief AI Officers reflects a shift toward business-oriented leadership of AI initiatives rather than purely technical leadership. Organizations are recognizing that AI strategy must be driven by business outcomes rather than technology capabilities, even as technical expertise remains essential for execution. Product owners and business stakeholders are becoming more directly involved in development as low-code and no-code tools reduce barriers to participation. This creates opportunities for closer alignment between technical implementation and business requirements but also requires organizational change management to help people adopt new ways of working.

## Change Management and Culture

Cultural transformation emerges as perhaps the most significant challenge in scaling AI across organizations. PwC emphasizes that AI enablement requires significant investment in training and upskilling; one reference point mentioned is spending 4-8 hours per week on learning and development. Organizations that underinvest in this dimension are unlikely to realize the full potential of AI systems.

The cultural challenge extends beyond individual learning to organizational attitudes toward AI. Organizations must cultivate a culture of learning and experimentation rather than fear of replacement. This requires transparent communication about how AI will augment rather than eliminate roles, concrete examples of how employees will be upskilled, and genuine commitment to supporting people through the transition.

Starting AI initiatives in back-office functions rather than customer-facing applications provides a more forgiving environment for initial deployments. This allows organizations to build expertise and confidence before deploying AI in contexts where errors have more immediate external impact. As organizations mature, they can extend AI into core products and services with greater confidence.

## Practical Recommendations

PwC's recommendations for organizations pursuing AI at scale center on several key principles. First, invest in foundational work: standardize processes, modernize data and applications, and establish governance frameworks before scaling agent deployments broadly. Second, start with back-office applications that provide learning opportunities in lower-stakes environments. Third, invest heavily in training and change management to build organizational capacity and acceptance. Fourth, leverage existing frameworks and tools rather than building everything from scratch; the ecosystem of available solutions can significantly accelerate time to value. Fifth, pilot multiple use cases in parallel rather than sequentially, provided centralized governance maintains consistency. Finally, maintain relentless focus on measurement and evaluation to ensure that deployed systems deliver intended outcomes and to build the evidence base for continued investment.
The case study concludes with emphasis on the importance of feedback loops and continuous measurement. The iterative cycle of deploy, measure, adjust, and redeploy represents the operational reality of AI systems in production. Organizations that embrace this iterative approach and invest in the capabilities required for continuous improvement are positioned to realize sustainable value from AI investments over time.
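As a closing illustration of that deploy, measure, adjust, and redeploy cycle, here is a tiny sketch of the kind of check that might gate each iteration: compare the latest evaluation result against the pre-deployment baseline and flag when a tuning pass is needed. The metric, threshold, and cadence are assumptions made for illustration.

```python
# Hypothetical gate for the deploy -> measure -> adjust -> redeploy cycle.
BASELINE_ACCURACY = 0.92      # measured before the current deployment
REGRESSION_TOLERANCE = 0.03   # how much degradation triggers an adjustment

def needs_adjustment(latest_accuracy: float) -> bool:
    """True when the deployed agent has drifted enough to warrant a tuning pass."""
    return latest_accuracy < BASELINE_ACCURACY - REGRESSION_TOLERANCE

for week, accuracy in enumerate([0.93, 0.91, 0.87], start=1):
    action = "adjust and redeploy" if needs_adjustment(accuracy) else "keep measuring"
    print(f"week {week}: accuracy={accuracy:.2f} -> {action}")
```

In practice the metric, tolerance, and review cadence would come from the baselines and human-labeled evaluations described earlier, closing the feedback loop the case study emphasizes.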
