## Overview
Zoom presented their AI Companion 3.0 system as a production-scale agentic AI implementation designed to transform meeting conversations into completed work. The presentation was delivered by Shi Zhen Wang, Head of AI Infrastructure at Zoom, and Dmitry, a Solutions Architect at AWS, at what appears to be Zoomtopia 2025. AI Companion 3.0 represents an evolution from passive meeting summaries to proactive, multi-agent systems that can plan, reason, and execute tasks on behalf of users. While this is a promotional presentation from Zoom itself, it offers valuable insights into how a major communications platform approaches LLMOps challenges at scale, serving thousands to millions of customers.
The core problem Zoom identified was the gap between having conversations and achieving completion: users spend significant time in meetings but face substantial administrative overhead in coordinating follow-up actions, tracking decisions across multiple meetings, and keeping distributed teams aligned. A specific example illustrated a project manager who needs to diagnose project risks by searching through Jira tickets, schedule urgent meetings across multiple time zones, and prepare focused agendas, a set of tasks that would typically consume a full day of work.
## System Architecture and LLMOps Approach
Zoom built AI Companion 3.0 on a custom agentic framework designed with three fundamental architectural principles that directly address LLMOps concerns. The system is explicitly modular, allowing new capabilities to be added in a plug-and-play fashion without requiring complete system redesigns. This modularity extends to customizability, where templates and workflows can be defined for different verticals and domains, with built-in tenancy isolation for security requirements—a critical consideration for enterprise deployments.
The architecture leverages AWS microservices as its foundation, which was described as a deliberate decision to enable horizontal scaling across different regions, locations, and time periods to meet varying demand patterns. This microservices approach represents a mature LLMOps strategy that separates concerns and allows different components to scale independently based on actual usage patterns rather than requiring monolithic scaling.
## Federated AI Model Strategy
One of the most interesting technical decisions described in the presentation is what Zoom calls their "Federated AI" approach—their "secret sauce" according to Wang. This strategy combines proprietary small language models (SLMs) developed in-house at Zoom with frontier large language models from third-party providers accessed through AWS Bedrock. The rationale is explicitly about balancing quality, latency, and cost—three competing concerns that are central to production LLMOps.
The federated approach implements a multi-step processing pipeline where SLMs serve as an initial processing layer before requests reach larger models. Dmitry provided a specific example: when a user implicitly (rather than explicitly) requests output in a particular format like a table or structured list, the SLM layer first interprets this intent, processes it, and enriches the prompt before passing it to the LLM. This prompt enrichment strategy represents a sophisticated approach to prompt engineering at scale, where the system uses smaller, faster models to understand user intent and optimize prompts for more expensive LLM calls.
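To make the described enrichment pattern concrete, the following is a minimal sketch, assuming a placeholder in-house SLM (`slm_infer`) that detects implicit formatting intent and enriches the prompt before it reaches a frontier model on Amazon Bedrock. The function names, routing heuristics, and model ID are illustrative assumptions, not Zoom's code.

```python
# Sketch of the SLM prompt-enrichment step described in the presentation.
# slm_infer is a placeholder callable standing in for a proprietary small model.
import boto3

bedrock = boto3.client("bedrock-runtime")

def enrich_prompt(user_request: str, slm_infer) -> str:
    """Use a small model to detect implicit formatting intent and make it explicit."""
    intent = slm_infer(f"Classify the desired output format of: {user_request}")
    if "table" in intent.lower():
        return user_request + "\n\nReturn the answer as a Markdown table."
    if "list" in intent.lower():
        return user_request + "\n\nReturn the answer as a numbered list."
    return user_request

def ask_frontier_model(prompt: str,
                       model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0") -> str:
    # The Bedrock Converse API provides a unified message format across providers;
    # the model ID here is an example, not one Zoom has confirmed using.
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```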
From an LLMOps perspective, this federated architecture addresses several production challenges. It provides cost control by routing simpler requests through less expensive SLMs while reserving LLM capacity for tasks requiring more sophisticated reasoning. It potentially reduces latency for common operations that SLMs can handle adequately. It also allows Zoom to maintain some control over model behavior through their proprietary SLMs while still accessing cutting-edge capabilities from third-party frontier models. However, the presentation doesn't detail how Zoom handles the orchestration complexity of this federated approach, such as how they determine routing decisions, handle failures or fallbacks, or maintain consistency across model types.
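Since those routing and fallback details are not disclosed, the following is only an illustration of one plausible shape for such a layer: simple requests are served by the cheap SLM tier with escalation to the larger model on failure or high complexity. The thresholds and callables are assumptions.

```python
# Illustrative routing-with-fallback logic; nothing here reflects Zoom's actual design.
from typing import Callable

def route_request(prompt: str,
                  slm: Callable[[str], str],
                  llm: Callable[[str], str],
                  complexity_score: Callable[[str], float],
                  threshold: float = 0.5) -> str:
    """Serve simple requests from the SLM tier; escalate everything else to the LLM."""
    if complexity_score(prompt) < threshold:
        try:
            return slm(prompt)
        except Exception:
            pass  # fall through to the larger model if the SLM tier fails
    return llm(prompt)
```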
## Context Management and Retrieval Architecture
The system implements what Zoom describes as "advanced context engineering," where AI Companion considers not just session information but also long-term memory and personal preferences when executing tasks. The example provided showed the system learning that a particular user prefers 30-minute meetings and automatically selecting that duration when scheduling on their behalf.
The technical implementation of this context management relies on OpenSearch serving as the vector store for indexing conversations and retrieving relevant information. According to the presentation, this retrieval component is critical for enabling "agentic retrieval" where users don't manually search but instead ask AI Companion to summarize information by connecting to systems like Jira and retrieving related content. The system stores meeting transcripts, summaries, and presumably embeddings that enable semantic search across meeting history.
This represents a practical implementation of RAG (Retrieval Augmented Generation) patterns in production. The OpenSearch vector store allows the system to maintain context across multiple meetings and data sources while retrieving relevant information without latency issues. However, the presentation doesn't address important LLMOps concerns around this retrieval architecture, such as how they handle embedding model versioning, how they manage index updates as new meetings occur, what strategies they use for chunking long transcripts, or how they ensure retrieval quality and relevance at scale.
## Multi-Agent Workflow Design
AI Companion 3.0 is described as a "native multi-agent framework" that handles memory, context, prompt history, and other stateful components. The system implements specialized agents for different capabilities, including what was demonstrated in the interface: post-meeting follow-up generation, cross-meeting analysis, agentic scheduling, and meeting preparation.
The cross-meeting analysis capability demonstrates the complexity of the agentic workflow. Users can select multiple meetings from a series, and the system performs thread analysis across them, identifying recurring themes like "training program," "learning management system," and "marketing campaign," and providing a cross-cutting analysis of topics discussed across different meetings. This requires not just summarization but synthesis and reasoning across multiple documents, tracking discussion evolution over time, and identifying patterns.
The agentic scheduling capability represents an even more complex workflow involving multiple steps: checking calendar availability, identifying conflicts, making decisions about priority (such as when explicitly instructed to "find time today and move other meetings if necessary"), rearranging the calendar, and securing meeting slots. This demonstrates AI agents taking actions in external systems (calendar systems) based on complex multi-constraint decision-making.
From an LLMOps perspective, these multi-agent workflows raise important questions about orchestration, error handling, and observability that the presentation doesn't fully address. How does Zoom handle failures in multi-step workflows? If the scheduling agent successfully identifies a time slot but fails to send meeting invitations, how is that detected and remediated? How do they monitor and debug complex agent interactions? The presentation emphasizes the modular design but doesn't detail the operational challenges of maintaining such systems in production.
## Model Selection and AWS Bedrock Integration
The system's integration with AWS Bedrock is highlighted as enabling access to different models so that "for a particular task, for a particular agent, the model is providing the best tool for the job." This suggests Zoom implements model routing based on task requirements, potentially using different models for summarization versus reasoning versus generation tasks.
However, the presentation doesn't provide specifics about which models are used for which tasks, how those decisions were made, or how they're maintained over time. In production LLMOps, model selection involves not just initial capability assessment but ongoing evaluation as models are updated or new models become available. The integration with Bedrock presumably provides flexibility to swap models, but the operational processes for making such changes aren't described.
The emphasis on Bedrock handling scalability concerns suggests Zoom relies on managed infrastructure rather than self-hosting models, which is a pragmatic LLMOps decision that trades some control for reduced operational complexity. This approach allows Zoom to focus on their application layer and agentic framework rather than managing model serving infrastructure, though it also creates dependencies on AWS availability and pricing.
## Inference Optimization and Cost Management
Wang's role as Head of AI Infrastructure is described as covering "inference optimization to serve Zoom AI capability in a cost-effective way at scale," though the presentation provides limited detail on specific optimization techniques beyond the federated AI strategy. The emphasis on cost-effectiveness suggests Zoom has done significant work to make the system economically viable at their scale.
The federated approach itself is clearly a cost optimization strategy—using smaller, cheaper models when possible and reserving expensive LLM calls for when they're truly needed. The modular microservices architecture also enables cost optimization by allowing different components to scale independently based on demand rather than over-provisioning the entire system.
However, the presentation doesn't address other common inference optimization techniques that might be in use, such as batching requests, caching common responses, using quantized models, or implementing request throttling and rate limiting. For a system serving millions of customers, these optimizations would typically be critical to maintaining cost-effectiveness.
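None of these techniques are confirmed to be in use; purely to illustrate one of them, a minimal response cache keyed on the normalized prompt might look like the sketch below.

```python
# Illustrative response cache: repeated identical requests skip the model call.
import hashlib
import time

class ResponseCache:
    def __init__(self, ttl_seconds: int = 3600):
        self._ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def _key(self, model_id: str, prompt: str) -> str:
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(f"{model_id}:{normalized}".encode()).hexdigest()

    def get_or_call(self, model_id: str, prompt: str, call) -> str:
        key = self._key(model_id, prompt)
        entry = self._store.get(key)
        if entry and time.time() - entry[0] < self._ttl:
            return entry[1]  # cache hit: no inference cost incurred
        result = call(model_id, prompt)
        self._store[key] = (time.time(), result)
        return result
```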
## Production Deployment and Scalability
The microservices architecture is explicitly designed for multi-region deployment, enabling Zoom to serve their global customer base with appropriate latency and availability. The system is described as able to "scale out to meet the demands" in "different regions, in different locations, or even for different times."
This suggests Zoom has implemented geographic distribution of their AI services, though the presentation doesn't detail how they handle model consistency across regions, whether they replicate the entire system or use some form of centralized model serving with distributed gateways, or how they manage the OpenSearch vector stores across geographic boundaries.
The mention of scaling for "different times" suggests they've observed and optimized for temporal patterns in usage—likely peak usage during business hours in different time zones. This kind of temporal scaling is a sophisticated LLMOps practice that requires good observability into usage patterns and automation to scale resources up and down appropriately.
## Evaluation and Quality Considerations
One significant gap in the presentation is discussion of evaluation and quality assurance. For an agentic AI system that takes actions on behalf of users—scheduling meetings, moving calendar items, sending emails—quality and reliability are critical. The presentation emphasizes providing "the highest quality to our customer" as a fundamental objective but doesn't describe how quality is measured or ensured.
In production LLMOps for agentic systems, this would typically involve extensive testing including unit tests for individual agents, integration tests for multi-agent workflows, evaluation datasets for different capabilities (summarization quality, action item extraction accuracy, etc.), and likely human evaluation processes for more subjective quality dimensions. The system presumably has guardrails to prevent inappropriate actions, but these aren't discussed.
The "show sources" feature demonstrated in the interface suggests some attention to explainability and transparency—users can see what context was used to generate outputs. This is valuable for building user trust and debugging issues, though the presentation doesn't detail how comprehensive this traceability is or whether it extends to showing which models were used, what prompts were constructed, or what reasoning steps were taken.
## Privacy and Security Considerations
The architecture description mentions "tenancy isolation" as a key principle, indicating Zoom has designed for multi-tenant security where different customers' data and interactions are isolated. This is essential for enterprise deployments where data privacy is paramount. However, the presentation doesn't elaborate on how this isolation is implemented—whether at the infrastructure level, the application level, or both.
For a meeting intelligence system that processes potentially sensitive business conversations, security and privacy are critical LLMOps concerns. The system presumably implements controls around data retention, encryption, access controls, and compliance with various regulatory requirements, but these aren't discussed in the presentation. The integration with AWS services likely provides some security infrastructure, but the application-level security design isn't detailed.
## Context Window Management and Prompt Engineering
The system's ability to perform cross-meeting analysis by selecting multiple meetings and processing them together suggests sophisticated handling of potentially very long contexts. Modern meetings can generate substantial transcript volumes, and analyzing multiple meetings together could easily exceed typical LLM context windows.
The presentation doesn't explain how Zoom handles this challenge. Possible approaches might include chunking and processing meetings in segments with synthesis afterward, using retrieval to identify only relevant portions of meetings for analysis, or leveraging long-context models where available. The federated architecture with SLMs preprocessing might also play a role in context management, but the specifics aren't described.
The "advanced context engineering" mentioned likely involves sophisticated prompt engineering practices, including prompt templates that can be customized for different verticals and domains, dynamic prompt construction based on available context and user preferences, and the multi-step prompt enrichment process described where SLMs enhance prompts before they reach LLMs. However, the presentation doesn't reveal the actual prompt engineering practices or how prompts are maintained and versioned in production.
## User Experience and Interface Design
The demonstrated web interface shows sophisticated user experience design that shields users from the underlying complexity. Users can select meetings from chat history, use quick prompt templates like "post-meeting follow-up" or "cross-meeting analysis," filter meetings by series, and review structured outputs with clear sections for summaries, action items, risks, participant notes, and even draft email communications.
From an LLMOps perspective, this interface design is important because it constrains the problem space—by providing structured templates and workflows rather than completely open-ended interaction, Zoom can optimize their system for specific use cases and provide more consistent quality. The quick prompts serve as a form of prompt engineering at the user interface level, guiding users toward interactions the system is designed to handle well.
The feature allowing users to view sources provides transparency, and the ability to edit prompts before submission gives users control while still benefiting from the templated starting points. These design choices reflect mature thinking about how to productionize AI capabilities in ways that are useful and trustworthy for users.
## Future Development and Technical Roadmap
The presentation mentions that AI Companion 3.0 is "a starting point" with more capabilities planned, and specifically notes exploration of "agent core," though no details are provided about what this means. This suggests ongoing development of the agentic framework, possibly toward even more autonomous agent capabilities or more sophisticated agent coordination.
The emphasis on the modular, extensible architecture suggests Zoom is building a platform that can evolve as LLM capabilities advance and new use cases emerge. The customizable workflow design mentioned earlier supports adaptation to different verticals and domains, indicating plans to expand beyond general meeting intelligence to more specialized applications.
## Critical Assessment and LLMOps Maturity
This case study presents Zoom's AI Companion 3.0 as a sophisticated production deployment of agentic AI, and the architectural decisions described—federated AI approach, microservices foundation, modular design, integration with managed services—reflect mature LLMOps thinking about building scalable, cost-effective systems. The specific examples provided, particularly the detailed walkthrough of using the system to diagnose project risks and coordinate urgent meetings, demonstrate real productivity value.
However, as a promotional presentation from Zoom itself, the case study naturally emphasizes successes and architectural strengths while glossing over challenges and limitations. Several important LLMOps concerns receive minimal or no attention: comprehensive evaluation and quality assurance processes, error handling and reliability in multi-step agentic workflows, observability and debugging of complex agent interactions, handling of edge cases and failure modes, prompt versioning and management, and the operational processes for maintaining and updating the system.
The federated AI approach is interesting but raises questions about operational complexity—managing two tiers of models, orchestrating between them, and maintaining consistency and quality across this pipeline adds complexity compared to using a single model approach. The claimed benefits around cost and latency are plausible but not substantiated with metrics.
The emphasis on scalability is appropriate for a company of Zoom's size, and the architectural decisions around microservices and managed services appear sound, but the presentation doesn't provide concrete information about actual scale metrics, performance characteristics, or cost efficiency achieved.
Overall, this represents a credible example of enterprise-scale LLMOps for agentic AI, with architectural decisions that address real production concerns around scalability, cost, and quality. However, the promotional nature of the presentation means it should be viewed as highlighting Zoom's approach and capabilities rather than as a comprehensive or fully transparent technical case study. The gaps in discussion around evaluation, reliability, and operational processes are understandable for a public presentation but leave important questions about the full LLMOps maturity of the implementation.