Company
Cires21
Title
AI-Powered Video Workflow Orchestration Platform for Broadcasting
Industry
Media & Entertainment
Year
2025
Summary (short)
Cires21, a Spanish live streaming services company, developed MediaCoPilot to address the fragmented ecosystem of applications used by broadcasters, which resulted in slow content delivery, high costs, and duplicated work. The solution is a unified serverless platform on AWS that integrates custom AI models for video and audio processing (ASR, diarization, scene detection) with Amazon Bedrock for generating complex metadata like subtitles, highlights, and summaries. The platform uses AWS Step Functions for orchestration, exposes capabilities via API for integration into client workflows, and recently added AI agents powered by AWS Agent Core that can handle complex multi-step tasks like finding viral moments, creating social media clips, and auto-generating captions. The architecture delivers faster time-to-market, improved scalability, and automated content workflows for broadcast clients.
## Overview and Business Context

Cires21 is a Spanish company that has been providing live streaming services for nearly 20 years, primarily serving broadcasters in Spain. About two years ago, the company began developing its own AI pipelines after observing that its broadcast clients were struggling with a highly fragmented ecosystem of applications for their regular operations. This fragmentation led to three critical business problems: slow content delivery, high operational costs, and significant duplicated work across different tools and workflows.

To address these challenges, Cires21 developed MediaCoPilot, a unified platform that integrates multiple AI capabilities into a single orchestrated system. The platform is presented as delivering faster content delivery and lower costs compared to the previous fragmented approach.

The broader industry context, as explained by AWS representative Stefano Sandrini, is that media and entertainment companies are undergoing significant digital transformation. Key industry trends include the need to reinvent monetization across all channels, enhance customer experiences through personalization, and leverage data analytics for targeted advertising and customer acquisition. There is also strong industry focus on five key AI use cases: archive and restoration, enhanced broadcasting, localization for market penetration (automatic dubbing and closed captions), hyperpersonalization, and video semantic understanding for content search and repurposing.

## Technical Architecture and Infrastructure

MediaCoPilot is built on a serverless architecture on AWS, which Cires21 chose specifically for faster time-to-market, reliability, and scalability. The core infrastructure includes several key AWS services working in concert: the API layer is built using AWS Lambda and API Gateway, while AWS Step Functions serves as the central orchestrator for all workflows within the platform. This serverless approach represents a strategic decision to avoid the operational overhead of managing infrastructure while maintaining the ability to scale elastically based on demand.

For authentication and security, the platform uses Amazon Cognito, which provides two-factor authentication capabilities. Content delivery and protection are handled through Amazon CloudFront and S3, providing worldwide distribution while maintaining security controls.

The serverless architecture choice is significant from an LLMOps perspective because it allows the platform to handle variable workloads without pre-provisioning resources, which is particularly important given the unpredictable nature of video processing and AI inference workloads.

## Media Processing Capabilities

AWS Media Services form a critical foundation of the platform's video processing capabilities. MediaCoPilot uses AWS Elemental MediaConvert to transcode uploaded assets into formats suitable for processing by AI models, specifically the Common Media Application Format (CMAF). For live streaming scenarios, the platform leverages AWS Elemental MediaLive and MediaPackage to receive live feeds, and uses harvest jobs to perform live clipping from ongoing streams.

On top of these media services, Cires21 has developed a custom video editor that enables live clip creation and provides capabilities for adding subtitles, branding, and styling to video content. This editor serves as a user-facing tool while the underlying platform handles the heavy lifting of video processing and AI inference.
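To ground the transcoding step, here is a heavily trimmed boto3 sketch of a MediaConvert job with a CMAF output group. It is illustrative only: a real job must define complete video and audio outputs (codecs, bitrates, selectors) to pass validation, and the bucket names and IAM role are hypothetical, not Cires21's configuration.

```python
import boto3

# MediaConvert historically uses an account-specific endpoint,
# discoverable through the API itself.
mc = boto3.client("mediaconvert", region_name="eu-west-1")
endpoint = mc.describe_endpoints()["Endpoints"][0]["Url"]
mc = boto3.client("mediaconvert", endpoint_url=endpoint, region_name="eu-west-1")

mc.create_job(
    Role="arn:aws:iam::123456789012:role/MediaConvertRole",  # hypothetical role
    Settings={
        "Inputs": [{"FileInput": "s3://ingest-bucket/raw/interview.mxf"}],
        "OutputGroups": [{
            "Name": "CMAF",
            "OutputGroupSettings": {
                "Type": "CMAF_GROUP_SETTINGS",
                "CmafGroupSettings": {
                    "Destination": "s3://mezzanine-bucket/cmaf/interview",
                    "SegmentLength": 6,
                    "FragmentLength": 2,
                },
            },
            # A real job defines one or more outputs with full codec
            # settings here; omitted for brevity.
            "Outputs": [],
        }],
    },
)
```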
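At the orchestration layer described earlier, the common pattern of API Gateway fronting a Lambda that hands work off to a Step Functions state machine can be sketched as follows. The handler, the state machine ARN, and the payload shape are assumptions for illustration, not the platform's actual API.

```python
import json
import os

import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    """API Gateway -> Lambda: validate the request and start the workflow."""
    body = json.loads(event["body"])
    execution = sfn.start_execution(
        stateMachineArn=os.environ["WORKFLOW_ARN"],  # hypothetical env var
        input=json.dumps({
            "assetUri": body["assetUri"],
            "operations": body.get("operations", ["transcribe", "summarize"]),
        }),
    )
    # 202 Accepted: the workflow runs asynchronously; clients poll or
    # receive callbacks with results.
    return {
        "statusCode": 202,
        "body": json.dumps({"executionArn": execution["executionArn"]}),
    }
```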
## Custom AI Models and Processing Pipelines

Cires21 has developed custom AI pipelines that process both audio and video content. The audio processing capabilities include automatic speech recognition (ASR), voice activity detection, and speaker diarization; for video, the platform includes scene detection. These custom models are deployed on Amazon SageMaker, which provides the managed infrastructure for hosting and running inference.

A critical lesson learned by Cires21 in their deployment journey concerns the selection of the appropriate SageMaker deployment mode. The team discovered that asynchronous endpoints work well for video-on-demand (VOD) use cases but are insufficient for live processing scenarios; for live use cases, they recognized the need to use real-time or serverless inference endpoints instead (see the configuration sketch below). This is an important architectural consideration for LLMOps practitioners: the deployment mode must match the latency and throughput requirements of the specific use case.

Another optimization strategy the team is implementing is segmented video processing. By breaking videos into segments and processing them in parallel, they can significantly reduce overall inference time (see the parallel-processing sketch below). This approach represents a practical LLMOps pattern for handling large media files where sequential processing would create unacceptable delays.

## Integration with Foundation Models via Amazon Bedrock

While Cires21 built custom models for lower-level audio and video processing tasks, they integrate Amazon Bedrock for generating more complex, higher-level metadata. Specifically, Bedrock is used to create subtitles, highlights, summaries, and other semantic content based on the outputs from the custom models. This is a hybrid architecture that is becoming increasingly common in LLMOps: specialized custom models handle domain-specific tasks, while large foundation models handle general language understanding and generation.

The integration of Bedrock suggests that the platform takes advantage of the managed inference capabilities and the variety of foundation models available through the service, allowing Cires21 to focus on its differentiation in video workflow orchestration rather than managing LLM infrastructure.
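At its simplest, the metadata-generation step described above reduces to prompting a foundation model over the custom models' output. Here is a minimal sketch using the Bedrock Converse API; the model choice and prompt are illustrative assumptions, not Cires21's actual configuration.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def summarize_transcript(transcript: str) -> str:
    """Ask a foundation model for highlights from a diarized ASR transcript."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative choice
        messages=[{
            "role": "user",
            "content": [{"text": (
                "Here is a diarized interview transcript. List the three "
                "most engaging moments with approximate timestamps, then "
                "write a two-sentence summary.\n\n" + transcript
            )}],
        }],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]
```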
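To make the deployment-mode lesson from the custom-models section concrete, the following boto3 sketch contrasts the two endpoint configurations: asynchronous inference for long-running VOD jobs versus serverless inference for live workloads. Model, endpoint, and bucket names are hypothetical.

```python
import boto3

sm = boto3.client("sagemaker")

# VOD: asynchronous inference. Requests and results flow through S3,
# which tolerates long-running jobs on large media files.
sm.create_endpoint_config(
    EndpointConfigName="asr-vod-async",          # hypothetical name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "asr-model",                # hypothetical model
        "InstanceType": "ml.g5.xlarge",
        "InitialInstanceCount": 1,
    }],
    AsyncInferenceConfig={
        "OutputConfig": {"S3OutputPath": "s3://results-bucket/asr/"},
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
    },
)

# Live: serverless inference. Low-latency synchronous calls that
# scale to zero between events.
sm.create_endpoint_config(
    EndpointConfigName="asr-live-serverless",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "asr-model",
        "ServerlessConfig": {
            "MemorySizeInMB": 6144,
            "MaxConcurrency": 20,
        },
    }],
)
```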
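The segmented-processing optimization can be sketched as a simple fan-out/fan-in, shown here with a thread pool; in a Step Functions architecture this fan-out would more likely be a Map state. The segment length and the per-segment inference call are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SECONDS = 60  # hypothetical segment length

def analyze_segment(video_uri: str, start: float, end: float) -> dict:
    """Placeholder for invoking an inference endpoint on one segment."""
    # In practice: invoke a SageMaker endpoint with a pre-cut segment URI
    # and return per-segment transcript/scene metadata.
    return {"start": start, "end": end, "events": []}

def analyze_video(video_uri: str, duration: float) -> list[dict]:
    windows = [(s, min(s + SEGMENT_SECONDS, duration))
               for s in range(0, int(duration), SEGMENT_SECONDS)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(analyze_segment, video_uri, s, e)
                   for s, e in windows]
        # Futures are collected in submission order, preserving the timeline.
        return [f.result() for f in futures]
```

Events that span segment boundaries still need merging logic after the fan-in, which is exactly the consistency concern raised in the critical assessment at the end of this write-up.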
## API-First Integration Strategy

A noteworthy aspect of MediaCoPilot's deployment is that most clients use the API rather than the user interface. Clients integrate MediaCoPilot directly into their existing systems, including content management systems (CMS), media asset management platforms, and live clipping services. In this integration pattern, MediaCoPilot acts as an AI layer that augments existing workflows rather than requiring clients to adopt entirely new systems.

This API-first approach has significant implications for LLMOps. It requires robust API design, reliable service availability, comprehensive documentation, and careful version management, and the platform must handle authentication, rate limiting, error handling, and monitoring at scale across multiple client integrations. The fact that clients prefer API integration over the UI suggests that the platform has achieved a level of reliability and ease of integration that makes it viable as a critical component in production workflows.

## AI Agent Development with AWS Agent Core

Approximately two months before the presentation, Cires21 began developing AI agents for MediaCoPilot using AWS Agent Core, which had recently been released. Agent Core provides several key services that are valuable for agent deployment: runtime and gateway, identity management, observability, and memory. The team found these features scalable, secure, and extensible, making the service well suited to their needs.

The team's first step in agent development was implementing an MCP (Model Context Protocol) server. An MCP server provides tools: discrete capabilities that allow agents to connect to different elements such as databases, APIs, and code execution environments. Cires21 used Agent Core Gateway to deploy their MCP server, which significantly accelerated development. The Gateway service accepts an OpenAPI specification of an API and automatically creates all the corresponding tools for applications and agents, reducing deployment time from hours to minutes. This is an important accelerator for LLMOps, as it reduces the engineering effort required to expose existing functionality to AI agents.
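For a sense of what the Gateway automates away, here is a minimal hand-written MCP server exposing one MediaCoPilot-style tool, using the open-source MCP Python SDK. Agent Core Gateway generates an equivalent tool layer from an OpenAPI specification instead of requiring code like this; the tool itself and its behavior are hypothetical.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("mediacopilot-tools")  # hypothetical server name

@mcp.tool()
def create_clip(asset_id: str, start_seconds: float, end_seconds: float,
                vertical: bool = False) -> dict:
    """Cut a clip from an asset; stand-in for a real platform API call."""
    # A real server would call the platform's clipping API here.
    return {
        "asset_id": asset_id,
        "clip": {"start": start_seconds, "end": end_seconds},
        "format": "9:16" if vertical else "16:9",
        "status": "queued",
    }

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```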
## Agent Architecture and Stateless Design

The agents themselves are built using Agent Core Runtime, which has a stateless architecture that the team finds particularly valuable. For each user session, Agent Core deploys specific resources encapsulated with that session's context. This stateless, session-isolated design is critical for Cires21 because content protection and privacy are paramount concerns in their industry. By ensuring that context doesn't leak between sessions and that resources are cleanly isolated, the architecture provides stronger security guarantees.

The agents can handle complex multi-step tasks that would previously require manual work or multiple separate tool invocations. A concrete example provided is an agent handling the request: "find the best moment or the most viral moment of an interview, create a vertical clip for social media, add subtitles, and export everything with captions to external metadata automatically." This demonstrates the practical value of agents in orchestrating multiple AI capabilities and workflow steps based on natural language instructions.

## Memory Management for Agents

Agent Core's memory service provides both short-term and long-term memory capabilities. Short-term memory allows agents to maintain context within a session, tracking what's been discussed and decided. Long-term memory enables the recovery of context from past sessions and, importantly, allows the storage of user preferences. Cires21 uses long-term memory to store preferences related to styling, text generation, and other customizable aspects of content creation.

This memory architecture addresses a critical challenge in LLMOps for agent systems: maintaining personalization and learning from user interactions without compromising session isolation and privacy. By explicitly separating short-term session context from long-term user preferences, the architecture provides a clear model for managing state in production agent systems.
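The separation between the two memory types can be pictured as two stores with different lifetimes and keys. The schematic below is not Agent Core's actual memory API, just an illustration of the state model described above.

```python
from dataclasses import dataclass, field

@dataclass
class SessionMemory:
    """Short-term: lives and dies with one agent session."""
    session_id: str
    turns: list[dict] = field(default_factory=list)  # conversation so far

    def remember(self, role: str, text: str) -> None:
        self.turns.append({"role": role, "text": text})

@dataclass
class UserPreferences:
    """Long-term: survives sessions, keyed to a user, holds durable choices."""
    user_id: str
    subtitle_style: str = "default"
    tone: str = "neutral"

def build_agent_context(session: SessionMemory, prefs: UserPreferences) -> dict:
    # Session history and durable preferences are combined per request,
    # so nothing from another session (or another client) can leak in.
    return {"history": session.turns,
            "preferences": {"subtitle_style": prefs.subtitle_style,
                            "tone": prefs.tone}}
```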
## Observability and Monitoring

Agent Core's observability service plays an important role in Cires21's operations, allowing the team to monitor everything happening within agent sessions. This visibility enables them to identify bottlenecks in workflow execution and optimize performance. From an LLMOps perspective, observability is crucial for understanding agent behavior in production, debugging failures, identifying performance issues, and providing transparency into the decision-making of autonomous systems. The importance Cires21 places on observability suggests they have recognized that deploying agents to production requires more than just functional capabilities; it requires comprehensive instrumentation to maintain service quality and continuously improve performance.

## Lessons Learned and Evolution of Approach

The team explicitly shared several lessons learned from their development journey. Beyond the SageMaker deployment-mode considerations mentioned earlier, a significant insight relates to agent design: they discovered that using fewer tools per agent results in lower token consumption and more efficient operation. This led them to favor specialized agents with focused capabilities over generalist agents with many tools. The finding has important implications for LLMOps practitioners designing agent systems: there is a tradeoff between agent versatility and efficiency, and in production systems where token costs and latency matter, specialization may be preferable.

## Future Directions and Ongoing Development

Cires21 outlined several next steps for MediaCoPilot's evolution. Real-time processing for live content is a priority, enabling decision-making while events are actually happening rather than in post-production; this would extend the platform's capabilities from primarily VOD scenarios to real-time live broadcasting. The team is also developing more specialized AI agents, following through on their insight about the benefits of focused agent capabilities, and integrating new models, particularly visual models, to add more context to agent operations. Visual understanding would allow agents to make decisions based not just on audio transcripts and metadata but on the actual visual content of videos, enabling more sophisticated content analysis and clip selection.

## Critical Assessment and Considerations

While the case study presents MediaCoPilot as a successful implementation, several considerations warrant attention. The presentation is fundamentally promotional, delivered at AWS re:Invent as a customer success story, which means claims about improved speed and reduced costs are not independently verified with specific metrics. The case study would be strengthened by concrete performance numbers, cost comparisons, and client satisfaction data.

The reliance on relatively new services like AWS Agent Core, which had been released only about two months before the team began using it, introduces some risk. While managed services reduce operational burden, they also create dependencies on vendor roadmaps and potential service limitations. The team's choice of a serverless, managed-service-heavy architecture makes sense for a small company wanting to focus on differentiation rather than infrastructure, but it does mean they have less control over the underlying systems.

The segmented video processing approach they're implementing is promising for parallelization, but it introduces complexities around ensuring consistency across segments and properly handling content that spans segment boundaries. The effectiveness of this approach likely depends significantly on the specific types of analysis being performed.

The agent architecture's emphasis on privacy and session isolation is commendable and appropriate for media content, but the actual implementation details are not provided. It's not clear, for example, how long-term memory storage is secured, how preferences are associated with users while maintaining privacy, or what guardrails exist to prevent agents from inadvertently exposing one client's content to another. The finding that specialized agents with fewer tools are more efficient is valuable, but it raises questions about how the system coordinates between multiple specialized agents, how users interact with them (do they need to know which agent to invoke, or is there a routing layer?), and whether this increases overall system complexity even as it reduces per-agent complexity.

Overall, MediaCoPilot represents a practical implementation of LLMOps principles in the media and entertainment industry, combining custom models, managed AI services, serverless orchestration, and agentic workflows to solve real business problems for broadcast clients. The case study provides useful insights into deployment model selection, agent design tradeoffs, and the architectural patterns that enable API-first integration of AI capabilities into existing production workflows.
