Company
Bloomberg Media
Title
AI-Driven Media Analysis and Content Assembly Platform for Large-Scale Video Archives
Industry
Media & Entertainment
Year
2025
Summary (short)
Bloomberg Media, facing challenges in analyzing and leveraging 13 petabytes of video content growing at 3,000 hours per day, developed a comprehensive AI-driven platform to analyze, search, and automatically create content from their massive media archive. The solution combines multiple analysis approaches including task-specific models, vision language models (VLMs), and multimodal embeddings, unified through a federated search architecture and knowledge graphs. The platform enables automated content assembly using AI agents to create platform-specific cuts from long-form interviews and documentaries, dramatically reducing time to market while maintaining editorial trust and accuracy. Bloomberg's "disposable AI strategy" emphasizes modularity, versioning, and the ability to swap models and embeddings without re-engineering entire workflows, allowing the company to adapt quickly to evolving AI capabilities while expanding reach across multiple distribution platforms.
## Overview

Bloomberg Media presented a comprehensive case study of building an end-to-end AI-driven media analysis and content creation platform at AWS re:Invent. The presentation, delivered by Bloomberg's Head of Media Technology Louis Barbeau alongside AWS specialist solutions architects Brandon Lindauer and Robert Raver, detailed Bloomberg's journey from experimental AI implementations to a production-grade platform handling 13 petabytes of video content that grows at 3,000 hours per day.

Bloomberg Media serves 60 million unique viewers monthly with a reach of 437 million households globally across 48 streaming platforms, producing news articles, audio content, and video content including both live linear TV and video-on-demand. The business challenge centered on the need to analyze massive unstructured video archives, reduce time to market for content distribution, unlock new distribution targets with platform-specific formats, and create new types of stories by correlating real-time news with historical content.

Bloomberg emphasized three core requirements that must be met in every workflow decision: speed, accuracy, and trust. These requirements shaped their entire approach to implementing AI in production.

## The "Disposable AI Strategy" and Platform Architecture Philosophy

A central theme of Bloomberg's approach is what they term a "disposable AI strategy" - the concept that before introducing any AI service or model, they think carefully about what will be required to remove it from the ecosystem. This philosophy arose from painful lessons learned during early experimentation, when they discovered that tightly coupled AI implementations became extremely expensive and complicated to modify or replace when better models became available. The platform architecture prioritizes loose coupling, modularity, and the ability to swap components without re-engineering entire workflows.

Bloomberg runs a hybrid global platform combining on-premises services with extensive AWS cloud services. They had previously invested heavily in creating a service-oriented ecosystem on AWS that provides the agility needed to handle changing business objectives and align distributions with ever-changing platforms. This foundation proved critical for building their AI platform, as it already embraced concepts of dynamic workflows and the ability to introduce new services relatively easily.

## Multi-Layered Analysis Approach

The platform employs three distinct but complementary approaches to video analysis, each serving different purposes and addressing different aspects of understanding media content.

**Task-Specific Models:** These are purpose-built models trained for specific extraction tasks such as transcription, label generation, face recognition, and structural breakdown (shot and scene detection). Bloomberg uses AWS services including Amazon Transcribe for audio-to-text conversion, Amazon Rekognition Video for visual analysis including celebrity recognition and custom face collections, and various other specialized models. These models excel at generating precise, structured metadata for predefined categories and are highly reliable for keyword-based searches.
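As a minimal sketch of what this task-specific pathway looks like in practice - assuming a hypothetical S3 bucket and key, since the talk did not show Bloomberg's actual integration code - a celebrity recognition job can be driven from boto3 roughly like this:

```python
import time

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# Kick off an asynchronous celebrity-recognition job for a video stored in S3.
# Bucket and key names are placeholders for illustration only.
job = rekognition.start_celebrity_recognition(
    Video={"S3Object": {"Bucket": "media-archive", "Name": "interviews/episode-01.mp4"}}
)

# Poll until the job finishes; a production pipeline would subscribe to the
# SNS completion notification instead of polling.
while True:
    result = rekognition.get_celebrity_recognition(JobId=job["JobId"])
    if result["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(10)

# Each recognition carries a millisecond timestamp plus a name and confidence -
# exactly the kind of precise, structured metadata that keyword search can index.
for recognition in result.get("Celebrities", []):
    celebrity = recognition["Celebrity"]
    print(recognition["Timestamp"], celebrity["Name"], celebrity["Confidence"])
```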
**Vision Language Models (VLMs):** These represent a newer generation of models that can natively process video, image, and audio modalities and respond in natural language. Unlike task-specific models that require training for specific outputs, VLMs support zero-shot reasoning - they can analyze media and answer questions without additional training. The example provided showed using Amazon Bedrock's Nova model to describe "the second speaker in the video," which returned rich natural language descriptions like "the second speaker is a man with a beard wearing a light blue shirt and blue jeans" - context and temporal understanding that task-specific models cannot provide. VLMs understand temporal timelines and context across frames, and can reason about what is happening in scenes rather than just identifying objects.

**Multimodal Embedding Models:** The third pillar involves transforming video, images, audio, and text into vector embeddings - numerical representations that capture semantic meaning in a way that enables similarity comparisons. Bloomberg uses models including Twelve Labs Marengo 3.0 and the Amazon Nova multimodal embedding model to create these vectors. The critical insight is that embeddings allow cross-modal comparison - you can compare audio to video frames to text transcripts, all within the same semantic space. Embeddings enable finding content based on conceptual similarity rather than exact keyword matches, such as finding "a dog wearing a bandana on a beach" rather than just "dog."
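To make the embedding mechanics concrete, here is a self-contained sketch of cross-modal similarity scoring. Random vectors stand in for real model output; in production the stubbed arrays would be replaced by embeddings from whichever model (Marengo, Nova, or another) is versioned into the pipeline:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two vectors that live in the same semantic space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for real model output: a multimodal embedding model would place
# text queries, video frames, and audio segments into one shared vector space.
rng = np.random.default_rng(42)
query_embedding = rng.standard_normal(1024)          # "a dog wearing a bandana on a beach"
frame_embeddings = rng.standard_normal((500, 1024))  # embeddings of sampled video frames

# Rank frames by conceptual similarity to the text query - no keyword overlap needed.
scores = [cosine_similarity(query_embedding, frame) for frame in frame_embeddings]
best_frame = int(np.argmax(scores))
print(f"closest frame: {best_frame} (score {scores[best_frame]:.3f})")
```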
The platform architecture processes incoming media through all three analysis pathways simultaneously, creating a rich, multifaceted understanding of each asset. This comprehensive analysis forms the foundation for both search and content creation workflows.

## Versioning and Production/Non-Production Tiers

A key innovation in Bloomberg's LLMOps approach is their extensive versioning system that operates at multiple levels throughout the platform. They version models, embeddings, services, workflow pipelines, and even prompts. When a query or prompt executes, it can explicitly target specific versions, allowing multiple versions to coexist in production simultaneously. This enables A/B testing, gradual rollouts, and the ability to revert quickly if a new model version performs poorly.

Beyond versioning, Bloomberg introduced the concept of production and non-production tiers that operate within the same ecosystem. This means experimental workflows using newer models or different embedding approaches can run in parallel with production workflows, processing the same content but marked with different quality levels. Teams can experiment with new approaches, measure their performance against production baselines, and easily promote successful experiments to production status or demote underperforming configurations. This provides the "dynamicity and agility" needed to continuously improve the platform while maintaining service reliability.

## Federated Search and Intent Analysis

One of the most sophisticated aspects of the platform is the federated search architecture that abstracts the complexity of multiple data stores and embedding types from the applications and agents that need to search content. Bloomberg stores different types of embeddings in multiple repositories - audio embeddings, video frame embeddings from vision models, text embeddings from transcripts - potentially created from multiple model versions. Rather than requiring applications to know which database to query, they built a unified search service that sits between the search interface and the various data stores.

This search service incorporates an AI layer that analyzes incoming search queries to determine search intent before executing the query. Using an LLM with a carefully crafted prompt, the system analyzes search terms and assigns weights to different data sources based on what the query is really asking for. In the example provided, the query "Brandon on stage with two other people" resulted in the visual data store receiving a 0.9 weight because understanding "how many people" requires visual analysis, while audio received minimal weight. Conversely, the query "Robert talking about search techniques" resulted in high weights for the transcript and people databases (to identify Robert) with lower weights for visual data.

This intent analysis enables dynamic hybrid search that combines keyword search (using OpenSearch) with vector similarity search across multiple embedding databases. The system can weight keyword matches higher when searching for specific people or entities, while prioritizing vector similarity when the query describes concepts or actions. By analyzing intent and routing queries appropriately, Bloomberg achieves more relevant results than either keyword or vector search alone could provide.

## Knowledge Graphs for Contextual Understanding

While hybrid search returns relevant assets, Bloomberg recognized that understanding relationships between assets is equally important. They implemented knowledge graphs using Amazon Neptune Analytics to create these relationships. The graph connects entities including people, assets, stories, places, events, and topics through typed relationships such as "appears_in," "created_from," "happens_at," and "discusses."

The power of the knowledge graph approach becomes clear in scenarios where multiple assets are derived from the same source. If an original 30-minute interview is cut into seven different versions for different platforms (full-length YouTube, 2-minute broadcast version, 30-second social clips, etc.), the knowledge graph maintains these relationships. When search returns results, the system can identify that seven results are actually derived from the same source material and surface this information to users, avoiding redundant results.

Bloomberg also introduced the concept of "virtual entities" in their graph - nodes that represent aggregated understanding about a person, topic, or event across many assets rather than individual assets themselves. For example, instead of just searching assets that contain a specific person, you can query the person entity itself to understand everything known about them: what assets they appeared in, what topics they discussed, how their presence evolved over time, relationships to other entities, and more. This entity-centric approach mirrors how humans naturally think about information and enables powerful queries like "who were returning speakers between two events" or "how did the discussion of a topic evolve between spring and fall summits."
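A hedged sketch of what the "returning speakers" question could look like against Neptune Analytics' openCypher interface follows. The graph identifier, event names, and labels are placeholders that mirror the relationship names mentioned in the talk rather than any published Bloomberg schema:

```python
import json

import boto3

# Neptune Analytics exposes openCypher through the neptune-graph data API.
graph = boto3.client("neptune-graph", region_name="us-east-1")

# Hypothetical schema: (Person)-[:appears_in]->(Asset)-[:happens_at]->(Event).
query = """
MATCH (p:Person)-[:appears_in]->(:Asset)-[:happens_at]->(:Event {name: 'Spring Summit'}),
      (p)-[:appears_in]->(:Asset)-[:happens_at]->(:Event {name: 'Fall Summit'})
RETURN DISTINCT p.name AS returning_speaker
"""

response = graph.execute_query(
    graphIdentifier="g-abc123example",  # placeholder graph id
    queryString=query,
    language="OPEN_CYPHER",
)
print(json.loads(response["payload"].read()))
```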
## Agent-Based Content Assembly

The platform's content creation capabilities center on orchestrated AI agents that automate the labor-intensive process of creating platform-specific content cuts from long-form source material. The example scenario involves taking a 30-minute interview with a prominent actor where only 2 minutes aired on broadcast TV. Traditionally, the remaining 28 minutes would be archived and rarely used. With the agent-based system, a producer can request "make this interview available to our social platforms" in natural language.

The system deploys multiple specialized agents working in parallel and coordinating as needed:

**Analysis and Summarization Agent:** This agent processes the source material using the comprehensive metadata generated during ingest, creating an understanding of the content structure, key topics, speakers, and segments. It understands the user's intent from the natural language request and determines what needs to be created.

**Selection Agent(s):** These agents identify specific segments of the source video suitable for different target lengths and platforms. They can find 10-second clips, 30-second segments, or 2-minute sections based on content boundaries, topic completeness, and platform requirements. When multiple camera angles exist for the same interview (ingested as separate files but correlated through the knowledge graph), selection agents can choose the better angle for specific moments based on visual composition, speaker framing, and other factors.

**Assembly Agent:** This agent takes selected segments and assembles them into cohesive pieces for each target platform. It handles technical requirements like aspect ratio conversion (16:9 for YouTube, 9:16 for TikTok, 1:1 for Instagram), applies platform-specific formatting, and can request additional content from selection agents if needed - for example, "I have 25 seconds assembled but my target is 30 seconds, find me 5 seconds of relevant B-roll." The agent can make intelligent decisions about verticalization techniques, choosing whether to follow action with dynamic framing or use blur effects for letterboxing.

**Review Agent:** Before content is distributed, a review agent ensures quality and adherence to standards. This is particularly critical for Bloomberg, where trust and accuracy are paramount. The review agent checks that clips aren't taken out of context, verifies that content guardrails are met, and assigns confidence scores. It can flag items for human review based on configurable thresholds, implementing human-in-the-loop workflows when needed.

**Distribution Agent:** The final agent handles publishing to target platforms. While simple distribution could use conditional logic rather than AI, the agent can make intelligent decisions about which platforms are appropriate for specific content types, provide feedback to human operators about distribution recommendations, and handle the technical details of formatting and uploading.

The agent architecture leverages Amazon Bedrock for the underlying LLM capabilities, with orchestration likely handled through AWS Step Functions or similar services. A critical advantage emphasized by the presenters is complete lineage tracking - because all agent interactions happen within AWS infrastructure, every decision can be logged, including which angles were selected and why, what scoring led to specific choices, which prompts were used, and which model versions were invoked. This traceability is essential for continuous improvement and understanding model behavior in production.
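The talk did not show agent code, but a minimal Python sketch of the control flow - with stubbed agent outputs, a hypothetical review threshold, and the kind of lineage trace the presenters emphasized - might look like this:

```python
import datetime
import json
import uuid

def log_step(trace: list, step: str, detail: dict) -> None:
    """Record each agent decision so every generated cut can be audited end to end."""
    trace.append({"step": step, "detail": detail,
                  "at": datetime.datetime.now(datetime.timezone.utc).isoformat()})

def assemble_social_cut(request: str) -> dict:
    """Analysis -> selection -> assembly -> review -> distribution, with lineage."""
    trace, run_id = [], str(uuid.uuid4())

    # Each dict below stands in for an LLM-backed agent's structured output.
    plan = {"intent": request, "targets": ["tiktok", "youtube_shorts"], "length_s": 30}
    log_step(trace, "analysis", plan)

    segments = [{"in_s": 512.0, "out_s": 527.5, "angle": "cam2", "score": 0.91}]
    log_step(trace, "selection", segments[0])

    cut = {"segments": segments, "aspect": "9:16", "technique": "dynamic_framing"}
    log_step(trace, "assembly", cut)

    review = {"confidence": 0.84, "threshold": 0.90}  # hypothetical guardrail threshold
    review["needs_human"] = review["confidence"] < review["threshold"]
    log_step(trace, "review", review)

    if review["needs_human"]:
        return {"run": run_id, "status": "queued_for_human_review", "trace": trace}

    log_step(trace, "distribution", {"published_to": plan["targets"]})
    return {"run": run_id, "status": "published", "trace": trace}

print(json.dumps(assemble_social_cut("make this interview available to our social platforms"), indent=2))
```

In a production deployment each stubbed step would be its own service invocation (the presenters suggested Step Functions as a likely orchestrator), but the lineage pattern - a structured trace entry per decision - is the part the talk emphasized.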
## Production Operations and Observability

Throughout the presentation, Bloomberg emphasized operational considerations for running AI at scale in production environments. The versioning and tiering system provides the foundation for safe experimentation and deployment. By running multiple versions simultaneously and marking outputs with quality levels, teams can compare performance metrics including accuracy, latency, and cost across different configurations.

The platform includes comprehensive observability across all workflow stages. Metadata extraction pipelines track processing time, model inference costs, and quality metrics for each analysis type. Search operations log query patterns, result relevance scores, and user feedback (thumbs up/down) that can be used to refine intent analysis prompts. Agent workflows capture complete execution traces showing decision points, reasoning, and outputs at each stage.

Cost management is an explicit consideration in the architecture. Different embedding models and analysis techniques have vastly different computational costs, and the ability to version and swap these components allows optimization based on cost-benefit tradeoffs. The federated search layer can route queries to less expensive data stores when appropriate, while reserving more expensive multimodal embeddings for queries where they provide clear value.

The platform is designed following AWS Well-Architected Framework principles including high availability, resilience, security, and scalability. The hybrid cloud architecture allows Bloomberg to leverage their existing on-premises infrastructure while using AWS for elastic compute, storage, and AI services. This hybrid approach addresses both operational requirements and data governance concerns common in media organizations.

## Scaling Considerations and Current Status

Bloomberg processes approximately 3,000 hours of new video content daily that must be analyzed and made searchable, in addition to the 13 petabytes of historical content they aim to analyze comprehensively. This massive scale drives many architectural decisions. The loosely coupled, service-oriented design allows independent scaling of different components - ingestion can scale separately from analysis, which scales separately from search. The use of purpose-built data stores optimized for different access patterns (OpenSearch for keyword and hybrid search, Neptune Analytics for graph queries, vector databases for embedding search) allows each to scale according to its specific workload characteristics. The agent-based content creation workflows can execute many tasks in parallel, dramatically reducing end-to-end processing time compared to sequential workflows.

Louis Barbeau noted that at the time of the presentation, the platform represented Bloomberg's vision and ongoing work rather than a fully deployed production system. He characterized it as a journey in which AWS worked with Bloomberg to shape AWS technology to align with Bloomberg's vision, acknowledging that "what you're trying to achieve is something that we cannot do today, but if we work together... we will get somewhere." This honest assessment reflects the reality of implementing cutting-edge LLMOps at scale - it requires partnership between technology providers and users to push boundaries.

## Technical Implementation Details

While the presentation avoided deep code dives, several implementation details emerged. The intent analysis for federated search uses a carefully crafted prompt that describes the available data stores and asks the LLM to analyze the search query and return weights for each store along with reasoning. This prompt is versionable and iteratively improved based on search performance. The example showed a JSON structure returned by the LLM with weights for visual, people, transcript, and audio databases.

For video analysis using VLMs, the examples used Amazon Bedrock's Nova model accessed through the AWS Python SDK (boto3).
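A minimal sketch of such an integration via the Bedrock Converse API, with a placeholder model id, bucket, and key (the exact code from the talk was not published), could look like the following:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Nova understanding models accept video content blocks - inline bytes or an
# S3 location - through the Converse API. All identifiers here are placeholders.
response = bedrock.converse(
    modelId="us.amazon.nova-pro-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"video": {"format": "mp4",
                       "source": {"s3Location": {"uri": "s3://media-archive/interviews/episode-01.mp4"}}}},
            {"text": "Describe the second speaker in the video."},
        ],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```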
The simplicity of the integration - a few lines of code to send video and a prompt to the model - demonstrates how modern AI services abstract complexity. The presenters noted that generating similar integration code for new models can now be automated using LLMs themselves (they mentioned using Claude Opus to generate integration code quickly).

The knowledge graph implementation uses Amazon Neptune Analytics, a serverless graph database optimized for analytical queries over large graphs. The schema includes nodes for assets, people, stories, places, events, and topics, with relationships capturing temporal, identity, and domain connections. Graph queries can traverse these relationships to answer complex questions about content evolution, returning-speaker patterns, topic discussions, and cross-asset connections.

Content assembly likely leverages AWS Step Functions for workflow orchestration, with Lambda functions handling individual agent logic. The platform uses Amazon S3 as the central storage layer for both source media and generated outputs, with S3 events triggering analysis workflows. The hybrid architecture integrates on-premises systems through network connectivity to AWS, allowing workflows to span both environments.

## Lessons and Trade-offs

Several important lessons and trade-offs emerge from Bloomberg's experience. The emphasis on loose coupling and "disposable AI" acknowledges that in the rapidly evolving AI landscape, tight integration with specific models or vendors creates technical debt that becomes expensive to address. The trade-off is increased architectural complexity - managing multiple versions, federated data stores, and abstraction layers requires sophisticated engineering. Bloomberg judged this complexity worthwhile given the flexibility it provides.

The multi-faceted analysis approach using task-specific models, VLMs, and embeddings simultaneously increases processing costs and latency compared to using a single method. Bloomberg's assessment is that the richer understanding and improved search relevance justify these costs, but this represents a conscious trade-off. Organizations with tighter cost constraints might choose more selective analysis.

The human-in-the-loop option for content review reflects Bloomberg's priority on maintaining trust and accuracy over full automation. While automated content creation dramatically reduces time to market, the ability to have humans review agent-generated content before publication ensures editorial standards are met. This represents a pragmatic balance between automation and quality control.

The case study illustrates that implementing production LLMOps for complex media workflows requires thinking beyond individual models or services to create cohesive platforms that handle the full lifecycle from ingestion through analysis, search, and content creation. Bloomberg's approach prioritizes operational flexibility, comprehensive observability, and the ability to evolve continuously as AI capabilities advance, even at the cost of architectural complexity. Their "disposable AI strategy" provides a framework for organizations to think about building AI systems that can adapt and improve over time rather than becoming rigid and difficult to change.
