ZenML

Automating Video Ad Classification with GenAI

MediaRadar | Vivvix 2024

MediaRadar | Vivvix faced challenges with manual video ad classification and fragmented workflows that couldn't keep up with growing ad volumes. They implemented a solution using Databricks Mosaic AI and Apache Spark Structured Streaming to automate ad classification, combining GenAI models with their own classification systems. This transformation enabled them to process 2,000 ads per hour (up from 800), reduced experimentation time from 2 days to 4 hours, and significantly improved the accuracy of insights delivered to customers.

Industry

Media & Entertainment

Overview

MediaRadar | Vivvix is an advertising intelligence company that helps brand marketers and advertising agencies understand their competitive landscape and optimize media spend decisions. Their core mission involves identifying “what, where and when” for their clients—essentially classifying advertisements to determine brand, products, and notable details like featured celebrities. This case study documents their transition from a manual, fragmented data processing approach to an automated GenAI-powered classification system using Databricks’ platform.

The case study is presented by Databricks as a customer success story, which means it naturally emphasizes positive outcomes. However, the technical details provided offer genuine insight into how LLMs can be operationalized for large-scale classification tasks in the advertising domain.

The Problem: Scale and Manual Limitations

MediaRadar | Vivvix faced what they describe as “an extreme classification problem”: classifying ads across more than 6 million unique products. Their previous infrastructure relied on Amazon Simple Queue Service (SQS), which imposed significant limitations: data had to be polled manually, and each poll returned at most 10 messages. This setup made it difficult to meet service level agreements (SLAs).

The company’s Senior ML Engineer, Dong-Hwi Kim, noted that existing ML models were inadequate given the sheer scale and diversity of products: even their fine-tuned in-house model could not cover millions of products, because they lacked sufficient training data for so broad a classification scheme. The existing process instead relied heavily on manual labor, with hundreds of operators watching ads and recording detailed information about their elements. That approach was unsustainable: overall ad spend had almost doubled in recent years, and increasing human resources proportionally was infeasible.

Additionally, their infrastructure suffered from fragmentation. Principal Software Engineer Thierry Steenberghs described a “nightmare” scenario with multiple pods running different components, requiring separate Azure logins to monitor operations. The workflow involved building models, exporting them, and then importing them elsewhere—a process that wasted significant engineering time.

The Solution Architecture

MediaRadar | Vivvix implemented a multi-layered approach using Databricks’ platform, combining several technologies to automate their ad classification pipeline:

Data Ingestion and Streaming

The adoption of Apache Spark Structured Streaming enabled continuous, real-time data ingestion without manual intervention. This replaced the previous SQS-based approach that required manual polling. The streaming architecture allowed them to process video ad data as it arrived, eliminating concerns about meeting SLAs and enabling efficient processing of thousands of ads per hour.
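To see why the old 10-message cap hurt at this volume, here is a back-of-the-envelope sketch. The per-poll cap and the 2,000-ads-per-hour figure come from the case study; the function itself is illustrative, not their code:

```python
MAX_SQS_BATCH = 10  # per-call message cap described in the case study

def drain_by_polling(backlog: list) -> int:
    """Number of receive calls needed to drain a backlog when each
    call returns at most MAX_SQS_BATCH messages (the old SQS setup)."""
    calls = 0
    remaining = len(backlog)
    while remaining > 0:
        remaining -= min(MAX_SQS_BATCH, remaining)
        calls += 1
    return calls

# One hour of ads at the new throughput (2,000) would take 200 separate
# polls under the old model; a streaming source instead delivers records
# into micro-batches continuously, with no polling loop to manage.
print(drain_by_polling(list(range(2000))))  # -> 200
```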

Preprocessing Pipeline

Before classification, the team developed robust preprocessing pipelines. These steps were critical for preparing incoming video ad data for the GenAI classification stage and for maintaining a clean, deduplicated database.
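The deduplication the preprocessing maintains can be sketched with a simple content fingerprint. This is a minimal illustration; the field names are hypothetical, not MediaRadar’s actual schema:

```python
import hashlib

def fingerprint(ad: dict) -> str:
    """Stable hash over the fields that identify a creative (illustrative)."""
    key = "|".join([ad.get("brand", ""), ad.get("transcript", ""),
                    str(ad.get("duration_s", ""))])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def deduplicate(ads: list) -> list:
    """Keep the first occurrence of each distinct creative."""
    seen, unique = set(), []
    for ad in ads:
        fp = fingerprint(ad)
        if fp not in seen:
            seen.add(fp)
            unique.append(ad)
    return unique

ads = [
    {"brand": "Acme", "transcript": "Buy now", "duration_s": 30},
    {"brand": "Acme", "transcript": "Buy now", "duration_s": 30},  # repeat airing
    {"brand": "Acme", "transcript": "New flavor", "duration_s": 15},
]
print(len(deduplicate(ads)))  # -> 2
```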

Dual-Layer GenAI Classification Approach

The core innovation was a dual-layer classification strategy that combined GenAI capabilities with MediaRadar | Vivvix’s proprietary classification models. The approach leverages the generalization capabilities of large language models while grounding their predictions in domain-specific models trained on the company’s proprietary taxonomy, resulting in higher accuracy when identifying the correct products.
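A minimal sketch of that dual-layer idea, with stubbed stand-ins for both the LLM call and the proprietary taxonomy (all names and data here are invented for illustration):

```python
# Illustrative taxonomy: product name -> internal product ID.
TAXONOMY = {"acme cola zero": "PROD-001", "acme cola classic": "PROD-002"}

def genai_candidates(transcript: str) -> list:
    """Layer one: stand-in for an LLM call that proposes likely
    product names from the ad's transcript and visuals."""
    return ["Acme Cola Zero", "Acme Soda"]  # canned output for the sketch

def ground_in_taxonomy(candidates: list) -> list:
    """Layer two: keep only candidates that resolve to a known product
    in the proprietary taxonomy, discarding unresolvable LLM guesses."""
    resolved = []
    for name in candidates:
        product_id = TAXONOMY.get(name.lower())
        if product_id is not None:
            resolved.append(product_id)
    return resolved

print(ground_in_taxonomy(genai_candidates("thirty-second cola spot")))
# -> ['PROD-001']
```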

Infrastructure and Scaling with Ray on Databricks

The team utilized Ray clusters on Databricks to optimize video processing and scale classification across millions of categories. Ray provides distributed computing capabilities that allow horizontal scaling of the video processing and classification workloads. This was essential for handling the volume of ads that needed processing.
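The fan-out pattern Ray enables can be shown with a dependency-free stand-in from the standard library. This is a sketch, not MediaRadar’s code; with Ray the calls would be `process_video.remote(...)` gathered via `ray.get`, running across a cluster rather than local threads:

```python
from concurrent.futures import ThreadPoolExecutor

def process_video(ad_id: int) -> dict:
    """Stand-in for per-ad video work (frame sampling, transcription, ...)."""
    return {"ad_id": ad_id, "status": "classified"}

def classify_in_parallel(ad_ids: list, workers: int = 8) -> list:
    # With Ray: futures = [process_video.remote(i) for i in ad_ids]
    # then ray.get(futures); the scatter/gather shape is identical.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_video, ad_ids))

results = classify_in_parallel(list(range(100)))
print(len(results))  # -> 100
```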

Model Serving and Cost Management

Initial experimentation was conducted using the Mosaic AI Model Serving environment, which facilitated rapid prototyping and testing of various machine learning models. This environment provided a sandbox for quickly iterating on different approaches.

A notable production consideration was cost management. The team chose OpenAI’s GPT-3.5 model rather than more expensive alternatives like GPT-4, explicitly balancing performance against expense. This decision allowed them to process thousands of creative assets daily without incurring prohibitive costs—a practical consideration often overlooked in case studies but critical for sustainable LLM operations at scale.
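The trade-off is easy to quantify with a back-of-the-envelope estimate. The per-token rates and daily volumes below are illustrative placeholders, not figures from the case study; only the relative gap matters:

```python
# ILLUSTRATIVE per-1K-token rates; not the rates MediaRadar actually paid.
PRICE_PER_1K_TOKENS = {"gpt-3.5": 0.002, "gpt-4": 0.06}

def daily_cost(model: str, ads_per_day: int, tokens_per_ad: int) -> float:
    """Estimated daily spend for classifying `ads_per_day` creatives."""
    total_tokens = ads_per_day * tokens_per_ad
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[model]

# Assumed workload: thousands of creatives a day, ~2K tokens each.
for model in ("gpt-3.5", "gpt-4"):
    print(model, round(daily_cost(model, ads_per_day=5000, tokens_per_ad=2000), 2))
```

At these placeholder rates the cheaper model is roughly 30x less expensive per day, which is why the performance-versus-cost balance favored GPT-3.5 at their scale.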

Unified Platform Benefits

One of the significant operational improvements came from consolidating the workflow onto a single platform. Steenberghs emphasized that moving everything into Databricks eliminated data silos: the team now works and monitors operations in one environment, replacing the fragmented monitoring previously spread across multiple systems and separate Azure logins.

Results and Performance Improvements

The case study reports several quantitative improvements: throughput rose from roughly 800 to 2,000 ads processed per hour; experimentation time fell from 2 days to 4 hours; and the accuracy of insights delivered to customers improved significantly.

These improvements enabled MediaRadar | Vivvix to keep pace with the rapidly growing volume of advertisements in the market without proportionally increasing human resources.
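Expressed as multipliers (throughput and experimentation figures as reported in the case study, reading "2 days" as 48 hours):

```python
# Reported figures from the case study, expressed as speedup multipliers.
old_throughput, new_throughput = 800, 2000      # ads per hour
old_experiment_h, new_experiment_h = 2 * 24, 4  # hours per experiment cycle

print(f"throughput: {new_throughput / old_throughput:.1f}x")       # -> 2.5x
print(f"experimentation: {old_experiment_h / new_experiment_h:.0f}x")  # -> 12x
```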

Future Directions

The company plans to implement Databricks Unity Catalog for enhanced data management and security. This is particularly important for an advertising intelligence company handling data from multiple brands: fine-grained access control would let them keep data broadly accessible internally while restricting who can see specific clients' information, something that is difficult to enforce in their current setup.

Critical Assessment

While this case study presents compelling results, a few considerations are worth noting. It is a vendor-published success story, so outcomes are framed favorably; baseline details are sparse, and the accuracy improvements are described qualitatively rather than measured against a stated benchmark.

Overall, this case study demonstrates a practical approach to scaling classification tasks using LLMs, with attention to production concerns like cost management, preprocessing quality, and monitoring. The hybrid approach of combining GenAI with domain-specific models is a pattern worth noting for organizations facing similar extreme classification challenges.
