## Overview
Kantar Worldpanel is a prominent international market research company that collects and analyzes consumer data, primarily in the fast-moving consumer goods (FMCG) sector. Their core business involves providing actionable insights to manufacturers and retailers, enabling them to understand consumer behaviors and make informed business decisions. The company essentially sells insights derived from data, making data quality and processing efficiency critical to their value proposition.
The case study focuses on how Kantar Worldpanel is leveraging generative AI and LLM technologies to improve a key upstream data processing task: linking descriptions from paper receipts to product barcode names. This matching process is fundamental to their business as it allows them to identify what products were purchased by which buyers, which is subsequently transformed into insights sold to clients.
## The Problem and Legacy System Challenges
Kantar Worldpanel faced several significant challenges with their existing systems that motivated their exploration of modern AI-driven solutions. Their legacy systems suffered from inflexibility and poor scalability compared to modern cloud-native platforms. These older systems were resource-intensive to maintain and manage, placing a significant burden on engineering teams. Perhaps most critically, the systems required outdated programming skillsets, which not only limited accessibility within the organization but also created talent acquisition and retention challenges in a competitive market.
The urgency of adopting new technologies was driven by the rapid advancement of AI capabilities in the market research space. The data science team recognized that GenAI tools could deliver substantially better results than traditional approaches, and they wanted the ability to rapidly experiment with different proofs of concept to identify opportunities for business improvement.
## The LLMOps Solution Architecture
Kantar Worldpanel adopted the Databricks Data Intelligence Platform as their foundation for advanced AI and machine learning initiatives. This platform provided the scalable, flexible, and integrated environment needed to support their GenAI experimentation and eventual production deployment.
### Experiment Tracking and Model Management with MLflow
The data science team leverages MLflow, the open source platform developed by Databricks, to manage the full machine learning lifecycle. This component enables them to track experiments, reproduce runs, and deploy models more efficiently. According to the team, Databricks simplifies cluster management and results storage through MLflow integration. The platform makes it straightforward to check results, download models, and replicate code. The higher-level abstractions provided by Databricks reduce operational complexity for the data scientists.
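To make this concrete, the sketch below shows what tracking a single receipt-to-barcode matching run with MLflow could look like. The experiment path, run name, parameters, and metric value are all illustrative assumptions rather than details from the case study.

```python
import mlflow

# Illustrative sketch of tracking a receipt-to-barcode matching run;
# the experiment path, parameters, and metric value are hypothetical.
mlflow.set_experiment("/Shared/receipt-barcode-matching")

with mlflow.start_run(run_name="gpt-4-zero-shot"):
    mlflow.log_param("model", "gpt-4")
    mlflow.log_param("prompt_version", "v3")
    mlflow.log_param("sample_size", 5000)

    accuracy = 0.94  # placeholder: would come from scoring labelled pairs

    mlflow.log_metric("match_accuracy", accuracy)
```

Logging parameters and metrics this way is what makes runs comparable and reproducible across the team, which is the benefit the case study attributes to the MLflow integration.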
### Model Evaluation and Selection
A central aspect of their LLMOps approach involved systematic evaluation of multiple LLM options for their receipt-to-barcode matching task. The team experimented with several models including Llama, Mistral, GPT-4, and GPT-3.5, all within the Databricks Platform. They leveraged the Databricks Marketplace to easily download and test different models, creating a streamlined evaluation workflow.
The evaluation process was designed to be fast and conclusive: models could be downloaded from the Databricks Marketplace and experiments run quickly, and a labeling step marked each result as correct or incorrect at the end of every experiment, making it straightforward to see which model performed better. Ultimately, GPT-4 emerged as the top performer, reaching 94% accuracy on their specific use case.
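A minimal comparison harness for this kind of evaluation might look like the following. The labelled pairs, the `call_model` stub, and the model identifiers are assumptions for illustration; they are not Kantar's actual evaluation code.

```python
# Hypothetical comparison harness: every candidate model is scored against the
# same small labelled set of (receipt description, expected barcode name) pairs.
labelled_pairs = [
    ("COCA COLA 2L", "Coca-Cola Regular 2 Litre Bottle"),
    ("WHL MLK 1PT", "Whole Milk 1 Pint"),
]

def call_model(model_name: str, receipt_text: str) -> str:
    """Stub for a model endpoint call; in practice this would query a model
    downloaded from the Databricks Marketplace or an external API."""
    return ""  # placeholder prediction

def evaluate(model_name: str, pairs) -> float:
    correct = sum(
        1 for receipt, expected in pairs
        if call_model(model_name, receipt) == expected
    )
    return correct / len(pairs)

# Model names mirror those mentioned in the study; the scores are meaningless
# until call_model is wired to real endpoints.
results = {name: evaluate(name, labelled_pairs)
           for name in ["gpt-4", "gpt-3.5", "llama", "mistral"]}
best_model = max(results, key=results.get)
```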
### Training Data Generation at Scale
One of the most significant LLMOps achievements highlighted in the case study is the use of LLMs for training data generation. Rather than relying heavily on manual human labeling, Kantar Worldpanel used GPT-4 to automatically generate a training dataset of approximately 120,000 pairs of receipt descriptions and barcode names with 94% accuracy. Remarkably, this was accomplished in just a couple of hours, representing a dramatic reduction in time and human resources compared to traditional manual coding approaches.
This approach to synthetic training data generation exemplifies a modern LLMOps pattern where larger, more capable models are used to create high-quality training data that can then be used to fine-tune smaller, more efficient models for production deployment.
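As a rough sketch of how a large model can label receipt lines at scale, the snippet below uses the OpenAI Python client to ask GPT-4 to pick the best-matching product name. The prompt wording, candidate-list approach, and function name are assumptions; the case study does not describe the actual prompting setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

def label_receipt(receipt_text: str, candidate_barcodes: list[str]) -> str:
    """Ask a large model to pick the barcode name matching a receipt line.
    Prompt wording and model identifier are illustrative, not Kantar's setup."""
    prompt = (
        f"Receipt line: {receipt_text}\n"
        "Candidate product names:\n" + "\n".join(candidate_barcodes) + "\n"
        "Return only the single product name that best matches the receipt line."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# Repeated over large batches of receipt lines, calls like this can assemble a
# (receipt description, barcode name) training set with minimal human labelling.
```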
### Fine-Tuning Strategy for Production
The case study outlines a thoughtful approach to production deployment that balances quality with cost and performance considerations. Once the team identified GPT-4 as the best model for generating training data, they planned to use this data to fine-tune a smaller 8-billion-parameter model that could be served in their production pipeline.
This strategy reflects several important LLMOps best practices. Smaller models are more cost-effective to run at scale in production environments. They also offer better performance characteristics in terms of latency and throughput. Using a larger model to generate training data for a smaller specialized model is an increasingly common pattern in production LLM systems.
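In practice, the first step of this pattern is simply reshaping the generated pairs into a supervised fine-tuning dataset. The prompt/completion record layout below is a common convention assumed for illustration, not a schema documented in the case study.

```python
import json

# Sketch: convert GPT-4-labelled (receipt, barcode) pairs into an
# instruction-style JSONL file a smaller model could be fine-tuned on.
pairs = [
    ("COCA COLA 2L", "Coca-Cola Regular 2 Litre Bottle"),
    ("WHL MLK 1PT", "Whole Milk 1 Pint"),
]

with open("receipt_match_train.jsonl", "w") as f:
    for receipt, barcode in pairs:
        record = {
            "prompt": f"Match this receipt line to a product name: {receipt}",
            "completion": barcode,
        }
        f.write(json.dumps(record) + "\n")
```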
### Vector Search Exploration
Kantar Worldpanel is also exploring Mosaic AI Vector Search capabilities to perform detailed comparisons and linkages between receipt and reference product descriptions. While this appears to still be in the exploration phase rather than production, it suggests the team is considering embedding-based approaches that could complement or enhance their fine-tuned model approach. Vector search could improve the accuracy and efficiency of their data processing and enable more comprehensive insights delivery to manufacturing and retail clients.
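To illustrate the underlying idea, the sketch below shows a generic embedding-based match between receipt lines and reference product names. It deliberately uses an off-the-shelf sentence-embedding model rather than the Mosaic AI Vector Search API, and the model name and product strings are assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Generic embedding-based matching sketch (not the Mosaic AI Vector Search API):
# embed reference barcode names once, then find the nearest one per receipt line.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

barcode_names = ["Coca-Cola Regular 2 Litre Bottle", "Whole Milk 1 Pint"]
barcode_vecs = encoder.encode(barcode_names, normalize_embeddings=True)

def nearest_barcode(receipt_text: str) -> str:
    query = encoder.encode([receipt_text], normalize_embeddings=True)[0]
    scores = barcode_vecs @ query  # cosine similarity on normalized vectors
    return barcode_names[int(np.argmax(scores))]

print(nearest_barcode("COCA COLA 2L"))
```

A managed vector search service would handle the indexing, refresh, and serving of these embeddings, which is what makes it attractive as a complement to the fine-tuned model.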
### Governance and Data Sharing with Unity Catalog
Unity Catalog, Databricks' unified governance solution, provides the security and collaboration framework for Kantar Worldpanel's data operations. Given that the company works across very different environments and teams, the ability to share data securely is essential. Unity Catalog enables this cross-team data sharing while ensuring data protection, which is particularly important for a company handling consumer purchase data.
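As a rough illustration of how such sharing might be governed, Unity Catalog permissions can be granted at the catalog, schema, or table level to groups rather than individuals. The catalog, table, and group names below are hypothetical, and `spark` refers to the session already available in a Databricks notebook.

```python
# Hypothetical Unity Catalog grants; all object and group names are illustrative.
spark.sql("GRANT USE CATALOG ON CATALOG consumer_panel TO `insights-team`")
spark.sql(
    "GRANT SELECT ON TABLE consumer_panel.receipts.matched_products "
    "TO `insights-team`"
)
```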
## Resource Optimization and Workflow Improvements
The adoption of the Databricks platform has delivered tangible operational benefits for Kantar Worldpanel beyond just the AI capabilities. Automated training data generation frees the manual coding teams to focus on reviewing discrepant or ambiguous results rather than producing large volumes of training data. This represents a significant reallocation of human resources toward higher-value activities.
Engineering resources can now focus more on core development tasks, including modernizing other model serving approaches within their current data processing platform. The self-service nature of the platform means data scientists no longer need to rely heavily on engineers to set up clusters with appropriate parameters or configure services. With minimal code, they can download models, experiment with them, and access appropriately sized compute clusters, all in a single platform.
## Critical Assessment
It is worth noting that this case study originates from Databricks' customer stories, which inherently presents a vendor-favorable perspective. The 94% accuracy figure is notable but the case study does not provide details on how accuracy was measured, what the baseline accuracy was with previous systems, or what the remaining 6% error rate means for downstream business impact.
The case study also describes this work primarily as a proof of concept rather than a fully deployed production system, so the claimed benefits are somewhat forward-looking rather than fully realized. The fine-tuned 8B parameter model that would be served in production is mentioned as a future step rather than something already operational.
Additionally, while the case study presents smaller models as both more cost-effective and more performant, it does not provide an actual cost comparison between using GPT-4 for training data generation and running and maintaining a fine-tuned model in production. Real-world LLMOps decisions require careful analysis of these trade-offs.
## Future Direction
Kantar Worldpanel views the product descriptions proof of concept as just one of many use cases they plan to leverage with GenAI. They emphasize the value of having a platform partner like Databricks that enables experimentation and putting models into production in a cost-effective way. This suggests an ongoing investment in expanding their GenAI capabilities across their business operations, with the receipt-matching use case serving as a foundation for broader AI-driven automation of their data processing and insights generation workflows.