ZenML

Scaling Email Content Extraction Using LLMs in Production

Yahoo 2023

Yahoo Mail faced challenges with its existing ML-based email content extraction system, which hit a coverage ceiling of 80% for major senders while struggling with long-tail senders and slow time-to-market for model updates. The team implemented a new solution using Google Cloud's Vertex AI and LLMs, achieving 94% coverage for standard domains and 99% for tail domains, a 51% increase in extraction richness, and a 16% reduction in tracking API errors. The implementation required careful attention to hybrid infrastructure, cost management, and privacy compliance while processing billions of daily messages.

Industry

Tech

Overview

This case study, presented at Google Cloud Next 2024, details Yahoo Mail’s journey from prototype to production using Vertex AI and generative AI for email extraction at consumer scale. The presentation features speakers from both Google Cloud and Yahoo Mail engineering, providing a comprehensive view of both the platform capabilities and the real-world implementation challenges faced by Yahoo.

Yahoo Mail is a consumer email platform serving hundreds of millions of mailboxes, processing billions of messages daily. The platform focuses on helping users manage the “business of their lives” through features like package tracking and receipt management. These features rely heavily on accurate extraction of structured information from unstructured email content—a task that became the focus of their LLM deployment.

The Problem: Coverage Ceiling and Time to Market

Yahoo’s previous approach used traditional sequence-to-sequence machine learning models for mail extractions. While this worked for top-tier senders (Amazon, Walmart, and similar large retailers), they encountered what they described as a “coverage ceiling.” The email ecosystem follows an 80/20 distribution: the top 20% of senders (major retailers) contribute 80% of receipts and packages, while the remaining 80% of long-tail senders contribute only 20% of coverage.

The critical insight was that consumer trust is fragile—one missing receipt or incorrect package update can cause users to lose faith in the entire module. When trust erodes, users fall back to manual inbox triaging, defeating the purpose of the automated features.

Additionally, their time-to-market was problematic. In an ecosystem receiving fresh data every second, having model retraining cycles that took six months or longer was unacceptable. Slow model refresh cycles led to data quality degradation, performance issues, and poor user experience.

The Catalyst: Cloud Migration and Generative AI

Two events accelerated Yahoo’s evolution toward LLM-based extraction:

The first was the release of ChatGPT in November 2022, which demonstrated the potential of large language models. The second was Yahoo’s strategic decision in January 2023 to select Google Cloud as their mail cloud provider. These factors combined to enable an internal Yahoo hackathon exploring LLMs for mail extraction, which showed promising results.

By summer 2023, the Yahoo and Google Cloud teams collaborated on a full-day generative AI workshop to define their roadmap for the next six months. The outcome was an aggressive timeline: release a 0-to-1 beta at Google Cloud Next 2023 (August), then scale to 100% of US daily active users before Black Friday and Cyber Monday for the Q4 shopping season.

LLMOps Challenges and Solutions

Managing Hybrid Infrastructure

Yahoo was in the middle of a cloud transformation, with systems partly on-premises and partly in Google Cloud. This hybrid architecture created significant complexity for production LLM deployment.

A key challenge was comparing the performance of production models against new LLM models. The team implemented a sampling approach where a small percentage of live traffic runs through both the legacy model and the new LLM system. This dual-inference approach enabled quality validation before go/no-go decisions.
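The talk does not describe Yahoo's actual implementation, but the dual-inference sampling pattern can be sketched roughly as follows (all names hypothetical). Hashing the message ID, rather than drawing a random number, makes the sampling decision deterministic and reproducible across pipeline stages:

```python
import hashlib

SAMPLE_RATE = 0.01  # fraction of live traffic mirrored to the new model


def in_sample(message_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically assign a message to the comparison sample."""
    digest = hashlib.sha256(message_id.encode()).hexdigest()
    return int(digest, 16) % 10_000 < rate * 10_000


def log_comparison(message_id, legacy_out, llm_out):
    # In production this would feed an offline evaluation store.
    print(f"{message_id}: legacy={legacy_out!r} llm={llm_out!r}")


def extract(message_id: str, body: str, legacy_model, llm_model):
    """Serve the legacy extraction; shadow a sample through the LLM."""
    result = legacy_model(body)      # user-facing result, unchanged
    if in_sample(message_id):
        candidate = llm_model(body)  # shadow inference, never served
        log_comparison(message_id, result, candidate)
    return result
```

Because the candidate output is logged but never served, the comparison can run continuously on live traffic without any user-visible risk.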

This validation step proved crucial. During their planned November release, the post-validation step uncovered problems that forced them to delay. After fixing the issues, they achieved a successful release in December—demonstrating that robust evaluation pipelines are essential for production LLM deployments.

Additional hybrid challenges, described in the sections that follow, included GPU access for fine-tuning, model selection and inference architecture, real-time processing requirements, and privacy compliance.

GPU Infrastructure and Fine-Tuning

While Yahoo had established ML infrastructure for data gathering, training, and inference pipelines, they lacked sufficient GPUs for fine-tuning LLMs. Rather than procuring and managing GPU hardware, they leveraged Vertex AI managed training through Vertex AI notebooks, gaining access to A100 GPUs. This approach significantly accelerated experimentation and enabled faster iteration cycles.

Model Selection and Inference Architecture

For their use case, Yahoo selected an open-source T5 model from Google with 250 million parameters. This decision balanced capability against cost—a critical consideration given their scale of billions of daily messages.

The team conducted extensive experiments with Google’s PSO (Professional Services Organization) to determine optimal batch sizes and GPU configurations. They deployed multiple Vertex AI endpoints across various US regions, aligning infrastructure placement with traffic patterns to minimize latency and optimize resource utilization.
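The batch-size experiments described above can be sketched as a simple sweep that measures per-message latency at each candidate size (a minimal illustration; `infer_fn` stands in for whatever wraps the actual endpoint prediction call, and the candidate sizes are arbitrary):

```python
import time


def sweep_batch_sizes(infer_fn, messages, sizes=(1, 4, 8, 16, 32)):
    """Measure per-message latency at each batch size to pick a config.

    infer_fn(batch) -> list of extractions. Any callable works here;
    in practice it would wrap a remote inference endpoint.
    """
    results = {}
    for size in sizes:
        start = time.perf_counter()
        for i in range(0, len(messages), size):
            infer_fn(messages[i:i + size])
        elapsed = time.perf_counter() - start
        results[size] = elapsed / len(messages)  # seconds per message
    return results
```

Running such a sweep against a real endpoint exposes the trade-off directly: larger batches amortize per-request overhead and improve GPU utilization, but past some point they add queueing latency without throughput gains.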

Real-Time Processing Requirements

A key requirement was extraction freshness: for a delightful user experience, extraction had to happen instantaneously after mail delivery but before users opened their email. This real-time constraint influenced their entire architecture, requiring low-latency inference endpoints and efficient batching strategies.
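One common way to reconcile batching efficiency with a freshness constraint like this is a micro-batcher that flushes on whichever comes first: a full batch or a latency deadline. The sketch below is illustrative only (the talk does not detail Yahoo's batching logic, and the limits are made up):

```python
import time
from collections import deque


class MicroBatcher:
    """Accumulate messages until a size or latency budget is hit.

    Batching amortizes inference cost; max_wait_ms bounds how long any
    message can sit in the buffer, keeping extraction fresh.
    """

    def __init__(self, max_batch=16, max_wait_ms=50):
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000.0
        self.buffer = deque()
        self.oldest = None  # arrival time of the oldest buffered message

    def add(self, message):
        if self.oldest is None:
            self.oldest = time.monotonic()
        self.buffer.append(message)
        return self.flush_if_ready()

    def flush_if_ready(self):
        full = len(self.buffer) >= self.max_batch
        stale = (self.oldest is not None
                 and time.monotonic() - self.oldest >= self.max_wait)
        if full or stale:
            batch = list(self.buffer)
            self.buffer.clear()
            self.oldest = None
            return batch
        return None
```

A production variant would flush on a timer rather than only on `add`, so a lone message still meets its deadline during quiet periods.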

Privacy and Compliance

Mail extraction involves sensitive user data from personal mailboxes. The team emphasized that security, legal compliance, and privacy were paramount concerns. All data remained within their tenant, with careful attention to data handling throughout the pipeline.

Deployment Architecture

The architecture featured a hybrid design with Yahoo’s private cloud on one side and Google Cloud on the other. Test and training data from the private cloud was copied to Google Cloud Storage (GCS) for model training in Vertex AI notebooks. On the production side, the extraction pipeline and error analysis tools operated across both environments.

The team built hybrid tooling spanning both environments, covering training-data transfer, the production extraction pipeline, and error analysis.

Results and Business Impact

After successful deployment to 100% of US daily active users in December 2023, Yahoo achieved:

- 94% extraction coverage for standard domains
- 99% coverage for tail domains
- A 51% increase in extraction richness
- A 16% reduction in tracking API errors

The team characterized these results as achieving “better coverage, more rich signal, and better quality of the signals.”

MLOps Framework for Generative AI

The presentation also outlined Google's framework for MLOps in the generative AI era.

Future Roadmap

Yahoo also outlined plans for continued development of the extraction system.

Vertex AI Tooling Demonstrated

The presentation included demonstrations of Vertex AI capabilities supporting the prototype-to-production journey.

Key Lessons Learned

The case study highlighted several important lessons for production LLM deployments:

The team acknowledged that their November release failed, but emphasized that persistence and robust validation were key—they retried and succeeded in December. This honest assessment of setbacks demonstrates that production LLM deployments require resilience and iterative problem-solving.

The hybrid infrastructure challenges were substantial, requiring significant engineering effort to bridge on-premises and cloud systems. Organizations undergoing cloud transformation while simultaneously adopting LLMs should expect additional complexity.

Cost management at scale remains critical. The team repeatedly emphasized that for large-scale consumer products processing billions of messages, extraction cost minimization and resource utilization optimization are essential considerations—not afterthoughts.

Finally, the tight integration of evaluation and monitoring into the deployment process prevented potentially damaging production issues, reinforcing that LLMOps requires robust quality gates before releasing to users.
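A quality gate like the one described above can be reduced to a simple threshold check over the candidate model's evaluation metrics (metric names and thresholds here are illustrative, not Yahoo's):

```python
# Hypothetical go/no-go gate: release only if the candidate model clears
# a floor on every tracked metric.
THRESHOLDS = {
    "coverage": 0.90,        # min fraction of messages with an extraction
    "field_accuracy": 0.95,  # min agreement with labeled evaluation data
}


def release_gate(metrics: dict) -> bool:
    """Return True only when all metrics clear their thresholds.

    A missing metric counts as a failure: an evaluation pipeline that
    silently dropped a metric should block the release, not pass it.
    """
    return all(metrics.get(name, 0.0) >= floor
               for name, floor in THRESHOLDS.items())
```

Treating missing metrics as failures mirrors the lesson from the November release: the gate exists precisely to catch problems the team did not anticipate.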
