**Company:** Google

**Title:** Google Photos Magic Editor: Transitioning from On-Device ML to Cloud-Based Generative AI for Image Editing

**Industry:** Tech

**Year:** 2025

**Summary:** Google Photos evolved from using on-device machine learning models for basic image editing features such as background blur and object removal to cloud-based generative AI for its Magic Editor feature. The team transitioned from small, specialized models (around 10MB) running locally on devices to large-scale generative models hosted in the cloud, enabling more sophisticated editing capabilities such as scene reimagination, object relocation, and advanced inpainting. This shift required significant changes to infrastructure, capacity planning, evaluation methodology, and user experience design, while maintaining a focus on grounded, memory-preserving edits rather than fantastical image generation.
## Overview

This case study presents Google Photos' journey from traditional on-device machine learning to cloud-based generative AI, told through the perspective of Kelvin, an engineer on Google's Photos editing team. Google Photos serves 1.5 billion monthly active users and processes hundreds of millions of edits per month, making this a large-scale production LLMOps implementation. The case study covers the evolution from 2018's computational photography approach, built on small, specialized models, to the 2022-2023 Magic Editor feature, which leverages state-of-the-art generative models in the cloud.

The transition represents a fundamental shift in LLMOps architecture: moving from edge computing with deterministic, specialized models to cloud-based generative AI systems that require entirely different approaches to deployment, evaluation, and user experience design. Google Photos' approach demonstrates how established ML products can evolve to incorporate generative AI while maintaining production reliability and user trust.

## Technical Evolution and Architecture

Google Photos initially built its editing capabilities around on-device inference using specialized convolutional neural networks. The first major feature, post-capture segmentation for background blur effects, used a 10MB U-Net model running entirely on the user's device via TensorFlow Lite (now LiteRT). This architecture provided several advantages: zero network latency, no server costs, consistent performance regardless of user base size, and complete privacy, since all processing happened locally.

The on-device approach used a shared C++ library for model inference across Android, iOS, and web clients, with tight integration with Google's Pixel hardware, including EdgeTPU acceleration. The team maintained close collaboration with internal research teams, allowing them to iterate on custom models built specifically for their use cases rather than relying on general-purpose models.
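The segmentation-plus-blur pipeline described above can be sketched at a high level. This is a minimal illustration, not Google's implementation: the real feature runs a 10MB U-Net via TensorFlow Lite to predict the subject mask, whereas here a precomputed mask and a naive 3x3 box blur stand in for the model and the camera pipeline.

```python
import numpy as np

def box_blur(img: np.ndarray) -> np.ndarray:
    """Naive separable 3x3 box blur (a stand-in for a real lens-blur effect)."""
    out = img.astype(float)
    for axis in (0, 1):
        out = (np.roll(out, 1, axis=axis) + out + np.roll(out, -1, axis=axis)) / 3.0
    return out

def background_blur(img: np.ndarray, subject_mask: np.ndarray) -> np.ndarray:
    """Composite the sharp subject over a blurred background.

    subject_mask is a float array in [0, 1], as a segmentation model
    would produce: 1.0 marks subject pixels to keep sharp.
    """
    m = subject_mask[..., None]          # broadcast the mask over RGB channels
    return m * img + (1.0 - m) * box_blur(img)

# Toy example: 8x8 image with the "subject" in the center 4x4 block.
img = np.random.rand(8, 8, 3)
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
result = background_blur(img, mask)
```

The mask-weighted composite is the core idea: where the mask is 1.0 the original pixels pass through untouched, and where it is 0.0 the blurred background shows instead.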
However, as the team expanded to more complex features like Magic Eraser in 2021, they began orchestrating multiple models in sequence: distractor detection, segmentation, inpainting, and custom GL rendering for the user interface. This represented an early form of model orchestration, though still constrained to on-device capabilities. Models grew to hundreds of megabytes, pushing the limits of mobile deployment.

The shift to generative AI in 2022-2023 for Magic Editor required a complete architectural overhaul to cloud-based inference. This transition brought new challenges the team had never faced: server capacity planning, network latency management, and distributed system reliability. The speaker notes this was particularly jarring coming from an on-device background, where "the user brings the compute" and scaling was essentially free.

## LLMOps Challenges and Solutions

### Evaluation and Testing

One of the most significant LLMOps challenges discussed is evaluation methodology. The team emphasizes that benchmarks are to ML models what unit tests are to traditional software: they prevent regressions and validate improvements. However, creating meaningful benchmarks for generative image editing proved complex because the problem space is inherently subjective and creative.

The transition to cloud-based large models broke the team's existing testing infrastructure. Models became too large to include in automated testing suites, forcing the team to develop new approaches to regression testing. The speaker highlights this as an ongoing challenge where traditional software engineering practices don't translate directly to LLMOps.

The team developed evaluation strategies focused on specific use cases rather than general capabilities. Instead of trying to benchmark "generative image editing" broadly, they created targeted evaluations for specific features such as object relocation, scene reimagination, and advanced erasing with reflection handling.
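The idea of treating benchmarks as unit tests for specific use cases can be sketched as a small regression harness. All names here (`EvalCase`, the scoring functions, the baseline values) are hypothetical illustrations, not Google's internal tooling; the pattern is per-feature scored cases compared against stored baselines, with any drop beyond a tolerance treated as a failing test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class EvalCase:
    """One targeted benchmark, e.g. 'object relocation' on a fixed input set."""
    name: str
    score_fn: Callable[[], float]  # runs the model on fixed inputs, returns 0..1

def run_regression_suite(
    cases: List[EvalCase],
    baselines: Dict[str, float],
    tolerance: float = 0.02,
) -> Tuple[Dict[str, float], List[str]]:
    """Score every case; flag any feature that regressed past the tolerance."""
    report: Dict[str, float] = {}
    failures: List[str] = []
    for case in cases:
        score = case.score_fn()
        report[case.name] = score
        if score < baselines.get(case.name, 0.0) - tolerance:
            failures.append(case.name)
    return report, failures

# Toy usage with stubbed scoring functions in place of real model runs.
cases = [
    EvalCase("object_relocation", lambda: 0.81),
    EvalCase("scene_reimagination", lambda: 0.74),
    EvalCase("advanced_erase_reflections", lambda: 0.60),
]
baselines = {"object_relocation": 0.80,
             "scene_reimagination": 0.75,
             "advanced_erase_reflections": 0.70}
report, failures = run_regression_suite(cases, baselines)
```

Run in CI, a non-empty `failures` list blocks the model update, just as a failing unit test blocks a code change.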
### Model Orchestration and Prompt Engineering

Google Photos implemented a sophisticated approach to reduce the ambiguity inherent in generative AI. Rather than exposing users to open-ended prompting, they built guided experiences in which users select and visualize objects in their images, allowing the system to generate more specific, contextual prompts. This approach treats the AI as a "co-editor" or "co-pilot" rather than requiring users to become prompt engineering experts.

The system constrains the problem space by focusing on three core use cases identified through user research: object relocation within images, scene reimagination (particularly backgrounds), and advanced object removal with intelligent inpainting. By training specialized models for these specific scenarios, they achieve higher accuracy and reliability than general-purpose approaches.

### Infrastructure and Capacity Planning

The transition to cloud-based inference introduced entirely new operational challenges. The team had to learn server capacity planning for the first time, a task made more complex by their use of accelerated compute (TPUs/GPUs). This reflects a common LLMOps challenge: balancing model performance against operational costs and latency requirements.

Latency became a critical concern, with the team now having to account for network quality, data center load, geographic distribution, and round-trip times. This is particularly challenging for an interactive editing experience, where users expect near-real-time feedback.

### Trust and Safety

The case study acknowledges trust and safety as an ongoing challenge in generative AI systems. The team takes a pragmatic approach, recognizing that perfect safety is impossible and focusing on preventing the most harmful cases while managing expectations across users, media, and internal stakeholders. They emphasize precision and recall metrics rather than pursuing perfect accuracy.
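The precision-and-recall framing of trust and safety can be made concrete with a small sketch. This is an illustrative assumption, not Google's safety stack: imagine a classifier that scores edit requests for potential harm, and a threshold chosen to maximize recall while holding precision above a floor, accepting that no threshold is perfect.

```python
from typing import List, Optional, Tuple

def precision_recall(scores: List[float], labels: List[bool],
                     threshold: float) -> Tuple[float, float]:
    """Precision/recall for 'block the edit' decisions at a given threshold."""
    tp = sum(s >= threshold and y for s, y in zip(scores, labels))
    fp = sum(s >= threshold and not y for s, y in zip(scores, labels))
    fn = sum(s < threshold and y for s, y in zip(scores, labels))
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def pick_threshold(scores: List[float], labels: List[bool],
                   min_precision: float = 0.95) -> Optional[Tuple[float, float]]:
    """Highest-recall threshold that still meets the precision floor."""
    best: Optional[Tuple[float, float]] = None
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        if p >= min_precision and (best is None or r > best[1]):
            best = (t, r)
    return best

# Toy labeled data: classifier scores for four requests; True = actually harmful.
scores = [0.1, 0.4, 0.6, 0.9]
labels = [False, False, True, True]
chosen = pick_threshold(scores, labels)  # (threshold, recall at that threshold)
```

Tuning the threshold this way operationalizes "prevent the most harmful cases" without chasing an unattainable perfect accuracy.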
The ambiguity of natural language presents ongoing challenges: the speaker gives the example of "the view from the mountain was sick," where the model must correctly interpret "sick" as positive rather than relating to illness. This highlights the complexity of deploying language-understanding systems in production.

## Production Deployment Strategies

Google Photos' LLMOps strategy emphasizes reducing randomness and bringing deterministic behavior to inherently probabilistic systems. Their approach starts with large, capable models for initial functionality, then optimizes through model distillation, more efficient alternatives, or even replacement with traditional engineering approaches where appropriate.

The team leverages "hallucination as a feature" in creative contexts by providing multiple generated options to users, turning the non-deterministic nature of generative models into a creative tool. This works because image editing is subjective: there is no single "correct" answer, so users can choose among several generated variations.

Their migration strategy is iterative: build reliable evaluation systems, achieve production value with large models, then optimize for efficiency and speed. The focus on faster iteration enables more experimental cycles and better product outcomes.

## Future Architecture Considerations

The case study concludes with speculation about future directions, particularly a potential return to on-device inference as smaller models (such as Gemini Nano) approach the capabilities of earlier cloud-based versions. This highlights a key LLMOps consideration: architectural decisions must account for rapidly evolving model capabilities and efficiency improvements.

The team announced a complete rebuild of their editor to be "AI-first," suggesting a fundamental shift in product architecture where AI capabilities are central rather than additive.
This represents a mature approach to LLMOps in which generative AI becomes the primary interface rather than a specialized feature.

## Key LLMOps Lessons

The case study offers several insights for LLMOps practitioners. First, constraining problem scope matters: while generative models can theoretically do anything, production systems need clear boundaries and specific use cases to achieve reliability. Second, evaluation becomes both critical and complex, requiring new approaches beyond traditional ML benchmarking. Third, the transition from edge to cloud inference involves fundamental trade-offs in latency, cost, privacy, and operational complexity; teams must weigh these factors against their specific use cases and user requirements. Fourth, user experience design becomes crucial in generative AI systems: how users interact with the AI (guided selection versus open prompting) significantly affects system reliability and user satisfaction.

Finally, the case demonstrates that successful LLMOps often involves hybrid approaches that combine generative AI with traditional engineering and specialized models, rather than replacing everything with large language models. The most effective systems use generative AI where it provides unique value while retaining traditional approaches where they are more reliable or efficient.
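The final lesson about hybrid systems can be sketched as a simple router. The edit kinds and backend names here are hypothetical, not Google Photos' actual dispatch logic; the pattern is that deterministic edits stay on traditional code paths, well-bounded tasks go to specialized models, and only open-ended edits pay the cost of a cloud generative model.

```python
from enum import Enum, auto

class EditKind(Enum):
    CROP = auto()         # deterministic geometry, no ML needed
    ERASE = auto()        # well-bounded task for a specialized model
    REIMAGINE = auto()    # open-ended edit needing a generative model

def route_edit(kind: EditKind, on_device_ok: bool = True) -> str:
    """Send each edit to the cheapest backend that can handle it reliably."""
    if kind is EditKind.CROP:
        return "traditional_code"
    if kind is EditKind.ERASE and on_device_ok:
        return "on_device_specialized_model"
    return "cloud_generative_model"

routes = {kind.name: route_edit(kind) for kind in EditKind}
```

The `on_device_ok` flag also captures the forward-looking point about on-device models: as small models improve, more edit kinds can migrate off the cloud path without changing the routing structure.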
