Company: Rovio
Title: Accelerating Game Asset Creation with Fine-Tuned Diffusion Models
Industry: Media & Entertainment
Year: 2025

Summary (short): Rovio, the Finnish gaming company behind Angry Birds, faced challenges in meeting the high demand for game art assets across multiple games and seasonal events, with artists spending significant time on repetitive tasks. The company developed "Beacon Picasso," a suite of generative AI tools powered by fine-tuned diffusion models running on AWS infrastructure (SageMaker, Bedrock, EC2 with GPUs). By training custom models on proprietary Angry Birds art data and building multiple user interfaces tailored to different user needs—from a simple Slackbot to advanced cloud-based workflows—Rovio achieved an 80% reduction in production time for specific use cases like season pass backgrounds, while maintaining brand quality standards and keeping artists in creative control. The solution enabled artists to focus on high-value creative work while AI handled repetitive variations, ultimately doubling content production capacity.
## Overview

Rovio, the Finnish gaming company famous for Angry Birds, embarked on a comprehensive generative AI journey to address a critical production bottleneck: the inability to create game art assets at the pace required by their live-service gaming model. With artists comprising nearly one-third of the workforce and constantly under pressure to deliver new seasonal content (Halloween themes, Valentine's Day variations, Christmas assets) and support for new game features and special events, the company needed a scalable solution that could preserve their brand's unique visual identity while accelerating production.

The case study is particularly notable for its realistic portrayal of the challenges involved in deploying generative AI in production environments where quality standards are non-negotiable. Unlike many case studies that present smooth adoption stories, Rovio's journey included multiple failures, continuous iteration, and a deep focus on change management alongside technical implementation. The solution, called "Beacon Picasso," represents a mature LLMOps implementation spanning model training, fine-tuning, deployment, and user-facing tooling.

## Initial Exploration and Early Failures (2022)

Rovio's ML team began experimenting with diffusion models in 2022, when image generation capabilities first became widely available. From the outset, they established important ground rules that would guide their entire approach: use only proprietary material, keep data safe within their own infrastructure (either locally or on their cloud), and respect intellectual property rights even in prompts. These principles addressed valid concerns about data privacy and IP that were particularly sensitive for artists. The team's first proof of concept aimed to train a model to generate Red, the main Angry Birds character, in the style of their movies.
The initial results were failures by any standard—the generated images were distorted, barely recognizable, and in some cases "scary." The team persisted through multiple iterations, and by December 2022 had achieved results they considered good for the technology at that time. However, when shown to professional artists, these outputs were deemed insufficient. The artists identified numerous defects in gestures, proportions, and style consistency that the ML engineers couldn't even perceive.

A key technical challenge emerged: Red exists in many different visual styles across Rovio's game portfolio. The classic version looks fundamentally different from the movie version, yet both are instantly recognizable to fans. Games like Angry Birds Dream Blast feature cute baby versions with hands and legs, while Angry Birds Friends uses the classic legless design. This multiplicity of valid representations made training consistent models extremely difficult. The same challenge extended to all main brand characters, with Rovio maintaining hundreds of pages of style guides explaining precise specifications for eyes, eyebrows, cheeks, wings, feet, and facial expressions.

## Key Technical Insights and Breakthrough Realizations

Through their early failures, the team gained several crucial insights that would shape their production approach:

**Secondary Characters and Non-Brand-Essential Assets**: While generating main brand characters to the required quality proved extremely difficult, the team discovered that secondary characters (like crocodiles and sharks in their game styles) could be generated successfully because they lacked the extensive style documentation and fan expectations of main characters. More importantly, they realized that backgrounds and environmental assets offered the ideal use case—these required the distinctive Rovio visual style but allowed flexibility in composition.
Unlike a character that needs precisely positioned eyes and correctly proportioned features, a background can have an extra tree or different rock placement without breaking brand consistency.

**Control vs. Flexibility Trade-offs**: The team learned that AI works better for tasks requiring less exact control. When artists first tried the tools expecting to generate specific brand characters, they became frustrated and quit after one attempt. The technology and workflows needed to be matched to use cases where some creative surprise and variation was acceptable or even desirable, such as generating new locations and environments.

**Volume and Randomness Management**: Production use required generating hundreds of images to find a few good ones. Artists needed to develop a new mindset—being open to surprises rather than expecting an exact realization of a mental image. The team discovered techniques like locking the random seed while modifying prompts, allowing controlled variation of specific image elements while maintaining overall composition.

**Data Quality Over Quantity**: A major breakthrough came in understanding that fine-tuning required high-quality curated datasets, not large volumes—just 20-30 well-crafted images could be sufficient. More importantly, having artists (rather than ML engineers) assemble and caption these datasets proved essential. Artists brought a richer vocabulary for describing visual elements and a better understanding of the target style, and their direct involvement built crucial trust in the system. When artists could see how changing images in the training dataset impacted outputs, they gained confidence in the process and ownership of the results.
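The seed-locking technique described above can be sketched as follows. This is a minimal illustration of the workflow pattern, not Rovio's implementation: `generate_image` is a hypothetical stand-in for a diffusion pipeline call (a real pipeline would return an image), used only to show how exploration varies the seed while refinement locks it and varies the prompt.

```python
import hashlib

# Hypothetical stand-in for a diffusion pipeline call. A real fine-tuned
# model would return an image; here we return a deterministic digest so
# the seed/prompt behavior is visible.
def generate_image(prompt: str, seed: int) -> str:
    return hashlib.sha256(f"{seed}:{prompt}".encode()).hexdigest()[:12]

# Exploration phase: vary the seed to get many candidate compositions.
candidates = [generate_image("angry birds style, autumn forest", seed=s)
              for s in range(4)]

# Refinement phase: lock the seed of a chosen candidate and vary only the
# prompt, so the overall composition stays stable while one element
# (here, the season) changes.
locked_seed = 2
autumn = generate_image("angry birds style, autumn forest", seed=locked_seed)
winter = generate_image("angry birds style, winter forest", seed=locked_seed)
```

With a real pipeline (e.g. Hugging Face `diffusers`), the seed would be fixed via a `torch.Generator(...).manual_seed(locked_seed)` passed to each generation call.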
**Artist-in-the-Loop Requirements**: The most important learning was that successful production deployment required artists throughout the pipeline—to craft prompts using appropriate visual vocabulary, to select promising generations, and, critically, to review and edit outputs to meet quality standards. The technology accelerated certain phases of work but didn't eliminate the need for skilled creative judgment.

## Technical Architecture and Training Pipeline

Rovio's production system centers on fine-tuning base diffusion models with proprietary data to capture their distinctive visual styles. The technical approach follows standard diffusion model architecture but with careful attention to operational concerns.

**Base Model Selection and Fine-Tuning Process**: The team experimented with multiple base models, recognizing that no single model works best for all use cases. Base diffusion models are trained on billions of internet images paired with text descriptions. These models work by adding noise to images to create latent representations, then learning to reverse that noise guided by text embeddings. To adapt these generic models to Rovio's styles, the team fine-tunes them on curated datasets that capture specific game aesthetics. Each fine-tuned model uses a unique "trigger word"—an intentionally unusual term that won't conflict with the base model's existing vocabulary. During fine-tuning, outputs progressively shift from the realistic base-model look toward the target Rovio style. The process is iterative and experimental, requiring multiple training runs with different hyperparameters (learning rate, training steps, etc.) and dataset variations (different images, caption styles).

**Training Infrastructure on AWS SageMaker**: All fine-tuning happens in SageMaker training jobs, allowing the team to run dozens of experiments in parallel. Training images are stored in S3, and each training job produces model artifacts (safetensors files) at various checkpoints.
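A minimal sketch of how such parallel fine-tuning experiments might be enumerated. The hyperparameter names, dataset names, and S3 layout are illustrative assumptions, not Rovio's actual configuration; the grid builder is plain Python, and the commented-out call shows where each combination would become its own SageMaker training job.

```python
from itertools import product

def build_experiment_grid(learning_rates, max_steps, datasets):
    """Cross product of hyperparameters and dataset variants: one
    training-job configuration per combination."""
    grid = []
    for lr, steps, ds in product(learning_rates, max_steps, datasets):
        grid.append({
            "HyperParameters": {"learning_rate": str(lr), "max_steps": str(steps)},
            "InputDataConfig": [{
                "ChannelName": "training",
                "DataSource": {"S3DataSource": {
                    "S3DataType": "S3Prefix",
                    # Assumed bucket layout, for illustration only.
                    "S3Uri": f"s3://example-bucket/datasets/{ds}/",
                }},
            }],
        })
    return grid

grid = build_experiment_grid(
    learning_rates=[1e-4, 5e-5],
    max_steps=[1000, 2000],
    datasets=["backgrounds-v1", "backgrounds-v2"],
)

# Each entry would then be submitted as its own parallel training job, e.g.:
# sm = boto3.client("sagemaker")
# sm.create_training_job(TrainingJobName=..., AlgorithmSpecification=...,
#                        RoleArn=..., OutputDataConfig=..., **cfg)
```

Because every combination runs as an independent job, results for the whole grid come back in roughly the time of one run, which is what makes the "hours rather than days" experimentation velocity possible.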
A key operational challenge is determining which checkpoint produces the best results, as later training steps can lead to overfitting. The team addresses this by generating visualization videos showing how outputs evolve through training, which artists review to select optimal checkpoints. The ability to parallelize training was crucial for experimentation velocity: different teams could explore various datasets, captioning approaches, and hyperparameters simultaneously, with results available within hours rather than days or weeks.

**Inference Infrastructure and Performance**: For inference (actual image generation), Rovio initially deployed on EC2 G6 instances with GPUs, then migrated to G6e instances when they became available, achieving better performance than the expensive desktop workstations artists had purchased. The G6e instances enabled generation in seconds rather than minutes, which proved essential for artist adoption—long wait times led to frustration and abandonment. The infrastructure uses auto-scaling groups across multiple availability zones, important both for fault tolerance and for dealing with GPU availability constraints. Instances are automatically shut down outside working hours and on weekends to control costs, as GPU instances are expensive.

## Production Tools: A Multi-Tier Approach

Rather than building a single interface, Rovio developed three distinct tools tailored to different user needs and technical comfort levels—a crucial product decision that acknowledged the diversity of workflows and preferences within the organization.

**Beacon Picasso Slackbot (Democratization Tool)**: The first tool built was a Slack bot accessible to all Rovio employees. The goal was democratization—ensuring everyone could generate at least one AI image and learn together.
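The off-hours shutdown pattern described above can be sketched with a capacity helper. The office hours, baseline capacity, and group name below are assumptions for illustration; in practice this logic would typically be registered as cron-based scheduled scaling actions on the Auto Scaling group rather than computed at runtime.

```python
from datetime import datetime

WORKDAY_START, WORKDAY_END = 8, 18  # assumed office hours, local time

def desired_gpu_capacity(now: datetime, baseline: int = 2) -> int:
    """Desired capacity for the GPU Auto Scaling group: scale to zero on
    weekends and outside working hours to control cost."""
    if now.weekday() >= 5:  # Saturday or Sunday
        return 0
    if not (WORKDAY_START <= now.hour < WORKDAY_END):
        return 0
    return baseline

# The equivalent scheduled actions on the group itself (hypothetical
# group/action names):
# asg = boto3.client("autoscaling")
# asg.put_scheduled_update_group_action(
#     AutoScalingGroupName="picasso-inference",
#     ScheduledActionName="evening-shutdown",
#     Recurrence="0 18 * * MON-FRI",
#     DesiredCapacity=0)
```

Scheduled actions have the advantage of running even when no application code is up, which matters precisely because the goal is to have nothing running overnight.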
The Slackbot addressed numerous adoption barriers and concerns: data privacy (solved by AWS infrastructure keeping data in their VPC), harmful content generation, environmental impact, "slop" content flooding, and particularly sensitive IP rights issues (artists were justifiably concerned after seeing their work used without consent to train public models).

The Slackbot's architecture is straightforward: when a user pings the bot with a prompt, a Lambda function responds with an acknowledgment, then triggers a SageMaker inference request. Generated images are stored in S3, another Lambda sends results back to Slack via SNS messaging, and a third Lambda stores metadata (generation time, daily usage statistics) in RDS for analytics.

Beyond basic functionality, the Slackbot served critical organizational purposes. Public generation in Slack channels revealed biases—one prompt for "important meeting at Rovio" generated only white males, prompting discussion about diversity in AI outputs. The bot opened conversations about valid concerns rather than trying to convince people AI was unconditionally good. This transparent, organization-wide experimentation built awareness and identified issues collaboratively.

**Beacon Picasso Pro Studio Cloud (Advanced Users)**: For production asset creation, artists needed much more sophisticated capabilities than simple text-to-image generation. Rovio leveraged open-source tools (the specific tools aren't named, but likely include ComfyUI or similar workflow-based systems) that provide extensive control through complex workflows—sequences of steps that transform ideas into final images, with adjustable parameters such as the number of diffusion steps, model strength (how much to follow the fine-tuned style), guidance scale (how closely to follow the prompt), and multi-model blending.
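The adjustable parameters listed above might be represented as a workflow configuration along these lines. The parameter names, defaults, and valid ranges are illustrative assumptions about a ComfyUI-style setup, not values from the talk:

```python
# Assumed defaults and valid ranges for a workflow-based generation tool.
DEFAULTS = {
    "steps": 30,            # number of diffusion denoising steps
    "guidance_scale": 7.0,  # how closely to follow the prompt
    "model_strength": 0.8,  # how much to follow the fine-tuned style
}

VALID_RANGES = {
    "steps": (1, 150),
    "guidance_scale": (0.0, 20.0),
    "model_strength": (0.0, 1.0),
}

def resolve_workflow_params(overrides: dict) -> dict:
    """Merge artist-supplied overrides onto defaults and range-check them."""
    params = {**DEFAULTS, **overrides}
    for name, value in params.items():
        lo, hi = VALID_RANGES[name]
        if not (lo <= value <= hi):
            raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
    return params

# An artist lowers prompt adherence to invite more "surprise":
params = resolve_workflow_params({"guidance_scale": 5.5})
```

Centralizing defaults and ranges like this is one way the later middle-tier tool could hide workflow complexity while still letting advanced users override individual knobs.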
The initial approach had artists running these tools locally, but this proved unscalable—it required ML engineers to configure Python environments and dependencies on each artist's machine, with every update necessitating repeated setup sessions. Some excited artists even ordered expensive GPU desktops that proved "as loud as washing machines."

The cloud migration solved these problems. Artists access instances through SSH tunnels, later improved with friendly DNS names via Route 53 resolvers, allowing simple browser bookmarks. The infrastructure uses auto-scaling groups across availability zones for GPU availability and fault tolerance, with automatic shutdown during off-hours for cost control.

A key operational challenge was enabling AI engineers to push updates without bottlenecking on infrastructure specialists. Rovio built a CI/CD pipeline in which AI engineers push code to Git repositories, triggering AWS CodePipeline to provision a temporary EC2 instance that configures another instance by downloading base models and fine-tuned artifacts from S3, running setup scripts, and creating an AMI. This automation lets AI engineers deploy new model versions and tool updates without needing Terraform expertise.

**Beacon Picasso Studio (Middle Ground)**: Despite having simple and advanced tools, Rovio identified a gap. As their library of fine-tuned models grew, artists struggled to find appropriate models for their use cases. They would consult internal documentation to identify the right model and workflow, but the complexity remained high. The team wanted to hide this complexity while helping artists with prompting.

This is where LLMs entered the picture beyond image generation. Using Amazon Bedrock with Claude, two AI engineering interns and a UX design intern rapidly prototyped a new interface. Bedrock's inference profiles allowed easy model switching via the Converse API and straightforward cost tracking across different use cases.
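A sketch of how a prompt-rewriting assistant could call the Converse API. The model ID, system prompt, and helper names here are assumptions for illustration, not details from the case study:

```python
SYSTEM_PROMPT = (
    "You rewrite an artist's plain-language request into a detailed "
    "image-generation prompt, preserving the requested style keywords."
)

def build_converse_request(user_request: str, current_prompt: str) -> dict:
    """Assemble the payload for a single Converse API turn."""
    return {
        # Assumed model ID; inference profiles allow swapping this easily.
        "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
        "system": [{"text": SYSTEM_PROMPT}],
        "messages": [{
            "role": "user",
            "content": [{"text": f"Current prompt: {current_prompt}\n"
                                 f"Requested change: {user_request}"}],
        }],
    }

def rewrite_prompt(user_request: str, current_prompt: str) -> str:
    import boto3  # deferred so the payload builder stays dependency-free
    client = boto3.client("bedrock-runtime")
    resp = client.converse(**build_converse_request(user_request, current_prompt))
    return resp["output"]["message"]["content"][0]["text"]
```

Because the Converse API takes the same request shape regardless of the underlying model, switching models (or inference profiles) only changes the `modelId` string, which matches the easy model-switching benefit described above.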
The resulting interface organizes models by game (e.g., Dream Blast artists see only Dream Blast-relevant models and workflows) and by use case (backgrounds, characters, etc.). Each use case includes example prompts and generated images as starting points. Critically, the interface includes "Picasso Assistant," a conversational AI powered by Bedrock that helps artists modify prompts without crafting complex text from scratch. An artist can request "make this a winter forest with folks around," and the assistant reformulates the prompt appropriately.

This middle-tier tool democratized access to the sophisticated capabilities of Pro Studio Cloud. Artists who previously found the complexity overwhelming could now generate production assets, escalating to Pro Studio Cloud only for truly complicated use cases requiring maximum control. The rapid development enabled by Claude Code (Anthropic's AI coding assistant) and Bedrock demonstrates how LLMs can accelerate the development of LLMOps tooling itself.

## Production Workflow and Process Changes

The actual production pipeline places artists firmly in control, with AI as an accelerant rather than a replacement. For a typical asset like a season pass background:

- **Ideation and Prompting**: Artists conceive the desired aesthetic and craft an initial prompt (or use Picasso Assistant to help)
- **Batch Generation**: Generate multiple variations, typically dozens or hundreds of images
- **Selection**: Artists choose the best result as a starting point
- **Post-Processing**: Extensive manual work including cleaning artifacts, expanding composition, upscaling (since models generate at lower resolution for speed), creating layers, and adding brand characters on top
- **Integration**: Embedding the final asset into the game UI with appropriate narrative context

A critical insight emerged about process timing: initially, artists followed their traditional workflow—brainstorming session first, then executing on the decided concept.
This approach yielded minimal time savings because AI was being used to realize predetermined ideas, where the "surprise factor" became a hindrance. When they reversed the process—generating images first, then holding brainstorming sessions to select and develop the most promising AI outputs—productivity improved dramatically. This process change was as important as the technology itself.

## Production Results and Quantified Benefits

Rovio achieved significant, measurable improvements in specific use cases:

**Season Pass Background Production**: Time reduced from 20 days to 4 days (an 80% reduction). These limited-time special events typically run monthly, featuring seasonal themes and rewards. The AI handles background generation while artists focus on character placement and storytelling elements.

**Content Volume Increase**: Some artist teams report doubling their content production capacity. One artist noted generating hundreds of illustrations per day in certain cases—a volume that would be impossible with traditional methods.

**Velocity and Innovation Balance**: While the 80% time saving is impressive, Rovio emphasizes that the greater benefit is enabling innovation. Artists have more time to experiment, explore new concepts, and work on forward-looking projects rather than being consumed by repetitive execution tasks.

The company stresses that these benefits depend on proper tool selection and workflow design. Not every use case sees dramatic improvements, and the technology works best for non-brand-essential assets where some variation and surprise is acceptable.

## Operational Challenges and Honest Assessments

The case study is refreshingly candid about challenges and failures, providing valuable lessons for others pursuing similar implementations:

**Quality Standards vs. AI Capabilities**: The persistent theme is that "pretty good is not good enough" for brand-critical assets.
Models that impressed ML engineers were rejected by artists who could identify subtle defects in proportions, expressions, and style consistency. Main brand characters remain difficult to generate to acceptable quality standards because of the extensive style guidelines and high fan expectations.

**Adoption Variability**: Artists responded differently to the tools. Some loved them immediately, others found them confusing, some refused to try them, and some were disappointed they weren't "one-click" solutions. Conversely, others were disappointed precisely because the tools seemed too simple—experienced artists wanted complex tools offering maximum control. This diversity necessitated the multi-tier tool approach.

**Control and Frustration**: Early attempts with brand characters led to frustrated artists who tried once and quit. The technology works best when use cases are carefully matched to capabilities, requiring thoughtful product management beyond technical implementation alone.

**Continued Failures and Learning**: Even at the time of this presentation, Rovio continues to fail at generating main brand characters to production quality. Their most recent attempts look "really good" to engineers but still have defects visible to artists. However, the team maintains an open-minded approach, continuing to experiment because the technology evolves rapidly.

**Cost Management**: GPU instances are expensive, requiring careful operational practices like scheduled shutdowns, auto-scaling, and cost tracking. Bedrock's inference profiles help with cost attribution across different use cases.
## Future Exploration and Research

Rovio actively researches emerging capabilities to prepare for potential futures, even when they are not yet production-ready:

**3D Generation**: Converting 2D images to 3D models in the Rovio style is possible but not yet production-quality for in-game use

**Animation**: Turning static images into animated sequences, potentially viable for marketing even if not ready for gameplay

**Video Generation**: Experimenting with video models that can decompose reference videos (depth, pose, masks) and apply Rovio characters, showing promising but not production-ready results

**Brand Characters (Continued)**: Despite repeated failures, exploration continues because rapid technological progress means previously impossible tasks may become viable

The team emphasizes the importance of remaining open-minded despite past failures, adjusting course as new research makes current approaches obsolete while enabling new use cases.

## LLMOps Maturity and Architectural Patterns

This case study demonstrates several mature LLMOps patterns:

- **Separation of Training and Inference Infrastructure**: SageMaker for training with parallel experimentation; EC2 with GPUs for inference with performance optimization
- **Artifact Management**: Systematic storage of training datasets, model checkpoints, and final artifacts in S3 with version control
- **Human-in-the-Loop Design**: Artists involved in data curation, model selection, prompt crafting, output selection, and post-processing
- **Multi-Interface Strategy**: Different tools for different users and use cases rather than one-size-fits-all
- **CI/CD for ML**: Automated pipelines for deploying new model versions without requiring infrastructure expertise
- **Cost Optimization**: Auto-scaling, scheduled shutdowns, instance-family optimization (G6 to G6e migration)
- **Observability**: Metadata collection for usage analytics and performance monitoring
- **LLM-Augmented Interfaces**: Using Bedrock/Claude to make generative AI tools more accessible through conversational assistance

The implementation reflects a sophisticated understanding of production ML operations, balancing technical capabilities with user needs, quality requirements, and operational constraints. The honest discussion of failures and limitations provides valuable guidance for organizations attempting similar deployments, emphasizing that successful LLMOps requires much more than training good models—it demands thoughtful product design, change management, and continuous iteration based on real user needs and feedback.
