Shopify: Structured AI Workflow Orchestration for Developer Productivity at Scale

LLMOps Database

Company: Shopify
Title: Structured AI Workflow Orchestration for Developer Productivity at Scale
Industry: Tech
Link: https://shopify.engineering/introducing-roast
Year: 2025

Summary (short)

Shopify's Augmented Engineering team developed Roast, an open-source workflow orchestration framework that structures AI agents to tackle developer productivity problems such as flaky tests and low test coverage. The team found that breaking complex AI tasks into discrete, structured steps was essential for reliable performance at scale. Roast is the resulting convention-over-configuration tool: it combines deterministic code execution with AI-powered analysis, producing reproducible, testable AI workflows that can be version-controlled and integrated into development processes.

## Overview

Shopify's Augmented Engineering Developer Experience (DX) team tackled a fundamental LLMOps challenge: how to make AI agents reliable and effective when they work with large codebases and complex developer workflows. Their solution, Roast, is a structured AI workflow orchestration framework built around the observation that "AI agents need help staying on track" and perform much better when complex tasks are broken into discrete, manageable steps.

The team saw immediate opportunities in developer productivity problems such as flaky tests, low test coverage, and code quality issues, all of which could benefit from AI-powered analysis and remediation at scale. However, they quickly discovered that letting AI "roam free around millions of lines of code just didn't work very well," and that non-determinism is "the enemy of reliability" in production AI systems.

## Technical Architecture and Design Philosophy

Roast takes a convention-over-configuration approach, heavily influenced by the Ruby on Rails philosophy but designed specifically for AI workflow orchestration. Workflows are defined with YAML configuration files and Markdown prompts, and they interleave non-deterministic AI behavior with deterministic code execution. This hybrid approach reflects a clear view of where AI excels and where traditional automation is more appropriate.

The core insight behind Roast is that AI workflows need guardrails and structure to run reliably at scale. Rather than treating AI as a black box that can handle any task, the framework assumes that AI performs best when given clear context, specific objectives, and well-defined boundaries within which to operate.
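To make the hybrid idea concrete, here is a minimal Python sketch of a workflow that interleaves deterministic steps with a model-backed step and applies a simple guardrail to the AI output. It is a conceptual illustration only, not Roast's API (Roast itself is Ruby, configured with YAML and Markdown prompts); `Step`, `Workflow`, `fake_model`, and the step names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], str]  # receives the shared state, returns this step's output
    uses_ai: bool = False       # marks non-deterministic, model-backed steps


@dataclass
class Workflow:
    steps: list
    state: dict = field(default_factory=dict)

    def execute(self) -> dict:
        for step in self.steps:
            output = step.run(self.state)
            if step.uses_ai and not output.strip():
                # Guardrail: stop rather than pass empty model output downstream.
                raise ValueError(f"AI step {step.name!r} returned no output")
            # Later steps see earlier outputs through the shared state.
            self.state[step.name] = output
        return self.state


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    return f"possible flakiness in:\n{prompt}"


def list_test_files(state: dict) -> str:
    # Deterministic step: a real workflow might shell out to a file finder here.
    return "test/user_test.rb\ntest/cart_test.rb"


def analyze_tests(state: dict) -> str:
    # AI step: its prompt is built only from the previous step's output.
    return fake_model(state["list_test_files"])


def count_flagged(state: dict) -> str:
    # Deterministic step: post-process the AI output with ordinary code.
    lines = state["analyze_tests"].splitlines()
    return str(sum(1 for line in lines if line.endswith("_test.rb")))


result = Workflow([
    Step("list_test_files", list_test_files),
    Step("analyze_tests", analyze_tests, uses_ai=True),
    Step("count_flagged", count_flagged),
]).execute()
```

The narrow context each AI step receives, and the deterministic code on either side of it, are what keep the non-deterministic part bounded.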
## Workflow Step Types and Execution Models

Roast supports several step execution patterns, each suited to a different use case in AI-powered development workflows:

**Directory-based Steps** are the most common pattern: each step corresponds to a directory containing a `prompt.md` file. Prompts can use ERB templating to access workflow context, so they can be generated dynamically from previous step outputs and shared workflow state.

**Command Execution Steps** wrap shell commands and run them within the workflow, capturing their output for subsequent steps. This lets AI analysis and traditional development tools work side by side.

**Inline Prompt Steps** provide a lightweight way to interact with the AI model directly, while **Custom Ruby Steps** implement complex logic by subclassing `BaseStep`.

**Parallel Steps** run independent workflow components concurrently, which speeds up workflows whose tasks can be parallelized.

## Built-in Tool Ecosystem

The framework ships a toolkit designed specifically for AI-powered development workflows. Tools such as ReadFile, WriteFile, UpdateFiles, Grep, SearchFile, Cmd, and Bash provide file system and command-line access, and each includes the security restrictions and error handling needed for production deployment.

## The CodingAgent: Advanced AI Integration

The CodingAgent is Roast's most sophisticated component: it integrates Claude Code fully into structured workflows, combining Roast's structural guardrails with Claude Code's adaptive problem-solving. The CodingAgent excels at iterative tasks that require exploration and adaptation, such as complex code modifications, bug fixing, performance optimization, and adaptive test generation.
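The step types described under Workflow Step Types and Execution Models can be illustrated with a small Python dispatcher. This is a sketch under conventions assumed for illustration, not a statement of Roast's syntax: a string wrapped in `$(...)` is a shell command, a string containing spaces is an inline prompt, any other string names a directory holding a `prompt.md` template, and a nested list is a group of parallel steps. `string.Template` stands in for ERB, the `$context` placeholder (all earlier outputs joined) is invented here, and `fake_model` is a stub.

```python
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from string import Template
from typing import Union

Step = Union[str, list]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    return f"model says: {prompt}"


def run_step(step: Step, state: dict, root: Path) -> dict:
    """Run one step under this sketch's assumed conventions; return {step: output}."""
    if isinstance(step, list):
        # Parallel step: sub-steps run concurrently against a snapshot of the state.
        snapshot = dict(state)
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda sub: run_step(sub, snapshot, root), step))
        merged: dict = {}
        for result in results:
            merged.update(result)
        return merged
    if step.startswith("$(") and step.endswith(")"):
        # Command step (checked before the space test): capture stdout for later steps.
        done = subprocess.run(step[2:-1], shell=True, capture_output=True,
                              text=True, check=True)
        return {step: done.stdout.strip()}
    if " " in step:
        # Inline prompt step: the string itself is the prompt.
        return {step: fake_model(step)}
    # Directory step: render <root>/<step>/prompt.md; $context holds all earlier outputs.
    template = Template((root / step / "prompt.md").read_text())
    context = "\n".join(state.values())
    return {step: fake_model(template.safe_substitute(context=context))}


def run_workflow(steps: list, root: Path) -> dict:
    state: dict = {}
    for step in steps:
        state.update(run_step(step, state, root))
    return state


# Usage: one command step, two parallel inline prompts, then a directory step.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "review").mkdir()
    (root / "review" / "prompt.md").write_text("Review this: $context")
    state = run_workflow(
        ["$(echo hello)", ["summarize the diff", "list risky files"], "review"],
        root,
    )
```

Parallel sub-steps read a snapshot taken before the group starts, so they cannot observe each other's partial results; their outputs are merged in list order once all have finished.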
What makes this integration particularly valuable is that it keeps the benefits of both approaches: developers define clear objectives and boundaries through Roast's structure, while the agent has autonomy within those boundaries to iterate, test, and improve until the goals are met. This addresses a key challenge in production AI systems, where purely deterministic automation is insufficient but unconstrained AI behavior is unreliable.

## Context Management and Data Flow

Steps in a Roast workflow share their conversation transcripts, so each step automatically builds on the discoveries of the steps before it. Later steps can reference and extend earlier analysis without workflow authors having to configure that hand-off explicitly. Because the system maintains conversation state across the entire workflow execution, it effectively gives the workflow a persistent memory that supports multi-step reasoning.

## Advanced Control Flow Capabilities

The framework supports control structures including iteration over collections, conditional execution based on step outcomes, and case statements for multi-branch logic. These let a workflow adapt its behavior to analysis results and environmental conditions.

## Session Replay and Development Experience

One of Roast's most valuable features is session replay: every workflow execution is saved automatically and can be resumed from any step. This greatly shortens development iteration, because expensive AI operations do not have to be rerun while a workflow is being developed and debugged. For systems that make costly AI API calls, this also yields significant operational savings.
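Session replay can be sketched as persisting each step's output and, on a later run, loading the recorded outputs for every step before a chosen resume point. In this minimal Python sketch the one-JSON-file-per-step layout, `run_with_replay`, and `resume_from` are hypothetical; the source does not describe Roast's actual session format.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

Step = tuple[str, Callable[[dict], str]]


def run_with_replay(steps: list[Step], session_dir: Path, resume_from: int = 0) -> dict:
    """Run steps in order, saving each output; replay saved outputs before resume_from."""
    session_dir.mkdir(parents=True, exist_ok=True)
    state: dict = {}
    for index, (name, fn) in enumerate(steps):
        saved = session_dir / f"{index:02d}_{name}.json"
        if index < resume_from and saved.exists():
            # Replay: reuse the recorded output instead of re-running the step.
            state[name] = json.loads(saved.read_text())
        else:
            state[name] = fn(state)
            saved.write_text(json.dumps(state[name]))
    return state


calls = {"analyze": 0}


def expensive_analysis(state: dict) -> str:
    # Stand-in for a costly model call; the counter shows how often it really runs.
    calls["analyze"] += 1
    return "3 flaky tests found"


def write_report(state: dict) -> str:
    return f"Report: {state['analyze']}"


steps = [("analyze", expensive_analysis), ("report", write_report)]
with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp) / "session"
    first = run_with_replay(steps, session)
    # Iterate on the report step only: step 0 is replayed from disk, not recomputed.
    second = run_with_replay(steps, session, resume_from=1)
```

After both runs the expensive step has executed only once, which is the saving the section describes when a developer reworks a late step in a long workflow.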
## Production Implementation Examples

Shopify has deployed Roast across several production use cases:

**Test Quality Analysis at Scale** automatically analyzes thousands of test files, identifying antipatterns and increasing test coverage across the codebase. This is a classic example of AI-powered code quality improvement that would be impractical to carry out manually.

**Automated Type Safety with "Boba"** is a multi-step workflow in which deterministic cleanup operations precede AI-powered problem-solving. The workflow performs initial cleanup with sed, applies Sorbet type annotations, runs autocorrect tools, and then uses the CodingAgent to resolve the remaining typing issues that require iterative problem-solving.

**Proactive Site Reliability Monitoring** applies AI-powered pattern recognition to operational data: workflows analyze Slack conversations to identify emerging issues before they escalate into incidents.

**Competitive Intelligence Aggregation** gathers information from multiple sources and uses AI to synthesize it into actionable intelligence reports, replacing hours of manual research with automated analysis.

**Historical Code Context Analysis** analyzes commit history and PR context to explain the reasoning behind seemingly puzzling code decisions, helping developers avoid removing code that still serves a purpose.

## Technical Infrastructure and Integration

Roast is built on Raix (Ruby AI eXtensions), which provides an abstraction layer for AI interactions. This foundation supplies retry logic, response caching, structured output handling, and support for multiple AI providers.
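The retry and caching behavior attributed to Raix can be approximated in a few lines. The sketch below is Python rather than Raix's Ruby API; `CachedRetryingClient`, `TransientError`, and the backoff parameters are hypothetical names chosen for illustration, and the provider is a stub that fails twice before succeeding.

```python
import hashlib
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical error type for retryable provider failures (e.g. rate limits)."""


class CachedRetryingClient:
    """Wraps a provider call with response caching and exponential-backoff retries."""

    def __init__(self, call: Callable[[str], str], max_attempts: int = 3,
                 base_delay: float = 0.5, sleep: Callable[[float], None] = time.sleep):
        self.call = call
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so tests need not actually wait
        self.cache: dict[str, str] = {}

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]  # identical prompt: no API call, no token cost
        for attempt in range(self.max_attempts):
            try:
                response = self.call(prompt)
                break
            except TransientError:
                if attempt == self.max_attempts - 1:
                    raise  # out of attempts: surface the failure to the workflow
                # With the defaults this waits 0.5s, then 1s.
                self.sleep(self.base_delay * 2 ** attempt)
        self.cache[key] = response
        return response


attempts = []


def flaky_provider(prompt: str) -> str:
    # Fails twice, then succeeds, to exercise the retry path.
    attempts.append(prompt)
    if len(attempts) < 3:
        raise TransientError("rate limited")
    return f"ok: {prompt}"


delays = []
client = CachedRetryingClient(flaky_provider, sleep=delays.append)
first = client.complete("explain this test")
second = client.complete("explain this test")  # served from the cache
```

Caching on a hash of the full prompt only helps when prompts repeat exactly; it pairs naturally with session replay, which avoids re-running whole steps.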
The integration allows fine-grained control over AI interactions, including custom authentication, request retry strategies, response logging, provider-specific configuration, and token usage tracking.

## Operational Considerations and Limitations

While the blog post presents Roast as a significant advancement, several operational considerations deserve attention. The framework requires Ruby 3.0+ and API keys for AI services, which creates infrastructure dependencies, and its reliance on external AI services raises availability and cost concerns for production deployments. Session replay, while valuable during development, could add storage overhead in large-scale production use. Given the potentially high cost of complex AI workflows, token usage tracking becomes particularly important.

## Innovation in AI Workflow Development

Perhaps most significantly, Roast enables a development style in which developers can "handwave a step I don't quite know how to do yet with an AI approximation that mostly works" and then replace AI steps with deterministic code as their understanding improves. This departs from traditional automation development, which requires a complete understanding of the problem before implementation begins.

The approach allows rapid prototyping of complex automation workflows while keeping a clear path toward production reliability. It addresses a common problem in AI system development: useful solutions get delayed by the need to solve every edge case upfront.

## Industry Impact and Future Implications

By open-sourcing Roast, Shopify signals its belief that structured AI workflows will become as essential to modern development as CI/CD pipelines. The framework reflects a maturing understanding of how to integrate AI capabilities into production development workflows while maintaining reliability and predictability.
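The "approximate now, harden later" style described under Innovation in AI Workflow Development works best when a step's interface stays fixed while its implementation changes. The Python sketch below illustrates that idea with a hypothetical failure-classification step; `fake_model`, the `STEPS` registry, and the regex rules are illustrative assumptions, not part of Roast.

```python
import re
from typing import Callable


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; a real model would be non-deterministic.
    return "other"


def ai_classify_failure(log: str) -> str:
    """Early prototype: hand the log to a model and accept a rough answer."""
    prompt = f"Classify this test failure as timeout, assertion or other:\n{log}"
    return fake_model(prompt)


def rule_classify_failure(log: str) -> str:
    """Later hardening: the same step implemented as plain deterministic code."""
    if re.search(r"timed? ?out", log, re.IGNORECASE):
        return "timeout"
    if "AssertionError" in log or "Expected" in log:
        return "assertion"
    return "other"


# The workflow refers to steps by name, so swapping an implementation
# does not touch the steps around it.
STEPS: dict[str, Callable[[str], str]] = {"classify_failure": ai_classify_failure}


def run_step(name: str, log: str) -> str:
    return STEPS[name](log)


log = "Net::ReadTimeout: timed out after 30s"
before = run_step("classify_failure", log)
STEPS["classify_failure"] = rule_classify_failure  # replace the AI approximation
after = run_step("classify_failure", log)
```

Because callers only see the step name and its string output, the AI version can ship first and be retired once the rules are understood well enough to encode.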
The approach reflects careful thinking about the division of labor between AI and traditional automation, recognizing that the most effective systems combine both strategically rather than treating them as alternatives. This hybrid model may indicate where production AI systems are heading across industries.

## Assessment and Technical Evaluation

From a technical perspective, Roast addresses real challenges in production AI deployment. The emphasis on structure and guardrails reflects practical experience with AI reliability problems, session replay targets a real cost in AI workflow development, and building on Raix gives the framework multi-provider support rather than tying it to a single vendor.

However, as with any vendor-authored case study, some claims warrant scrutiny. The productivity improvements and scale benefits described are self-reported and may not generalize to all organizations or use cases. The framework's Ruby-centric design may limit adoption in organizations using other technology stacks, although the CLI interface provides some language independence. The success stories are compelling but reflect Shopify's specific context and requirements. Organizations considering adoption should evaluate whether their use cases align with Roast's strengths and whether the operational overhead is justified by the expected benefits.
