ZenML

Dogfooding AI Features in GitLab's Development Workflow

GitLab 2024

GitLab describes how it integrates and tests its suite of AI-powered features, GitLab Duo, within its own development workflows. The case study shows how different teams within GitLab use these AI capabilities for tasks including code review, documentation, incident response, and feature testing. GitLab reports that the practice has produced significant efficiency gains, reduced manual effort, and improved quality across its development processes.

Industry

Tech

Overview

This case study from GitLab documents how the company internally uses its own AI-powered feature suite, GitLab Duo, across various engineering and product teams. The practice of “dogfooding”—using one’s own products—is a common approach in tech companies, and GitLab applies this to test and demonstrate the value of their AI capabilities before and alongside customer adoption. The case study is part of a broader blog series aimed at showcasing how GitLab creates, tests, and deploys AI features integrated throughout the enterprise DevSecOps platform.

It is important to note that this case study is inherently promotional, coming directly from GitLab’s marketing and product teams. While it provides useful insights into how AI tools can be integrated into developer workflows, readers should approach the claimed benefits with appropriate skepticism, as specific quantitative metrics are largely absent from the discussion.

GitLab Duo Feature Suite

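GitLab Duo encompasses multiple AI-powered capabilities designed to assist developers and other team members throughout the software development lifecycle. The features highlighted in this case study include merge request summaries, Code Suggestions, Duo Chat, Code Explanation (including the /explain command), and incident summarization, each of which appears in the use cases below.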

Production Use Cases and Integration Patterns

Code Review and Development Workflows

The case study describes how Staff Backend Developer Gosia Ksionek uses GitLab Duo to streamline code review processes. The AI summarizes merge requests, making it faster to review code changes, and answers coding questions while explaining complex code snippets. This represents a common LLMOps pattern where AI is integrated directly into developer tooling to reduce cognitive load during code review.
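Merge request summarization depends on fitting the change into the model's context window. The sketch below is a hypothetical illustration of that preparation step, not GitLab Duo's actual implementation: it trims an oversized diff at a line boundary and combines it with the MR title and description into one prompt. The names `MAX_DIFF_CHARS`, `truncate_diff`, and `build_mr_summary_prompt`, the 4000-character budget, and the prompt wording are all assumptions.

```python
# Hypothetical sketch: assemble a merge request summarization prompt.
# MAX_DIFF_CHARS and the prompt wording are illustrative assumptions,
# not GitLab Duo's actual implementation.

MAX_DIFF_CHARS = 4000  # assumed budget to keep the prompt inside a model's context window


def truncate_diff(diff: str, max_chars: int = MAX_DIFF_CHARS) -> str:
    """Keep at most max_chars of a unified diff, cutting at a line boundary, and append a marker."""
    if len(diff) <= max_chars:
        return diff
    cut = diff.rfind("\n", 0, max_chars)
    if cut == -1:  # no newline inside the budget: fall back to a hard cut
        cut = max_chars
    return diff[:cut] + "\n[diff truncated]"


def build_mr_summary_prompt(title: str, description: str, diff: str) -> str:
    """Combine MR metadata and a possibly truncated diff into a single summarization prompt."""
    return (
        "Summarize this merge request for a code reviewer in three to five bullet points.\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
        "Diff:\n"
        f"{truncate_diff(diff)}"
    )


if __name__ == "__main__":
    print(build_mr_summary_prompt("Fix N+1 query", "Preload projects", "+ .includes(:project)\n"))
```

The explicit truncation marker tells the model (and the reviewer) that the summary may not cover the whole change.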

Senior Frontend Engineer Peter Hegman reportedly uses Code Suggestions for full-stack JavaScript and Ruby development, demonstrating the tool’s ability to work across different programming languages and frameworks. This multi-language support is important for production AI tools in heterogeneous development environments.

Documentation and Content Generation

Several use cases focus on using LLMs for documentation and content generation tasks:

Taylor McCaslin, Group Manager for the Data Science Section, used GitLab Duo to create documentation for GitLab Duo itself—a meta use case that the company highlights as demonstrating the tool’s utility. Staff Technical Writer Suzanne Selhorn used the AI to optimize documentation site navigation by providing a workflow-based ordering of pages and drafting Getting Started documentation more quickly than manual approaches.

Senior Product Manager Amanda Rueda uses GitLab Duo to craft release notes, employing specific prompts like requesting “a two sentence summary of this change, which can be used for our release notes” with guidance on tone, perspective, and value proposition. This prompt engineering approach is a practical example of how production AI tools can be customized for specific content generation tasks through carefully crafted prompts.
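A prompt like the one above can be turned into a reusable template. The following is a minimal sketch under stated assumptions: only the quoted request comes from the case study, while the "Please provide" opener, the default tone, perspective, and value guidance, and the helper names `build_release_note_prompt` and `sentence_count` are hypothetical. The sentence counter is a rough check that a generated note respects the two-sentence limit.

```python
import re

# Hypothetical sketch of a reusable release-note prompt. Only the quoted request
# comes from the case study; the tone, perspective, and value defaults are assumptions.
BASE_REQUEST = (
    "Please provide a two sentence summary of this change, "
    "which can be used for our release notes."
)


def build_release_note_prompt(change: str,
                              tone: str = "clear, friendly",
                              perspective: str = "second-person",
                              value: str = "the concrete benefit to users") -> str:
    """Pair the base request with explicit tone, perspective, and value-proposition guidance."""
    guidance = (
        f"Use a {tone} tone. Write from a {perspective} perspective. "
        f"Lead with {value}."
    )
    return f"{BASE_REQUEST} {guidance}\n\nChange:\n{change}"


def sentence_count(text: str) -> int:
    """Rough sentence count, splitting on runs of '.', '!' or '?' and ignoring empty pieces."""
    return len([s for s in re.split(r"[.!?]+\s*", text.strip()) if s])
```

A generated note that fails the `sentence_count(note) <= 2` check could be regenerated or flagged for manual editing. The splitter is deliberately naive and will miscount abbreviations such as "e.g.".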

Administrative and Communication Tasks

The case study highlights non-coding applications of the AI tools. Engineering Manager François Rosé uses Duo Chat for drafting and refining OKRs (Objectives and Key Results), providing example prompts that request feedback on objective and key result formulations. Staff Frontend Engineer Denys Mishunov used Chat to formulate text for email templates used in technical interview candidate communications.

These use cases show that LLM-powered tools in production can extend beyond purely technical tasks into administrative and communication workflows.

Incident Response and DevOps

Staff Site Reliability Engineer Steve Xuereb uses GitLab Duo to summarize production incidents and create detailed incident reviews. He also uses Chat to create boilerplate .gitlab-ci.yml files, which reportedly speeds up his workflow significantly. The Code Explanation feature provides detailed answers during incidents, improving productivity and helping engineers understand the codebase in time-critical situations.
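Incident summarization usually starts from a timeline of notes written by several responders. The case study does not describe how GitLab Duo ingests incident data, so the sketch below is a hypothetical preprocessing step: it orders timeline entries chronologically and wraps them in a review-oriented prompt. The `TimelineEvent` structure, the assumption that timestamps are UTC, and the prompt wording are all illustrative.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimelineEvent:
    """One entry from a hypothetical incident timeline export."""
    timestamp: datetime  # assumed to be in UTC
    author: str
    note: str


def build_incident_summary_prompt(title: str, events: list[TimelineEvent]) -> str:
    """Order timeline events chronologically and wrap them in an incident-review prompt."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    lines = [f"{e.timestamp:%Y-%m-%d %H:%M} UTC | {e.author}: {e.note}" for e in ordered]
    return (
        f"Summarize the production incident '{title}' for an incident review. "
        "Cover customer impact, timeline, suspected root cause, and follow-up actions.\n\n"
        "Timeline:\n" + "\n".join(lines)
    )
```

Sorting before prompting matters because responders often add notes out of order during an incident, and a model summarizing an unordered log may misstate the sequence of events.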

This incident response use case is particularly relevant to LLMOps, as it demonstrates AI assistance in operational contexts where speed and accuracy are critical.

Testing and Quality Assurance

Senior Developer Advocate Michael Friedrich uses GitLab Duo to generate test source code for CI/CD components, sharing this approach in talks and presentations. The case study mentions that engineers test new features like Markdown support in Code Suggestions internally before release, using GitLab Duo for writing blog posts and documentation in VS Code.

External Codebase Understanding

The /explain feature is highlighted as particularly useful for understanding external projects imported into GitLab. This capability was demonstrated during a livestream with open source expert Eddie Jaoude, showcasing how AI can help developers quickly understand unfamiliar codebases, dependencies, and open source projects.

Claimed Benefits and Critical Assessment

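GitLab claims that integrating GitLab Duo has produced significant efficiency gains, reduced manual effort, and improved quality across its development processes.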

However, these claims warrant scrutiny. The case study provides anecdotal evidence and user testimonials but lacks specific quantitative metrics such as percentage improvements in cycle time, reduction in bugs, or time savings measurements. The mention of an “AI Impact analytics dashboard” suggests GitLab is developing metrics capabilities, but concrete data from this dashboard is not provided in this case study.
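To make concrete what a missing metric such as "percentage improvement in cycle time" would look like, the sketch below computes the percentage reduction in median cycle time between two samples. The function name and the sample numbers are hypothetical and do not come from GitLab; using the median rather than the mean is a design choice that reduces the influence of a few unusually long merge requests.

```python
from statistics import median


def percent_reduction(before: list[float], after: list[float]) -> float:
    """Percentage reduction in median cycle time from `before` to `after` (positive = faster)."""
    b, a = median(before), median(after)
    if b == 0:
        raise ValueError("baseline median must be non-zero")
    return (b - a) / b * 100


# Illustrative numbers only (e.g., hours from first commit to merge):
# medians are 12 and 9, so the reduction is (12 - 9) / 12 * 100 = 25.0 percent.
print(percent_reduction([10, 12, 14], [8, 9, 10]))  # 25.0
```

A before/after comparison like this does not by itself isolate the effect of the AI tool; other changes in team, process, or workload over the same period could explain the difference.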

The self-referential nature of the case study (a company promoting its own products with internal testimonials) means the evidence should be treated as testimonial rather than independently verified. Real-world enterprise adoption data and independent benchmarks would provide more reliable validation of the claimed benefits.

Technical Implementation Considerations

While the case study does not delve deeply into technical architecture, some LLMOps-relevant aspects can be inferred.

The mention of validating and testing AI models at scale in related blog posts suggests GitLab has developed internal infrastructure for model evaluation, though details are not provided in this specific case study.
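GitLab does not describe its evaluation infrastructure here, so the following is only a generic illustration of the idea: a small regression harness that runs a model over fixed prompts and scores each output by whether it contains required keywords. The `evaluate` function, the keyword-based scoring rule, and the fake model are assumptions chosen for simplicity; production evaluations typically use richer scoring such as reference comparison or model-graded rubrics.

```python
from typing import Callable


def evaluate(model: Callable[[str], str],
             cases: list[tuple[str, list[str]]]) -> float:
    """Return the fraction of cases whose output contains every required keyword (case-insensitive)."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = 0
    for prompt, required in cases:
        output = model(prompt).lower()
        if all(keyword.lower() in output for keyword in required):
            passed += 1
    return passed / len(cases)


if __name__ == "__main__":
    # A stand-in "model" that always returns the same explanation.
    fake_model = lambda prompt: "This function parses YAML into a dict."
    cases = [("Explain parse()", ["parse", "yaml"]), ("Explain save()", ["save"])]
    print(evaluate(fake_model, cases))  # first case passes, second fails: 0.5
```

Running such a harness on every model or prompt change makes regressions visible as a drop in the pass rate.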

Conclusion

This case study provides a useful window into how a major DevOps platform company integrates AI capabilities throughout their internal workflows. The breadth of use cases—from code generation to documentation to incident response—demonstrates the versatility of LLM-powered tools in production software development environments. However, the promotional nature of the content and absence of quantitative metrics mean the claimed benefits should be viewed as indicative rather than definitive. The case study is most valuable as a catalog of potential AI integration points in software development workflows rather than as proof of specific productivity improvements.
