Instacart: Building and Scaling an Enterprise AI Assistant with GPT Models

LLMOps Database

Company

Instacart

Title

Building and Scaling an Enterprise AI Assistant with GPT Models

Industry

E-commerce

Link

https://tech.instacart.com/scaling-productivity-with-ava-instacarts-internal-ai-assistant-ed7f02558d84

Year

2023

Summary (short)

Instacart developed Ava, an internal AI assistant powered by GPT-4 and GPT-3.5, which evolved from a hackathon project into a company-wide productivity tool. The assistant offers a web interface, a Slack integration, and a prompt exchange platform. Over half of Instacart employees use it monthly, and more than 900 use it weekly. Features include conversation search, automatic model upgrades, and thread summarization, and usage spans both engineering and non-engineering teams.

## Summary

Instacart, a leading grocery delivery and e-commerce platform, built an internal AI assistant called Ava powered by OpenAI's GPT-4 and GPT-3.5 models. The project began as a hackathon initiative and evolved into an enterprise-wide productivity tool that achieved remarkable adoption rates, with over half of Instacart's employees using it monthly and more than 900 employees using it weekly. This case study illustrates the journey from prototype to production-ready internal LLM tool, including the product decisions, feature development, and deployment strategies that drove adoption across both technical and non-technical teams.

## Origins and Initial Development

The project originated during a company-wide hackathon in early 2023. The engineering team discovered that ChatGPT, specifically GPT-4, significantly accelerated their development velocity, enabling them to produce nearly twice as many features as initially planned. This experience with AI-assisted development for brainstorming, coding, debugging, and test generation led to the decision to provide ChatGPT-like access to all developers as quickly as possible.

A key enabling factor was Instacart's close relationship with OpenAI, which provided early access to GPT-4 (including the 32K context model) through APIs with custom data privacy, security, and quota guarantees. This access to enterprise-grade APIs with appropriate security controls was essential for deploying an LLM-powered tool internally. The team leveraged these APIs to rapidly build and launch Ava for their engineering organization.

## Technical Architecture and Model Selection

Ava is built on OpenAI's GPT-4 and GPT-3.5 models, utilizing their API infrastructure rather than self-hosted models. The system includes automatic model upgrades between GPT-4 variants as conversation context grows, suggesting a dynamic model selection strategy based on context window requirements.
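The automatic upgrade between GPT-4 variants could look roughly like the following. This is a minimal sketch, not Instacart's implementation: the `ModelTier` type, the `select_model` and `estimate_tokens` helpers, the 4-characters-per-token heuristic, and the reply budget are all assumptions made for illustration. The window sizes are the 8K and 32K limits of the `gpt-4` and `gpt-4-32k` models.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ModelTier:
    """Hypothetical description of one model and its context window."""
    name: str
    max_context_tokens: int


# Ordered smallest to largest so the first tier that fits is chosen.
GPT4_TIERS: List[ModelTier] = [
    ModelTier("gpt-4", 8_192),
    ModelTier("gpt-4-32k", 32_768),
]


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def select_model(
    messages: Sequence[str],
    reply_budget: int = 1_024,
    tiers: Sequence[ModelTier] = GPT4_TIERS,
) -> str:
    """Return the smallest tier whose window fits the conversation plus a reply."""
    needed = sum(estimate_tokens(m) for m in messages) + reply_budget
    for tier in tiers:
        if needed <= tier.max_context_tokens:
            return tier.name
    raise ValueError(f"conversation needs ~{needed} tokens, exceeding every tier")


# A short question plus a large pasted file no longer fits the 8K window.
conversation = ["Explain this stack trace.", "x" * 40_000]
print(select_model(conversation))  # prints gpt-4-32k
```

A production system would count tokens with the model's real tokenizer and would need a fallback, such as truncation or summarization, for conversations that exceed the largest window.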
This approach allows the system to balance cost and capability, potentially using smaller models for simpler conversations while scaling to larger context windows (32K) when needed for tasks like reviewing full code files or summarizing lengthy documents.

The architecture supports multiple interfaces including a web application and Slack integration, indicating a service-oriented backend that can serve various frontend clients. The mention of plans to expose Ava's APIs company-wide suggests a modular design that separates the core LLM orchestration layer from the user-facing applications.

## Feature Development for Engineer Adoption

The initial launch prioritized features specifically valuable to engineers. These included convenient keyboard shortcuts for efficient interaction, single-click code copying to reduce friction when transferring generated code to development environments, and automatic upgrades between GPT-4 models as conversation context grew. These features addressed common pain points in developer workflows when working with AI assistants.

Post-launch metrics showed strong engagement patterns, with users spending 20+ minutes per session and producing and copying significant amounts of code with Ava as a companion. Developers leveraged the largest GPT-4 context model for creating, debugging, and reviewing full code files, as well as summarizing documents and asking follow-up questions. This demonstrates that the long-context capabilities of GPT-4-32K were being actively utilized for real development tasks.

## Expansion Beyond Engineering

After observing strong adoption among engineers, Instacart identified demand from other departments including Operations, Recruiting, Brand Marketing, and HR. This cross-functional interest required a shift in product strategy from developer-centric features toward more general-purpose usability.
The team recognized that the blank text box interface presented a barrier to entry for non-technical users who might not know how to craft effective prompts. To address this, they introduced "templates" - pre-crafted prompts that allowed users to quickly start conversations without needing prompt engineering expertise. This approach to democratizing LLM access by providing structured starting points is a common pattern in enterprise LLM deployments.

Additional features added for broader accessibility included full-text conversation search for finding previous interactions, and conversation sharing capabilities that allowed users to share their Ava conversations with colleagues. The team also implemented Slack "unfurling" for shared conversation links, which provides users with a preview of the conversation content before deciding to follow the link. This attention to the social and collaborative aspects of AI tool usage contributed to product awareness and adoption.

## The Prompt Exchange

A significant product innovation was the Ava Prompt Exchange, a library of user-created prompts that became available after the organization-wide rollout. Rather than having the small engineering team create templates for every department's use cases (which would have been impractical given their lack of domain expertise), they enabled users to create, share, and discover prompts based on their own needs and experience.

The Prompt Exchange allows users to browse popular prompts, search for specific use cases, create their own prompts for others, and star prompts for later access. This crowdsourced approach to prompt management represents an interesting LLMOps pattern for enterprise deployments - essentially treating prompts as a form of institutional knowledge that can be curated and shared across the organization. It shifts prompt engineering from a centralized function to a distributed, community-driven activity.
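The browse, search, create, and star operations described for the Prompt Exchange can be sketched as a small in-memory store. This is only an illustration under stated assumptions: the `Prompt` and `PromptExchange` classes, their method names, the substring search, and the ranking of popularity by star count are hypothetical and do not come from Instacart's write-up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Prompt:
    """Hypothetical record for one shared prompt."""
    prompt_id: int
    title: str
    body: str
    author: str
    starred_by: Set[str] = field(default_factory=set)


class PromptExchange:
    """In-memory sketch of a shared prompt library (illustrative only)."""

    def __init__(self) -> None:
        self._prompts: Dict[int, Prompt] = {}
        self._next_id = 1

    def create(self, title: str, body: str, author: str) -> Prompt:
        """Publish a new prompt so other users can discover it."""
        prompt = Prompt(self._next_id, title, body, author)
        self._prompts[prompt.prompt_id] = prompt
        self._next_id += 1
        return prompt

    def star(self, prompt_id: int, user: str) -> None:
        """Star a prompt for later access; starring twice has no extra effect."""
        self._prompts[prompt_id].starred_by.add(user)

    def search(self, query: str) -> List[Prompt]:
        """Case-insensitive substring match over title and body."""
        q = query.lower()
        return [
            p for p in self._prompts.values()
            if q in p.title.lower() or q in p.body.lower()
        ]

    def popular(self, limit: int = 10) -> List[Prompt]:
        """Most-starred prompts first; ties broken by creation order."""
        ranked = sorted(
            self._prompts.values(),
            key=lambda p: (-len(p.starred_by), p.prompt_id),
        )
        return ranked[:limit]

    def starred(self, user: str) -> List[Prompt]:
        """Prompts a given user has starred, in creation order."""
        return [p for p in self._prompts.values() if user in p.starred_by]


exchange = PromptExchange()
breakdown = exchange.create(
    "Fast Breakdown",
    "Summarize the text into facts, open questions, and action items.",
    "alice",
)
exchange.star(breakdown.prompt_id, "bob")
print([p.title for p in exchange.search("action items")])
```

A real deployment would persist prompts in a database and likely use a proper search index, but the same four operations would remain the core of the interface.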
## Slack Integration and Multi-Channel Deployment

Recognizing that navigating to a dedicated web page created friction for quick tasks, the team built a Slack integration to make Ava accessible within existing workflows. Rather than creating a reduced-feature clone of the web experience, they focused on identifying features particularly valuable within the Slack context.

The "Fast Breakdown" template, which summarizes conversations into facts, open questions, and action items, had already proven popular on the web interface for summarizing meeting notes, emails, and Slack conversations. This became a first-class feature in the Slack app, allowing users to simply type "@Ava summarize" to get a summary of a thread or channel. The summary is posted publicly, enabling other participants to verify accuracy and note corrections - an interesting approach to maintaining quality and trust in AI-generated summaries.

The Slack integration also supports normal chatbot functionality in both DMs and channels, with Ava having access to conversation context to infer answers and participate naturally. The team emphasized making the user experience feel similar to chatting with a colleague, prioritizing natural interaction patterns over technical complexity.

## Adoption Metrics and Success Indicators

The case study reports strong adoption metrics: over half of Instacart employees use Ava monthly, and more than 900 use it weekly. By the time of the broader organizational release, nearly a third of the organization was already using Ava monthly. These numbers suggest successful enterprise-wide adoption of an LLM-powered tool, though the case study does not provide detailed productivity metrics or quantified business impact.

Users report using Ava for writing tasks, code review and debugging, improving communications, faster learning, and building AI-enabled internal tools on top of Ava's APIs. The mention of 20+ minute sessions suggests deep engagement rather than superficial usage.
## Future Development and Roadmap

The team outlined several planned areas of investment. They identified knowledge retrieval and code execution as priorities, acknowledging that the "Achilles' heel of LLMs is the data they are trained/tuned on or have access to." This suggests plans to implement RAG (Retrieval-Augmented Generation) capabilities that would give Ava access to Instacart's internal knowledge bases and potentially enable more sophisticated code execution workflows.

The team also plans to expose Ava's APIs company-wide, allowing other teams at Instacart to integrate AI capabilities into their own workflows and processes. This platform approach to internal LLM tooling could enable more specialized applications while leveraging centralized infrastructure, security controls, and model management.

Additional use cases mentioned include enhanced debugging and code review capabilities, meeting enhancement, and incident management. These suggest a roadmap focused on integrating Ava more deeply into operational workflows rather than keeping it as a standalone conversational tool.

## LLMOps Considerations

This case study illustrates several important LLMOps patterns for enterprise deployment. The use of OpenAI's enterprise APIs with custom data privacy, security, and quota guarantees addresses common concerns about deploying LLMs with sensitive internal data. The automatic model selection based on context requirements demonstrates thoughtful cost and capability management. The Prompt Exchange represents an innovative approach to managing and sharing prompts across an organization, treating prompt engineering as a collaborative rather than centralized function. The multi-channel deployment (web and Slack) with feature parity considerations shows the importance of meeting users where they work. The focus on reducing friction through keyboard shortcuts, one-click copying, and contextual templates reflects lessons about driving adoption of AI tools.
However, the case study notably lacks discussion of evaluation frameworks, testing strategies, model monitoring, or how they handle model updates and potential regressions - areas that would be valuable to understand for a complete LLMOps picture.

It's worth noting that this case study comes from Instacart's engineering blog and presents their internal tool in a positive light. While the adoption metrics are impressive, the absence of productivity quantification (as noted in reader comments) and lack of discussion about challenges, failures, or ongoing operational concerns means this should be viewed as a success story that may not fully represent the complexity of operating such systems at scale.
