ZenML

Internal AI Agent Platform for Enterprise Data Access and Product Development

Amplitude 2025
View original source

Amplitude built an internal AI agent called "Moda" that provides company-wide access to enterprise data through Slack and web interfaces, enabling employees to query business information, generate insights, and create product requirements documents (PRDs) with prototypes. The tool was developed by engineers in their spare time over 3-4 weeks and achieved viral adoption across the company within a week of launch, demonstrating how organizations can rapidly build custom AI tools to accelerate product development workflows and democratize data access across teams.

Industry

Tech

Technologies

Amplitude, a product analytics company, developed an innovative internal AI agent platform called “Moda” that exemplifies practical LLMOps implementation at scale. The case study, presented by Wade Chambers, Chief Engineering Officer at Amplitude, demonstrates how a technology company can rapidly build and deploy custom AI tools to transform internal workflows and democratize data access across the organization.

Company Background and Problem Context

Amplitude was facing the common enterprise challenge of making their vast internal data accessible and actionable for employees across different roles and technical skill levels. The company had accumulated substantial amounts of data across various systems including Confluence, Jira, Salesforce, Zendesk, Slack, Google Drive, Product Board, Zoom, Outreach, Gmail, Asana, Dropbox, GitHub, and HubSpot. However, this data was siloed and difficult to access, requiring employees to manually search through multiple systems to find relevant information for decision-making and product development.

The inspiration for their solution came from observing how different employees were at varying levels of AI fluency, with some using ChatGPT or Claude at home while others had no experience with AI tools. Chambers recognized the need to create a “common language” and fluency around AI within the organization, while also providing a way for employees to learn from each other’s AI interactions.

Technical Architecture and Implementation

Moda was built using a custom framework called “Langley,” which Amplitude developed internally to process AI requests. The technical stack demonstrates several key LLMOps principles and technologies:

Enterprise Search Integration: The system leverages the Glean API as its primary enterprise search backbone, which significantly simplified the data access and permissions management challenges typically associated with enterprise RAG implementations. This architectural decision allowed the team to focus on the user experience and workflow automation rather than building complex data ingestion pipelines from scratch.

Multi-Interface Deployment: The platform was designed with two primary interfaces - a Slack bot for casual, social interactions and a dedicated web application for more complex workflows. This dual-interface approach was strategically chosen to maximize adoption through what Chambers calls “social engineering” - allowing employees to see each other’s queries and learn from successful interactions.

Agent Orchestration: The system implements sophisticated multi-step workflows, particularly evident in their PRD (Product Requirements Document) generation process. The PRD workflow demonstrates advanced prompt engineering and agent orchestration, breaking down complex document creation into sequential steps including problem exploration, solution exploration, detailed requirements generation, and prototype creation prompts.

Configuration-Driven Architecture: The system uses YAML configuration files to define different agent behaviors and workflows, making it easily extensible and maintainable. This approach allows non-engineers to contribute to the system’s capabilities and customize workflows for specific use cases.

Deployment and Adoption Strategy

The deployment strategy reveals sophisticated thinking about organizational change management in AI adoption. Rather than a traditional rollout, Amplitude employed what Chambers describes as “social engineering” - making AI interactions visible in Slack channels so employees could observe and learn from each other’s successful queries and results.

The adoption timeline was remarkably rapid: after one engineer got the Slack bot working on a Friday afternoon and it was pushed live the following Monday, the tool achieved company-wide adoption within a week. This viral adoption pattern suggests that the social visibility of AI interactions was crucial to overcoming typical enterprise AI adoption barriers.

Advanced Use Cases and Workflows

Thematic Analysis and Customer Insights: One of the most sophisticated use cases demonstrated was the ability to perform thematic analysis across multiple data sources. The system can analyze the top themes from recent Slack messages, customer support tickets, sales call transcriptions, and product feedback to identify trends and opportunities. This represents a significant advancement in automated qualitative data analysis for product management.

PRD Generation and Prototyping: Perhaps the most impressive workflow is the automated PRD generation process, which takes a single sentence description of a customer need and expands it into comprehensive product requirements. The system breaks this down into multiple stages: problem exploration, solution exploration, detailed requirements, and prototype generation instructions. The generated prototypes can then be implemented using various tools like Bolt, Figma, or v0, with the same prompt generating different variations for comparison.

Cross-Functional Workflow Compression: The platform enables what Chambers describes as compressing traditional multi-week product development workflows into single meetings. Previously, user research, PRD creation, design mockups, and engineering assessment would take weeks. Now these can be accomplished in a single session using Moda to gather evidence, generate requirements, and create prototypes.

Operational Management and Governance

The operational model for Moda demonstrates mature LLMOps practices. The entire system is version-controlled in GitHub, with most engineers, product managers, and designers able to contribute through standard pull request workflows. This democratized contribution model extends beyond traditional engineering roles, with designers and product managers actively contributing to the system’s development.

The team employs an interesting approach to quality assurance through iterative prompt refinement and multi-perspective validation. When PRDs are generated, they go through review processes where teams evaluate problem statements, solution approaches, and consider alternative solutions. The system is designed to handle feedback loops, allowing users to comment on generated content and request regeneration of specific sections.

Monitoring and Improvement: Amplitude uses Moda to monitor its own usage and effectiveness, demonstrating a meta-level application of their AI system. Chambers shows how they query the system to understand usage patterns across different teams, identify friction points, and optimize performance based on actual usage data.

Technical Challenges and Solutions

Prompt Engineering: The system demonstrates sophisticated prompt engineering practices, with well-structured, sequential instructions for complex workflows. The team addresses the common challenge of prompt quality by using AI itself to improve prompts recursively - asking AI systems to generate better prompts for specific use cases.

Multi-Step Processing: The PRD generation workflow showcases advanced agent orchestration with parallel and sequential tool calls. The system can handle complex, multi-step processes without requiring constant user interaction, representing a more mature approach to AI workflow automation than simple chat interfaces.

Data Scope and Permissions: While leveraging Glean API for enterprise search, the system maintains appropriate data access controls, explicitly excluding private, personal, or restricted datasets while providing access to enterprise-public information across the organization.

Cross-Role Collaboration and Skill Development

One of the most innovative aspects of Amplitude’s implementation is their approach to cross-functional collaboration. They conducted experiments where team members swapped roles - designers became engineers, engineers became product managers, and product managers became designers - all facilitated by AI tools. This demonstrates how LLMOps can break down traditional role boundaries and increase organizational flexibility.

This role-swapping exercise served multiple purposes: building empathy between different functions, demonstrating the accessibility of AI-assisted work across disciplines, and identifying opportunities for improved workflows and tooling.

Performance and Impact Assessment

The impact on organizational velocity appears significant, though Chambers is appropriately measured in his claims. The system has compressed traditional product development timelines from weeks to single meetings in many cases, though he notes that complex products or deeply integrated features still require traditional design and engineering processes.

The social adoption model has proven particularly effective, with employees discovering the tool through observing colleagues’ successful interactions in Slack channels. This organic discovery mechanism has been more effective than traditional training or rollout approaches.

Quality Considerations: Despite the automation, Amplitude maintains emphasis on critical evaluation of AI-generated content. Teams are expected to validate AI outputs, consider alternative solutions, and apply human judgment to ensure quality. This balanced approach acknowledges both the power and limitations of current AI capabilities.

Broader Implications for LLMOps

Amplitude’s implementation demonstrates several important principles for successful LLMOps deployment:

Build vs. Buy Strategy: Their decision to build internally rather than purchasing off-the-shelf solutions was driven by the need for deep integration with existing data sources and workflows. The relatively small investment (3-4 weeks of part-time engineering work) made this a low-risk, high-reward approach.

Social Architecture: The emphasis on visible, social AI interactions as a learning and adoption mechanism offers a replicable model for other organizations struggling with AI adoption.

Iterative Development: The system is designed for continuous improvement, with easy modification and extension capabilities that allow for rapid iteration based on user feedback and changing needs.

Role Democratization: By making AI tools accessible across different roles and skill levels, organizations can increase overall productivity while building AI fluency throughout the workforce.

The Amplitude case study represents a sophisticated, practical implementation of LLMOps principles that goes beyond simple chatbot deployment to create genuine workflow transformation and organizational capability enhancement. Their approach offers a roadmap for other technology companies looking to build internal AI capabilities while managing the challenges of enterprise data access, user adoption, and cross-functional collaboration.

More Like This

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.

healthcare fraud_detection customer_support +90

Reinforcement Learning for Code Generation and Agent-Based Development Tools

Cursor 2025

This case study examines Cursor's implementation of reinforcement learning (RL) for training coding models and agents in production environments. The team discusses the unique challenges of applying RL to code generation compared to other domains like mathematics, including handling larger action spaces, multi-step tool calling processes, and developing reward signals that capture real-world usage patterns. They explore various technical approaches including test-based rewards, process reward models, and infrastructure optimizations for handling long context windows and high-throughput inference during RL training, while working toward more human-centric evaluation metrics beyond traditional test coverage.

code_generation code_interpretation data_analysis +61

Scaling AI Product Development with Rigorous Evaluation and Observability

Notion 2025

Notion AI, serving over 100 million users with multiple AI features including meeting notes, enterprise search, and deep research tools, demonstrates how rigorous evaluation and observability practices are essential for scaling AI product development. The company uses Brain Trust as their evaluation platform to manage the complexity of supporting multilingual workspaces, rapid model switching, and maintaining product polish while building at the speed of AI industry innovation. Their approach emphasizes that 90% of AI development time should be spent on evaluation and observability rather than prompting, with specialized data specialists creating targeted datasets and custom LLM-as-a-judge scoring functions to ensure consistent quality across their diverse AI product suite.

document_processing content_moderation question_answering +52