Monday.com: Building a Digital Workforce with Multi-Agent Systems and User-Centric Design

Company

Monday.com

Title

Building a Digital Workforce with Multi-Agent Systems and User-Centric Design

Industry

Tech

Link

https://www.youtube.com/watch?v=P8ewpJrZVwo

Year

2025

Summary (short)

Monday.com built a digital workforce of AI agents to handle the 1 billion work tasks they process annually, focusing on user experience and trust over pure automation. They developed a multi-agent system using LangGraph that emphasizes user control, preview capabilities, and explainability, achieving 100% month-over-month growth in AI usage. The system includes specialized agents for data retrieval, board actions, and answer composition, with robust fallback mechanisms and evaluation frameworks to handle the 99% of user interactions they could not initially predict.

## Monday.com Digital Workforce Case Study

Monday.com, a publicly traded work operating system company that recently crossed $1 billion ARR, has implemented a comprehensive digital workforce strategy using AI agents to tackle their massive scale of 1 billion work tasks processed annually. The company, led by Head of AI Assaf, launched their first AI feature in September and has experienced remarkable growth of 100% month-over-month in AI usage, demonstrating significant traction in their AI initiatives.

The core philosophy behind Monday.com's approach centers on the principle that "the biggest barrier to adoption is trust, not technology." This insight fundamentally shaped their product design decisions and technical architecture. Rather than pursuing maximum automation, they prioritized user experience and control, recognizing that different companies and users have varying risk appetites when it comes to AI-powered automation.

### Key Design Principles and User Experience Focus

Monday.com's approach to building AI agents emphasizes four critical components that have proven essential for adoption. First, they implemented granular autonomy controls, allowing users to determine how much control they want to give their agents rather than defaulting to full automation. This design choice acknowledges that while engineers may prefer autonomous systems, real-world users need varying levels of control based on their risk tolerance and specific use cases.

Second, they focused on seamless entry points by integrating agents into existing workflows rather than creating entirely new user experiences. Since users already assign human workers to tasks within the Monday.com platform, they simply extended this familiar pattern to include digital workers or agents. This approach eliminates the need for users to learn new habits and reduces friction in adoption.

The third critical component involves preview capabilities with human-in-the-loop functionality.
Initially, when users could directly ask agents to create boards, projects, or modify items, Monday.com discovered that users would freeze when it came time to actually push content to production boards. The company drew an analogy to Cursor AI: Cursor shows developers the code before it is applied, and if it instead pushed changes directly to production without review, adoption would likely plummet. By implementing preview functionality, they gave users confidence and control over outcomes before changes were committed, resulting in significant adoption increases.

Finally, they emphasized explainability not merely as a nice-to-have feature, but as a learning mechanism. When users understand why certain outputs were generated, they can actively improve their interactions with the AI over time, creating a feedback loop that enhances the overall experience.

### Technical Architecture and Framework Selection

Monday.com built their entire agent ecosystem on LangGraph and LangSmith, having evaluated various frameworks and finding LangGraph superior for their needs. The framework provides essential capabilities like interrupts, checkpoints, persistent memory, and human-in-the-loop functionality without being overly opinionated, allowing for significant customization while handling complex orchestration requirements that engineers prefer not to manage manually.

The technical architecture centers around LangGraph as the core engine, supplemented by several custom-built components. They developed AI blocks, which are internal AI actions specific to Monday.com's ecosystem. Recognizing evaluation as critical for production AI systems, they built their own evaluation framework rather than relying on external solutions. Additionally, they implemented an AI gateway to control what inputs and outputs are permitted within the system, providing an additional layer of governance and security.
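
As a rough illustration of the gateway idea, the sketch below wraps a model call with checks on both the incoming prompt and the outgoing response. The source does not describe how Monday.com's gateway works, so everything here is hypothetical: the `Gateway` class, the length limit, the blocked-pattern rule, and the `fake_model` stand-in are assumptions chosen only to show input and output checks that live outside the model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


class GatewayRejection(Exception):
    """Raised when a prompt or response violates a gateway rule."""


@dataclass
class Gateway:
    # Hypothetical rules; a real gateway would load these from policy config.
    max_input_chars: int = 2000
    blocked_output_patterns: List[re.Pattern] = field(
        # Illustrative rule: block 16-digit runs that look like card numbers.
        default_factory=lambda: [re.compile(r"\b\d{16}\b")]
    )

    def check_input(self, prompt: str) -> None:
        if not prompt.strip():
            raise GatewayRejection("empty prompt")
        if len(prompt) > self.max_input_chars:
            raise GatewayRejection("prompt too long")

    def check_output(self, response: str) -> None:
        for pattern in self.blocked_output_patterns:
            if pattern.search(response):
                raise GatewayRejection("response contains a blocked pattern")

    def call(self, model: Callable[[str], str], prompt: str) -> str:
        # Validate the prompt, call the model, then validate the response.
        self.check_input(prompt)
        response = model(prompt)
        self.check_output(response)
        return response


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call (assumption for this sketch).
    return f"echo: {prompt}"


if __name__ == "__main__":
    gateway = Gateway()
    print(gateway.call(fake_model, "Summarize my board"))
```

Keeping the checks in ordinary code rather than in the prompt means a rule is enforced the same way regardless of which model sits behind the gateway.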
The system now processes millions of requests per month, demonstrating the scalability of their LangGraph-based architecture in a production environment serving a large user base.

### Multi-Agent System Design: The Monday Expert

Their flagship digital worker, called the Monday Expert, exemplifies their multi-agent approach using a supervisor methodology. The system consists of four specialized agents working in coordination:

- The supervisor agent orchestrates the overall workflow and decision-making.
- A data retrieval agent handles information gathering across Monday.com's ecosystem, including knowledge base searches, board data access, and web search capabilities.
- The board actions agent executes actual modifications and actions within the Monday.com platform.
- An answer composer agent generates responses based on user context, conversation history, tone of voice preferences, and other user-defined parameters.

An innovative feature they've implemented is an undo capability, where the supervisor can dynamically decide what actions to reverse based on user feedback. This functionality has proven particularly valuable for building user confidence and handling edge cases where actions don't meet user expectations.

### Production Challenges and Lessons Learned

Monday.com's experience reveals several critical insights about deploying conversational AI agents in production. They discovered that approximately 99% of user interactions fall outside what they initially anticipated and prepared for, highlighting the infinite nature of natural language interactions compared to the finite set of scenarios typically handled in development.

To address this challenge, they implemented robust fallback mechanisms. When the system detects that a user is requesting an action it cannot handle, it searches the knowledge base and provides instructions for manual completion. This approach maintains user productivity even when the AI cannot directly fulfill requests.
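
The fallback behavior described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Monday.com's implementation: the action names, the `handle_request` function, the keyword-based `search_knowledge_base` lookup, and the placeholder help text are all invented for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical actions the agent knows how to perform directly.
SUPPORTED_ACTIONS: Dict[str, Callable[[str], str]] = {
    "create_item": lambda arg: f"Created item '{arg}'",
    "archive_item": lambda arg: f"Archived item '{arg}'",
}

# Hypothetical knowledge base: (required keywords, placeholder help text).
KNOWLEDGE_BASE: List[Tuple[Set[str], str]] = [
    ({"export", "excel"}, "Help article: exporting a board (placeholder text)."),
    ({"duplicate", "board"}, "Help article: duplicating a board (placeholder text)."),
]


def search_knowledge_base(request: str) -> List[str]:
    """Return help texts whose keywords all appear in the request."""
    words = set(request.lower().split())
    return [text for keywords, text in KNOWLEDGE_BASE if keywords <= words]


def handle_request(action: str, argument: str, raw_request: str) -> str:
    """Run the action if supported; otherwise fall back to manual instructions."""
    handler = SUPPORTED_ACTIONS.get(action)
    if handler is not None:
        return handler(argument)
    articles = search_knowledge_base(raw_request)
    if articles:
        return "I can't do that yet, but you can do it manually: " + " ".join(articles)
    return "I can't do that yet and found no matching help article."


if __name__ == "__main__":
    print(handle_request("create_item", "Q3 plan", "create an item called Q3 plan"))
    print(handle_request("export_board", "", "export this board to excel"))
```

The point of the design is that an unsupported request still ends in something useful to the user instead of a bare refusal.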
Their evaluation framework represents a core intellectual property asset, as they recognize that while models and technology will continue evolving rapidly, strong evaluation capabilities provide sustainable competitive advantages. This perspective reflects a mature understanding of the AI landscape where the ability to assess and improve AI performance becomes more valuable than any specific model or technology choice.

Human-in-the-loop functionality proved critical for bridging the gap between development confidence and production readiness. The team experienced the common challenge where systems that perform well in controlled environments require significant additional work to reach production quality. They emphasize that reaching 80% accuracy is relatively straightforward, but achieving the 99%+ reliability needed for production deployment can take substantially longer.

### Guardrails and System Reliability

Monday.com implements guardrails outside of the LLM rather than relying on the model itself for constraint enforcement. They cite Cursor AI as an excellent example of external guardrails, noting how it stops after 25 coding runs regardless of success status. This external control mechanism provides more reliable constraint enforcement than depending on LLM-based judgment systems.

One particularly interesting technical insight involves what they term "compound hallucination" in multi-agent systems. While specialized agents typically perform better than generalist systems, having too many agents creates a mathematical problem where errors compound. If each agent operates at 90% accuracy, a chain of four agents results in overall system accuracy of approximately 66% (0.9^4). This insight challenges the assumption that more specialized agents always improve system performance and highlights the need for careful balance in multi-agent system design.
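
The compounding arithmetic is easy to check numerically. The sketch below assumes, as the 0.9^4 figure does, that agents run in sequence and that their errors are independent; the function names `chain_accuracy` and `max_agents_for_target` are illustrative and not from the source.

```python
def chain_accuracy(per_agent_accuracy: float, num_agents: int) -> float:
    """Probability that every agent in a sequential chain is correct,
    assuming each agent's errors are independent of the others."""
    if not 0.0 <= per_agent_accuracy <= 1.0:
        raise ValueError("per_agent_accuracy must be between 0 and 1")
    if num_agents < 0:
        raise ValueError("num_agents must be non-negative")
    return per_agent_accuracy ** num_agents


def max_agents_for_target(per_agent_accuracy: float, target: float) -> int:
    """Longest chain whose overall accuracy stays at or above the target."""
    if not 0.0 <= per_agent_accuracy < 1.0:
        raise ValueError("per_agent_accuracy must be in [0, 1)")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    count = 0
    while chain_accuracy(per_agent_accuracy, count + 1) >= target:
        count += 1
    return count


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} agents at 90% each: {chain_accuracy(0.9, n):.1%} overall")
    # Four agents at 90% each: 0.9 ** 4 = 0.6561, roughly 66%.
    print(round(chain_accuracy(0.9, 4), 4))
```

Under the same independence assumption, a chain of 90%-accurate agents can only be two agents long before overall accuracy drops below 80%, which is one way to see why adding agents is not free.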
### Future Vision: Dynamic Orchestration

Monday.com's long-term vision involves moving beyond static workflows to dynamic orchestration systems. They illustrate this concept using their quarterly earnings report process, which requires gathering extensive data and narratives across the company. While they could build a comprehensive automated workflow for this process, it would only run once per quarter, making it difficult to maintain and improve as AI technology evolves rapidly.

Their proposed solution involves creating a finite set of specialized agents capable of handling infinite tasks through dynamic orchestration. This approach mirrors human work organization, where individuals with specialized skills are dynamically assembled into teams for specific projects and then dissolved when complete. Their vision includes dynamic workflows with dynamic edges, rules, and agent selection that automatically assemble for specific tasks and then disband. This orchestration approach represents a significant technical challenge but offers the potential for more resilient and adaptable AI systems that can evolve with changing requirements and improving technology.

### Market Strategy and Adoption Metrics

Monday.com is opening their agent marketplace to external developers, indicating confidence in their platform's capabilities and a strategy to leverage external innovation. With 1 billion tasks processed annually, they see significant opportunity for agent-powered automation across their user base.

The company's 100% month-over-month growth in AI usage, while impressive, should be interpreted carefully, as early-stage metrics often show dramatic percentage growth from small initial bases. However, the sustained growth pattern and their willingness to invest heavily in AI infrastructure suggest genuine user value and adoption.
### Technical Considerations and Balanced Assessment

While Monday.com's approach demonstrates several sophisticated technical and product design decisions, some aspects warrant careful consideration. Their emphasis on user control and preview capabilities, while beneficial for adoption, may limit the efficiency gains possible from AI automation. The trade-off between user trust and system autonomy represents a fundamental tension in AI product design.

Their custom evaluation framework and AI gateway represent significant engineering investments that may not be necessary for smaller organizations or different use cases. The decision to build rather than buy these components reflects their scale and specific requirements but may not be optimal for all situations.

The compound hallucination insight, while mathematically sound, assumes independence between agent errors, which may not hold in practice. Correlated errors or feedback mechanisms between agents could significantly alter the mathematical relationship they describe.

Their vision for dynamic orchestration, while compelling, represents a significant technical challenge that may prove more complex in practice than their conceptual description suggests. The coordination overhead and potential failure modes in dynamic multi-agent systems could offset some of the proposed benefits.

Overall, Monday.com's approach represents a mature, user-centric strategy for deploying AI agents in production environments, with particular strength in user experience design and practical consideration of real-world deployment challenges. Their technical architecture choices appear well-suited to their scale and requirements, though some elements may not generalize to different contexts or organizational needs.
