## Overview
Wakam's case study represents a comprehensive example of enterprise LLMOps implementation at scale, demonstrating the journey from experimental AI prototypes to production-grade agent deployment with exceptional organizational adoption. As a B2B2C insurance company operating across 32 countries with nearly one billion euros in turnover, Wakam faced the classic enterprise challenge of knowledge trapped in organizational silos, which they addressed through a strategic AI agent implementation that achieved 70% employee adoption within two months.
The case study is particularly valuable for understanding the real-world tradeoffs between building versus buying AI infrastructure, the critical importance of change management in LLMOps success, and the technical architecture decisions required for secure, compliant AI deployment in regulated industries. It's worth noting that this text is promotional content from Dust (the vendor), so claims about superiority and results should be viewed with appropriate skepticism, though the specific metrics and implementation details provide useful insights into enterprise LLMOps practices.
## The Initial Technical Approach and Build vs. Buy Decision
Wakam's LLMOps journey began in late 2023, as GPT-3.5-class models and retrieval-augmented generation (RAG) were gaining broad enterprise traction. Their initial approach involved their five-person data science team building a custom AI chatbot with RAG capabilities from scratch. This represents a common enterprise pattern: organizations with technical capabilities attempt to build proprietary solutions to maintain control and customization.
The technical challenges they encountered illuminate fundamental LLMOps considerations. Building effective RAG systems required implementing vector databases, managing model orchestration, creating user interfaces, and continuously maintaining these systems as the AI landscape evolved rapidly. The velocity of change in the AI market created an impossible maintenance burden—every new feature required weeks of development time, and keeping pace with commercial platforms would have required tripling the team size.
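To make that maintenance burden concrete, the sketch below shows the minimum moving parts of a home-grown RAG pipeline (embedding, a vector index, retrieval, and prompt assembly), using a toy hash-based embedding so it runs standalone. This is illustrative rather than Wakam's actual code; a production system would swap in a real embedding model and a managed vector database, each a maintenance surface in its own right.

```python
import math
from collections import Counter

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy embedding: hash tokens into a fixed-size bag-of-words vector.
    A real pipeline would call an embedding model and re-embed on upgrades."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        vec[hash(token) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class VectorIndex:
    """Stand-in for a vector database: stores chunks, returns top-k matches."""
    def __init__(self) -> None:
        self.rows: list[tuple[list[float], str]] = []

    def add(self, chunk: str) -> None:
        self.rows.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        return [c for _, c in sorted(self.rows, key=lambda r: -cosine(q, r[0]))[:k]]

def build_prompt(question: str, index: VectorIndex) -> str:
    """Assemble a grounded prompt; the LLM call itself is omitted here."""
    context = "\n".join(f"- {c}" for c in index.search(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Every one of these components, plus the user interface and model orchestration around them, must be kept current as providers ship new capabilities, which is exactly the treadmill Wakam's five-person team could not sustain.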
Critically, even when the technical implementation worked well—successfully answering questions using company data and browsing the internet—adoption remained limited primarily to technical team members. This highlights a fundamental LLMOps lesson that technical success does not equal business impact without proper change management, training, and organizational support systems.
The decision to pivot to a commercial platform (Dust) reflects pragmatic LLMOps thinking: rather than replicating capabilities that specialized vendors had already solved, focus organizational resources on business impact and adoption. This tradeoff means accepting vendor dependency and potentially higher costs in exchange for faster time-to-value, continuous feature updates, and the ability to focus internal resources on use case development rather than infrastructure maintenance.
## Platform Selection and Technical Architecture Requirements
Wakam's platform evaluation framework reveals the complex technical requirements for production LLMOps in regulated industries. Their selection criteria provide a useful template for enterprise LLMOps platform evaluation, though it's important to recognize that these were filtered through the lens of what Dust offers.
**Model agnosticism** was positioned as critical for avoiding vendor lock-in as the AI model landscape shifted between OpenAI, Anthropic, Mistral, and emerging providers. This represents sound LLMOps architecture—the ability to swap underlying models based on performance, cost, or compliance requirements protects against provider dependencies and enables optimization over time. However, true model agnosticism has limitations; different models have varying context windows, capabilities, and prompt engineering requirements, making seamless switching more complex than it appears.
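As a rough illustration of what model agnosticism buys (and what it doesn't), the hypothetical interface below lets an agent run against interchangeable providers, while the differing context windows hint at why switching is never entirely free. The class and method names are invented for this sketch, not Dust's or any vendor's API.

```python
from typing import Protocol

class ChatModel(Protocol):
    """Minimal provider-agnostic interface; hypothetical, not a real SDK."""
    def complete(self, prompt: str, max_tokens: int) -> str: ...

class OpenAIModel:
    context_window = 128_000
    def complete(self, prompt: str, max_tokens: int) -> str:
        return "...response from an OpenAI-hosted model..."  # network call stubbed out

class MistralModel:
    context_window = 32_000
    def complete(self, prompt: str, max_tokens: int) -> str:
        return "...response from a Mistral-hosted model..."  # network call stubbed out

def run_agent(model: ChatModel, prompt: str) -> str:
    # Routing on cost, compliance, or capability happens above this line.
    # The caveat in the text applies: prompts tuned for one model often
    # need rework for another, so a "swap" is rarely a one-line change.
    return model.complete(prompt, max_tokens=512)
```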
**RAG capabilities and data integration** represented core technical requirements. The platform needed to securely access proprietary knowledge across insurance regulations, partner contracts, operational procedures, and market intelligence while integrating with existing data sources including Notion, Slack, Snowflake, HubSpot, and SharePoint. The requirement that business users manage these integrations without technical support is ambitious and represents a key differentiator in LLMOps platforms—the extent to which non-technical users can configure production AI systems.
**Security and compliance** requirements reflect the realities of LLMOps in regulated industries. Operating across 32 countries meant navigating strict data protection regulations, which required enterprise-grade security, audit trails, granular data access controls, SSO integration (specifically Entra ID), and clear data governance capabilities. These requirements significantly constrain platform options and architectural approaches.
**Extensibility through APIs** enabled custom integrations and specialized agents while leveraging platform core capabilities. This represents a critical LLMOps pattern: platforms must balance ease-of-use with power-user capabilities, allowing sophisticated workflows without forcing all users into complex interfaces.
## Security Architecture and Permission Systems
One of the most technically interesting aspects of Wakam's implementation is the dual-layer permission system for managing sensitive data access in AI agent environments. This addresses a fundamental LLMOps challenge: how to give agents access to information they need while ensuring sensitive data doesn't leak to unauthorized users.
The architecture organizes data into "spaces"—data containers that can be company-wide accessible or restricted to specific users. Agents only retrieve information from their assigned spaces, and users can only interact with agents if they have access to all spaces those agents require. This dual-layer approach controls both agent-to-data access (which spaces can an agent query) and human-to-agent access (which users can invoke which agents).
This architecture enables different security profiles for different use cases. Compliance teams could create agents accessing sensitive regulatory documents through restricted spaces accessible only to compliance personnel. Finance teams could build agents with financial data accessible only to executives and finance members. This granular control is essential for production LLMOps in enterprises with complex security requirements.
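The text doesn't describe the mechanism at the code level, but a minimal sketch of such a dual-layer check might look like the following, assuming a simple model where spaces are either company-wide or restricted to named users.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Space:
    """A data container; company-wide unless restricted to named users."""
    name: str
    allowed_users: frozenset[str] | None = None  # None means company-wide

    def readable_by(self, user: str) -> bool:
        return self.allowed_users is None or user in self.allowed_users

@dataclass
class Agent:
    name: str
    spaces: list[Space] = field(default_factory=list)

def can_invoke(user: str, agent: Agent) -> bool:
    """Human-to-agent layer: a user may only call an agent if they can
    read every space that agent is wired to."""
    return all(space.readable_by(user) for space in agent.spaces)

def retrieve(agent: Agent, query: str) -> list[str]:
    """Agent-to-data layer: retrieval is scoped to the agent's assigned
    spaces, regardless of what else exists in the workspace."""
    return [f"search({space.name!r}, {query!r})" for space in agent.spaces]

# Example: a compliance agent bound to a restricted space.
regulatory = Space("regulatory-docs", frozenset({"alice", "bob"}))
agent = Agent("compliance-assistant", [regulatory])
assert can_invoke("alice", agent)
assert not can_invoke("carol", agent)  # carol lacks access to a required space
```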
However, the text doesn't detail how this architecture handles several important edge cases: how agents handle requests that would require accessing data from multiple security domains, how audit trails track what information agents actually retrieved and shared, or how the system prevents indirect information leakage where agents might inadvertently reveal restricted information through their responses even when users don't have direct access. These represent ongoing challenges in production LLMOps security.
## Implementation Strategy and Change Management
The most striking aspect of Wakam's LLMOps deployment was achieving 70% monthly active usage within two months, which they attribute primarily to comprehensive change management rather than technical sophistication. This represents a critical but often underemphasized aspect of production LLMOps: organizational adoption is typically the binding constraint, not technical capability.
**Executive sponsorship** positioned AI agents as a strategic priority and a fundamental workflow shift rather than an optional experiment. Leadership communicated regularly during weekly company meetings, featuring success stories, new capabilities, and usage metrics. They positioned AI agents as the preferred method for information retrieval rather than an additional option, which reduced competition with existing workflows. This top-down mandate approach can be effective but also risks creating resentment if the tools don't deliver value or if employees feel forced into uncomfortable workflows.
**Employee empowerment to build agents** represents perhaps the most significant architectural decision. Of the 136 deployed agents, roughly 40 were built by the AI Engineering team and the remaining 96 were created by employees across business units. This distributed development model recognizes that centralized teams cannot understand every business challenge well enough to build optimal solutions, and empowers domain experts to create agents addressing their specific needs.
This approach required extensive enablement infrastructure:
- Comprehensive training programs covering practical use cases, hands-on practice, and guidance on when to use AI agents versus traditional tools
- Advanced sessions teaching agent identification, independent building, instruction structuring, and troubleshooting
- Weekly open office hours for questions and challenges
- Dedicated Slack support channels for real-time help and knowledge sharing
- Hackathons pairing business experts with technical team members
- Documentation and curation systems, including a "meta-agent" that helps employees build agents
- Progressive complexity support, where simple agent creation is self-service while complex agents receive full collaborative development support
This democratized development model has significant LLMOps implications. It distributes agent quality assurance across many non-expert developers, potentially creating consistency and quality challenges. It requires platform capabilities sophisticated enough for business users yet flexible enough for complex use cases. It creates governance challenges around version control, agent lifecycle management, and preventing duplicate or conflicting agents. The text doesn't address how Wakam manages these challenges, representing an important gap in understanding the full LLMOps complexity.
## Production Use Cases and Agent Architecture Evolution
The case study describes an evolution from simple "knowledge assistants" to more sophisticated "action agents," representing a natural maturity curve in production LLMOps implementations.
**Phase 1: Knowledge assistants** focused on helping employees access information and improve output quality while retaining human responsibility for all actions. Examples included HR policy assistants and contract review assistants. These represent relatively low-risk LLMOps deployments—the agents don't take actions, only provide information, limiting potential negative impacts from errors.
**Phase 2: Action agents** (Wakam's current state) can take actions autonomously rather than just providing information. Two specific agents illustrate this evolution:
Harvey (Legal Agent) operates across the corporate legal team's digital workspace with access to Notion, Outlook, web search, SharePoint, and calendar tools. Harvey can read, write, and remember context, handling complex corporate legal workflows previously requiring manual coordination across multiple systems. While human-activated, Harvey represents significant automation of knowledge work.
MoneyPenny (Personal Productivity Agent) acts on users' behalf across Wakam's digital workplace: Outlook, Slack, Notion, and HubSpot. MoneyPenny retrieves emails, prepares meetings, synthesizes weekly activity, writes to Notion pages, and summarizes Slack mentions. Rather than requiring users to choose an agent for each task, MoneyPenny orchestrates multiple actions based on user intent, representing a higher-level abstraction in agent interfaces.
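A minimal sketch of that orchestration pattern appears below, with naive keyword matching standing in for the LLM-driven intent classification a real agent would use; the tool functions are hypothetical stand-ins for Outlook, Slack, and Notion connectors, not Dust's actual tooling.

```python
from typing import Callable

# Hypothetical tool functions standing in for workplace connectors.
def fetch_recent_emails(user: str) -> str: return f"recent emails for {user}"
def summarize_slack_mentions(user: str) -> str: return f"Slack mentions for {user}"
def prepare_meeting_brief(user: str) -> str: return f"meeting brief for {user}"

TOOLS: dict[str, Callable[[str], str]] = {
    "email": fetch_recent_emails,
    "mention": summarize_slack_mentions,
    "meeting": prepare_meeting_brief,
}

def orchestrate(user: str, request: str) -> list[str]:
    """Keyword routing as a crude stand-in for LLM intent classification:
    the agent, not the user, decides which tools to chain."""
    plan = [tool for key, tool in TOOLS.items() if key in request.lower()]
    return [tool(user) for tool in plan] or ["no matching tool; fall back to chat"]

print(orchestrate("marie", "Prep my meeting and check my mentions"))
```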
These action agents raise important LLMOps questions the text doesn't fully address. How does Wakam ensure action quality and prevent errors that could have business consequences? What monitoring and alerting systems detect when agents make mistakes? How are agent actions audited for compliance purposes? What rollback mechanisms exist when agents take incorrect actions? These operational concerns are critical for production LLMOps but aren't detailed in the promotional content.
**Phase 3: Autonomous agents** represents Wakam's future vision—agents operating as domain experts capable of addressing entire job functions, operating proactively in response to system events, generating scheduled analyses, monitoring business metrics, and alerting humans only for exceptions. In regulated industries like insurance, these agents would operate within predefined, validated, and auditable boundaries aligned with regulatory frameworks.
This vision raises fundamental questions about the limits of current LLM technology. While LLMs excel at pattern matching and natural language tasks, operating as "domain experts" across entire job functions requires reasoning capabilities, contextual understanding, and error handling that current models struggle with. The gap between aspirational vision and realistic near-term capabilities is important to recognize in LLMOps planning.
## Technical Integration and Data Pipeline Architecture
While the text emphasizes high-level strategy over technical implementation details, several integration points reveal important LLMOps architecture considerations. The platform integrated with Notion, SharePoint, Slack, Snowflake, HubSpot, and Outlook, representing diverse data sources with different APIs, authentication mechanisms, and data models.
Implementing effective RAG across these heterogeneous sources requires solving several technical challenges:
- Unified authentication and authorization across systems with different security models
- Data synchronization strategies determining how frequently to update vector embeddings from source systems
- Handling schema evolution as source systems change their data structures
- Chunking strategies for different document types, such as structured data in Snowflake versus unstructured documents in SharePoint (sketched below)
- Embedding model selection and vector database management
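As one illustration of the chunking point, the sketch below applies different strategies to unstructured pages and structured rows. The heuristics are deliberately simple assumptions for illustration, not a description of how Dust actually chunks data.

```python
def chunk_markdown(doc: str, max_chars: int = 800) -> list[str]:
    """Unstructured docs (e.g., SharePoint pages): split on headings,
    then cap chunk size so each piece fits the embedding model's input."""
    sections, current = [], []
    for line in doc.splitlines():
        if line.startswith("#") and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return [s[i:i + max_chars] for s in sections for i in range(0, len(s), max_chars)]

def chunk_table_rows(rows: list[dict], key_cols: list[str]) -> list[str]:
    """Structured data (e.g., a Snowflake table): one self-describing
    chunk per row so retrieval returns complete records."""
    return [", ".join(f"{col}={row[col]}" for col in key_cols) for row in rows]

# Each connector pairs a sync schedule with a chunker; re-embedding on every
# upstream change is the expensive part a managed platform hides from users.
```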
The text mentions that business users could manage data integrations without technical support, suggesting that the Dust platform abstracts these complexities. However, this abstraction has limits—effective RAG requires understanding document structures, metadata, and domain-specific chunking strategies that may be difficult for non-technical users to optimize.
## Metrics, Monitoring, and Continuous Improvement
Wakam built internal dashboards tracking adoption metrics including user activity rates by team, most valuable agents, and productivity impact across use cases. This monitoring infrastructure is essential for production LLMOps, though the text provides limited detail about the specific metrics tracked or how they're calculated.
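Since the text doesn't define its metrics, the sketch below shows one plausible way a team-level "monthly active usage" rate could be computed from an event log; the log schema and the trailing 30-day window are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical usage log: (employee_id, team, date of last agent interaction).
events = [
    ("e1", "legal", date(2024, 5, 28)),
    ("e2", "legal", date(2024, 4, 2)),
    ("e3", "finance", date(2024, 5, 30)),
]
headcount = {"legal": 2, "finance": 1}

def monthly_active_rate(today: date) -> dict[str, float]:
    """Share of each team active in the trailing 30 days: one plausible
    reading of 'monthly active usage'; the case study doesn't define it."""
    cutoff = today - timedelta(days=30)
    active: dict[str, set[str]] = {}
    for emp, team, last_seen in events:
        if last_seen >= cutoff:
            active.setdefault(team, set()).add(emp)
    return {team: len(active.get(team, set())) / n for team, n in headcount.items()}

print(monthly_active_rate(date(2024, 5, 31)))  # {'legal': 0.5, 'finance': 1.0}
```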
The quantitative results cited include 70% employee adoption, 136 deployed agents within two months, and 50% reduction in legal contract analysis time. While impressive, these metrics should be evaluated critically. "Adoption" metrics can be misleading—do users find genuine value or are they responding to top-down mandate? The 50% time reduction in legal contract analysis is dramatic but lacks detail about how it was measured, whether it accounts for time spent reviewing agent output for errors, or whether quality remained consistent.
The text mentions Wakam's ambition to reach 90% monthly usage and 70% weekly usage, along with every team building at least one agent. These targets suggest that current adoption, while high, still has room to grow, and that some teams or use cases haven't yet found agent applications valuable enough for regular use.
## Critical Assessment and Limitations
Several important caveats and limitations should inform interpretation of this case study. First, this is vendor-generated content from Dust promoting their platform, so claims should be viewed with appropriate skepticism. The case study emphasizes successes while likely omitting failures, challenges, and ongoing problems.
Second, the text provides limited technical detail about how core LLMOps challenges are actually solved. How is prompt engineering managed across 136 agents built by non-experts? How is agent output quality evaluated? What testing and validation processes exist before agents are deployed? How are agents versioned and updated? These operational details are critical for understanding the full LLMOps complexity but are largely absent from the promotional narrative.
Third, the cost-benefit analysis is incomplete. While the text mentions time savings, it doesn't discuss the total cost of ownership including platform licensing fees, internal support resources, training costs, and ongoing maintenance. The build vs. buy decision involves complex tradeoffs that depend heavily on organizational context, and the case study presents only one side of this analysis.
Fourth, the long-term sustainability of the distributed agent development model remains unclear. As the number of agents grows, how does Wakam manage agent sprawl, redundancy, and maintenance? Who is responsible when agents break due to upstream data source changes? How are agents deprecated when they're no longer needed? These lifecycle management questions are critical for long-term LLMOps success.
Finally, the security and compliance architecture, while described at a high level, likely involves complexities and edge cases not captured in the promotional content. Production LLMOps in regulated industries requires extensive controls, audit capabilities, and risk management processes that go beyond the dual-layer permission system described.
## Key Takeaways for LLMOps Practitioners
Despite these limitations, Wakam's case study offers several valuable lessons for production LLMOps. The build vs. buy analysis, while ultimately promoting the buy decision, correctly identifies the resource constraints and velocity challenges of building AI infrastructure in-house. For most enterprises, focusing resources on business impact rather than infrastructure makes strategic sense.
The emphasis on change management and organizational enablement as the primary drivers of adoption is well-founded. Technical capability without organizational adoption delivers no business value, and the comprehensive support infrastructure Wakam built represents significant but necessary investment.
The distributed agent development model—empowering domain experts to build their own agents with appropriate support—represents an interesting approach to scaling LLMOps beyond centralized teams. This model's long-term success likely depends on platform capabilities that enable non-experts to build quality agents and governance systems that manage the complexity of many distributed developers.
The security architecture with dual-layer permissions (agent-to-data and human-to-agent) addresses real challenges in enterprise LLMOps, though the edge cases and operational details matter significantly in practice. The evolution from knowledge assistants to action agents to autonomous agents represents a plausible maturity progression, though organizations should be realistic about the capabilities and limitations of current LLM technology at each stage.
Overall, this case study provides useful insights into enterprise LLMOps implementation while requiring critical interpretation given its promotional nature and the gaps in technical detail around core operational challenges.