This case study summarizes a panel discussion featuring three companies with distinct approaches to enterprise AI agent development and deployment: IBM Research, represented by Sandy Besson, an applied AI research engineer working on the BeeAI open-source framework and the Agent Communication Protocol; The Zig, an AI and data engineering consulting company led by CEO Chris Chileles; and Augmented AI Labs, headed by CEO Boaz Ashkanazi, which focuses on high-stakes AI solutions for the legal and government sectors.
The discussion reveals several critical insights about the challenges and realities of deploying AI agents in enterprise environments. One of the most significant challenges identified is the disconnect between prototype demonstrations and production requirements. Sandy Besson from IBM notes that enterprises often want to see impressive, "fancy" demonstrations during the sales process, but when it comes to actual production deployment, especially for customer-facing applications, they dramatically reduce their risk tolerance. This creates a fundamental tension where the compelling demo that wins the deal differs substantially from what gets implemented in production.
Cost management emerges as a particularly surprising challenge, one that becomes apparent only at scale. The panelists emphasize that while systems may work well and appear secure during small-scale testing, the financial implications can be shocking once usage scales up. Chris Chileles from The Zig specifically mentions the tendency to "throw the kitchen sink" at problems without being specific about what is being solved, leading to billing surprises. This highlights the importance of cost-conscious architecture decisions from the beginning of the development process.
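To make the scale effect concrete, here is a minimal back-of-the-envelope cost model; the per-token prices and request volumes are assumptions for illustration, not figures cited by the panel.

```python
# Illustrative back-of-the-envelope cost model for an LLM-backed agent.
# All prices and volumes are hypothetical placeholders, not quoted figures.

def monthly_cost(requests_per_day: int,
                 input_tokens: int,
                 output_tokens: int,
                 price_in_per_1k: float = 0.003,   # $ per 1K input tokens (assumed)
                 price_out_per_1k: float = 0.015,  # $ per 1K output tokens (assumed)
                 days: int = 30) -> float:
    per_request = ((input_tokens / 1000) * price_in_per_1k
                   + (output_tokens / 1000) * price_out_per_1k)
    return requests_per_day * per_request * days

# A pilot at 100 requests/day looks cheap...
print(f"pilot:      ${monthly_cost(100, 4_000, 800):,.2f}/month")     # $72.00
# ...but the same architecture at 50,000 requests/day is a budget line item.
print(f"production: ${monthly_cost(50_000, 4_000, 800):,.2f}/month")  # $36,000.00
```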
The evaluation and success metrics for AI agents present another complex challenge. The panelists introduce the concept of the Pareto frontier as a framework for helping clients understand trade-offs between competing KPIs such as accuracy, cost, and speed. Chris Chileles explains that they educate customers about these trade-offs, helping them understand that unlimited resources and time are rarely available, so decisions must be made about which metrics to prioritize. This quantitative approach helps ground conversations in measurable outcomes rather than subjective assessments.
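A minimal sketch of how such a Pareto analysis might look in practice: given candidate agent configurations scored on the three KPIs, keep only the non-dominated ones. The configurations and numbers below are hypothetical, not examples from the panel.

```python
# Minimal sketch: computing the Pareto frontier over candidate agent configurations.
# The configurations and scores are invented; the point is the trade-off logic.

from dataclasses import dataclass

@dataclass
class Config:
    name: str
    accuracy: float   # higher is better
    cost: float       # $ per 1K requests, lower is better
    latency: float    # seconds, lower is better

def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is at least as good on every KPI and strictly better on one."""
    at_least_as_good = (a.accuracy >= b.accuracy and
                        a.cost <= b.cost and
                        a.latency <= b.latency)
    strictly_better = (a.accuracy > b.accuracy or
                       a.cost < b.cost or
                       a.latency < b.latency)
    return at_least_as_good and strictly_better

def pareto_frontier(configs: list[Config]) -> list[Config]:
    return [c for c in configs
            if not any(dominates(other, c) for other in configs)]

candidates = [
    Config("large-model",        accuracy=0.96, cost=24.0, latency=6.0),
    Config("small-model",        accuracy=0.88, cost=3.0,  latency=1.2),
    Config("small-model+RAG",    accuracy=0.93, cost=5.5,  latency=2.0),
    Config("large-model-cached", accuracy=0.96, cost=30.0, latency=6.5),  # dominated
]

for c in pareto_frontier(candidates):
    print(c.name)  # prints the first three; the dominated option drops out
```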
Human-in-the-loop implementation represents a recurring theme across all three companies, particularly for high-stakes environments. Boaz Ashkanazi from Augmented AI Labs explains that given their work with financial services and law firms, they almost always incorporate human oversight, with dashboards that specifically highlight where uncertainties occur and where human intervention is needed. The panelists share a useful heuristic: treat AI agents like interns. Determine what level of autonomy you would give to a human intern in the same role, and apply similar oversight principles to the AI system.
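The intern heuristic can be read as an escalation policy. Below is a minimal sketch, assuming hypothetical risk tiers and confidence thresholds; it does not reflect any specific panelist's implementation.

```python
# Sketch of the "treat the agent like an intern" heuristic as code: the agent
# acts autonomously only where an intern would be trusted to, and everything
# else is routed to a human reviewer. Tiers and thresholds are assumptions.

from enum import Enum

class Action(Enum):
    AUTO_APPROVE = "auto_approve"   # an intern could do this unsupervised
    HUMAN_REVIEW = "human_review"   # an intern would ask a supervisor
    HUMAN_ONLY = "human_only"       # an intern would never be allowed to decide

def route(task_risk: str, model_confidence: float) -> Action:
    if task_risk == "high":         # e.g., legal filings, compliance decisions
        return Action.HUMAN_ONLY
    if task_risk == "medium":
        return Action.AUTO_APPROVE if model_confidence >= 0.95 else Action.HUMAN_REVIEW
    return Action.AUTO_APPROVE if model_confidence >= 0.80 else Action.HUMAN_REVIEW

print(route("medium", 0.91))  # Action.HUMAN_REVIEW -> surfaces on the dashboard
```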
A particularly illustrative example comes from Augmented AI Labs' work with Guidant Financial, a fintech company dealing with compliance-heavy document processing. The company had previously required customers or employees to fill out forms manually to convert unstructured data into structured formats. The AI solution deployed agents to read various payroll documents, which often contained messy, inconsistent formatting that could confuse large language models. The solution involved establishing document-by-document criteria and allowing the system to flag areas of confusion, presenting these to users through a conventional dashboard interface. This approach improved accuracy from around 70% to the high 90s while maintaining the human oversight needed for edge cases.
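The flag-and-review pattern described above might look roughly like the following sketch; the schema, field names, and confidence threshold are illustrative assumptions, not Guidant Financial's actual system.

```python
# Sketch of the flag-and-review pattern: each extracted field carries a
# confidence score, and low-confidence fields are flagged for the dashboard
# instead of being silently accepted. Schema and threshold are assumed.

from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.9  # assumed cutoff; would be tuned per document type

@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float

@dataclass
class DocumentResult:
    fields: list[ExtractedField]
    flags: list[str] = field(default_factory=list)

def review_extraction(fields: list[ExtractedField]) -> DocumentResult:
    result = DocumentResult(fields=fields)
    for f in fields:
        if f.confidence < CONFIDENCE_THRESHOLD:
            result.flags.append(f"{f.name}: low confidence ({f.confidence:.2f})")
    return result

doc = review_extraction([
    ExtractedField("employee_name", "J. Smith", 0.98),
    ExtractedField("gross_pay", "$4,2l0.00", 0.61),  # messy OCR -> flagged
])
print(doc.flags)  # ['gross_pay: low confidence (0.61)']
```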
The discussion also covers emerging technical standards in the agent ecosystem, particularly around communication protocols. Sandy Besson explains the distinction between Model Context Protocol (MCP), which provides tools and resources to individual language models, and Agent Communication Protocol (ACP), which enables horizontal communication between different agents, microservices, or subprocesses regardless of their underlying framework or technology. This represents the evolution from siloed agent development to collaborative multi-agent systems.
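The distinction can be summarized in code. The sketch below is purely conceptual and does not reproduce the actual MCP or ACP wire formats or APIs; it only contrasts vertical tool access with horizontal agent-to-agent messaging.

```python
# Conceptual sketch only: NOT the real MCP or ACP interfaces.
# Vertical (MCP-like): one model reaches down to tools and resources.
# Horizontal (ACP-like): peer agents message each other across frameworks.

def model_calls_tool(tool_name: str, arguments: dict) -> dict:
    """One model, one tool call: the model is the consumer of capabilities."""
    return {"tool": tool_name, "args": arguments, "result": "..."}

def agent_sends_message(sender: str, recipient: str, payload: dict) -> dict:
    """Agent-to-agent: neither side needs to share a framework or runtime."""
    return {"from": sender, "to": recipient, "body": payload}

model_calls_tool("payroll_parser", {"document_id": "doc-42"})
agent_sends_message("extraction-agent", "compliance-agent",
                    {"status": "needs_review", "doc": "doc-42"})
```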
Data governance and security considerations feature prominently in the enterprise context. The panelists note an interesting shift where, after years of encouraging cloud migration, AI requirements are driving some enterprises back toward on-premises solutions to maintain data control. Chris Chileles observes that companies are increasingly concerned about data ownership and are seeking assurances that their proprietary information remains secure. This has led to careful contract review and consideration of self-hosted solutions, with companies like Microsoft emphasizing "your data is yours" messaging.
The approach to data handling varies with risk tolerance and regulatory requirements. Boaz Ashkanazi describes restricting model inputs and outputs even when models have access to complete databases, particularly when dealing with sensitive information like social security numbers. The panelists advocate for applying traditional security principles (least access, encryption, anonymization) and treating AI systems like human employees in terms of access controls.
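A minimal sketch of that input restriction, assuming a simple regex pass over prompts before they leave the trust boundary; real deployments would typically layer dedicated PII detection on top of this.

```python
# Minimal sketch: anonymize SSNs before the prompt is sent to the model,
# mirroring the least-access principle described above. A bare regex is
# shown for illustration; production systems would use a PII-detection service.

import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_ssns(text: str) -> str:
    """Replace anything shaped like an SSN before it leaves the trust boundary."""
    return SSN_PATTERN.sub("[REDACTED-SSN]", text)

prompt = "Employee 123-45-6789 requested a 401(k) rollover."
print(redact_ssns(prompt))
# Employee [REDACTED-SSN] requested a 401(k) rollover.
```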
An interesting technical insight emerges around the use of synthetic data in enterprise contexts. Chris Chileles presents the argument that enterprises may need less real data than foundation model providers because they're modeling specific use cases rather than attempting to model the entire world. This creates opportunities for generating synthetic data to represent known edge cases, potentially reducing dependence on large volumes of real data while maintaining model performance for specific enterprise applications.
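A sketch of what enumerating and generating known edge cases might look like; the payroll fields and edge cases below are invented for illustration, not drawn from any panelist's system.

```python
# Sketch of the synthetic-data argument: because the enterprise use case is
# narrow, known edge cases can be enumerated and generated directly rather
# than hunted for in large volumes of real data. Fields and cases are assumed.

import random

EDGE_CASES = [
    {"pay_frequency": "biweekly", "overtime_hours": 0,  "bonus": 0},
    {"pay_frequency": "weekly",   "overtime_hours": 60, "bonus": 0},       # extreme OT
    {"pay_frequency": "monthly",  "overtime_hours": 0,  "bonus": 250_000}, # outlier bonus
]

def synthetic_payroll_record(case: dict, seed: int) -> dict:
    rng = random.Random(seed)  # deterministic, so test sets are reproducible
    return {
        "employee_id": f"EMP-{rng.randint(10_000, 99_999)}",
        "base_pay": round(rng.uniform(2_000, 12_000), 2),
        **case,
    }

# 100 variations of each known edge case, each with a unique seed.
test_set = [synthetic_payroll_record(c, seed=i)
            for i, c in enumerate(EDGE_CASES * 100)]
```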
The discussion reveals a significant gap between what is discussed at conferences and real-world enterprise adoption rates. The panelists emphasize that while AI conferences create bubbles of rapid innovation, actual enterprise adoption remains slow due to regulatory constraints, risk aversion, and change management challenges. Sandy Besson notes that highly regulated industries face substantial risks from rapid AI adoption, including potential penalties under regimes such as SEC rules or HIPAA.
A compelling anecdote illustrates organizational challenges beyond technology: lawyers at an Omaha law firm were using ChatGPT to complete tasks more efficiently but were reluctant to fully embrace the productivity gains because they needed to meet billable hour quotas. This highlights how existing business models and incentive structures can impede AI adoption even when the technology provides clear benefits.
The rapid pace of AI development creates strategic challenges for enterprises and vendors alike. Boaz Ashkanazi describes scenarios where teams build complex systems to improve model accuracy from the mid-70s to high-90s, only to have new model releases achieve similar improvements out-of-the-box. This creates tension between immediate business needs and the potential for future technological improvements. The panelists recommend building modular systems that can adapt to changing capabilities and avoiding vendor lock-in to minimize technical debt.
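The modularity advice can be illustrated with a thin provider abstraction; the interface and vendor names below are hypothetical, not a recommendation of any specific SDK.

```python
# Sketch of the modularity advice: hide the model provider behind a narrow
# interface so a new release (or a new vendor) can be swapped in without
# rewriting the surrounding system. Names are illustrative.

from typing import Protocol

class CompletionProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class VendorAClient:
    def complete(self, prompt: str) -> str:
        return f"[vendor-a response to: {prompt!r}]"

class VendorBClient:
    def complete(self, prompt: str) -> str:
        return f"[vendor-b response to: {prompt!r}]"

def run_agent_step(provider: CompletionProvider, task: str) -> str:
    # Application logic depends only on the interface, not the vendor SDK,
    # so improvements that arrive "for free" with new models slot in here.
    return provider.complete(task)

print(run_agent_step(VendorAClient(), "summarize the payroll document"))
print(run_agent_step(VendorBClient(), "summarize the payroll document"))
```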
The discussion concludes with observations about the shifting landscape of AI tooling. As reasoning capabilities improve in foundation models, there's less need for extensive custom tooling and frameworks. This "shifting left" toward model providers means that many intermediate solutions may become temporary, emphasizing the importance of building systems that can improve automatically as underlying models advance.
The panelists also address concerns about AI disrupting the Software-as-a-Service (SaaS) business model. Their consensus is that while SaaS solutions will evolve and change, the fundamental need for hosted services will persist because not all organizations can build and host everything themselves. However, pricing models are evolving, particularly moving away from per-user models toward usage-based pricing that better reflects AI-driven cost structures.
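A toy calculation suggests why per-seat pricing strains under AI cost structures; every number below is a made-up assumption, not a figure from the discussion.

```python
# Illustrative comparison of per-seat vs usage-based pricing for an AI-heavy
# product. Inference cost scales with usage, not headcount, which is what
# pushes vendors away from per-user pricing. All numbers are hypothetical.

SEATS = 200
PRICE_PER_SEAT = 30.0             # $/user/month (assumed)
REQUESTS_PER_MONTH = 2_000_000    # heavy agent usage (assumed)
COST_PER_REQUEST = 0.004          # vendor's inference cost (assumed)
USAGE_PRICE_PER_REQUEST = 0.006   # usage-based price with margin (assumed)

per_seat_revenue = SEATS * PRICE_PER_SEAT
inference_cost = REQUESTS_PER_MONTH * COST_PER_REQUEST
usage_revenue = REQUESTS_PER_MONTH * USAGE_PRICE_PER_REQUEST

print(f"per-seat revenue:    ${per_seat_revenue:,.0f}")  # $6,000 -> under water
print(f"inference cost:      ${inference_cost:,.0f}")    # $8,000
print(f"usage-based revenue: ${usage_revenue:,.0f}")     # $12,000 -> tracks cost
```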
Throughout the discussion, the panelists emphasize practical, conservative approaches to enterprise AI deployment. They advocate for treating AI systems with appropriate skepticism, implementing proper guardrails, maintaining human oversight for critical decisions, and building systems that can evolve with rapidly advancing technology. Their experiences highlight the importance of managing expectations, understanding true costs, and building robust evaluation frameworks that can demonstrate clear business value while maintaining acceptable risk levels.