zeb developed SuperInsight, a generative AI-powered self-service reporting engine that transforms natural language data requests into actionable insights. Using Databricks' DBRX model and combining fine-tuning with RAG approaches, they created a system that reduced data analyst workload by 80-90% while increasing report generation requests by 72%. The solution integrates with existing communication platforms and can generate reports, forecasts, and ML models based on user queries.
zeb is a consulting firm specializing in digital transformation with over 15 years of experience and more than 1,500 experts. They developed SuperInsight, a self-service reporting engine that is part of their SuperDesk suite of AI-powered service desk products. The core problem SuperInsight addresses is the significant workload burden placed on data analysts who manually process numerous data requests from business users. The solution enables non-technical users to interact with their data through natural language queries submitted via familiar communication platforms like Slack, Microsoft Teams, or email, receiving actionable insights in return.
The genesis of SuperInsight came from a deployment for a large enterprise logistics company where data plays a critical role. Prior to the AI implementation, a team of several data analysts managed a substantial backlog of data requests. The initial GenAI-based system reportedly reduced the workload on data analysts by 80–90%, which is an impressive claim though it should be noted this figure comes from zeb themselves. The success of this pilot led to the development of a more generalized product that could serve multiple industries including logistics and supply chain, retail, fintech, and healthcare/life sciences.
The SuperInsight system is built entirely on the Databricks Data Intelligence Platform, representing an end-to-end LLMOps solution that consolidates what would otherwise require “30 different solutions to piece together,” according to the case study. The architecture employs a compound AI system approach, combining multiple AI techniques to achieve robust, production-ready performance.
The solution uses both Retrieval-Augmented Generation (RAG) and fine-tuning, applying them for distinct purposes within the pipeline. According to Sid Vivek, Head of AI at zeb, fine-tuning is used to “change the behavior and scope of the model itself and put it within the context understanding of a specific industry.” Meanwhile, RAG is applied with the assumption that the model is already industry-trained, but needs to understand “a particular organization’s data schema and context.” This dual approach represents a sophisticated understanding of when to apply each technique in production LLM systems.
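The division of labor described above can be sketched in a few lines: the fine-tuned model carries the industry behavior, while RAG injects the specific organization's schema at request time. This is a minimal illustrative sketch, not zeb's actual code; all function names and the keyword-overlap retrieval are assumptions standing in for a real vector index.

```python
# Minimal sketch of the fine-tuning + RAG split described above.
# Function names and the naive retrieval are illustrative assumptions;
# a production system would use a vector index over schema docs.

def retrieve_schema_context(query: str, schema_docs: dict) -> str:
    """RAG step: pull org-specific schema docs relevant to the query."""
    hits = [doc for name, doc in schema_docs.items()
            if any(tok in doc.lower() for tok in query.lower().split())]
    return "\n".join(hits)

def build_prompt(query: str, context: str) -> str:
    # The fine-tuned model already "speaks" the industry; RAG supplies
    # this particular organization's data schema and context.
    return (f"Context (org-specific schema):\n{context}\n\n"
            f"User request: {query}\n"
            f"Produce a report specification.")

schema_docs = {
    "shipments": "Table shipments: shipment_id, origin, destination, eta, status",
    "invoices": "Table invoices: invoice_id, customer_id, amount, due_date",
}
prompt = build_prompt("Which shipments are delayed?",
                      retrieve_schema_context("shipments delayed status", schema_docs))
print(prompt)
```

The key point is the separation of concerns: industry knowledge is baked in once via fine-tuning, while per-customer schema is retrieved per request, so the same adapter serves many organizations.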
zeb selected the open-source DBRX model from Databricks for both fine-tuning and RAG components. The decision was driven by DBRX’s ability to handle instruction-based fine-tuning with reduced latency, as well as its Mixture-of-Experts (MoE) architecture which allows for faster inference while maintaining accuracy. The MoE architecture is particularly relevant for production deployments where latency matters, as it enables the model to activate only relevant “expert” sub-networks for each query rather than the full parameter count.
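The latency benefit of MoE comes from the routing step: a small gating network picks a few experts per token, so compute scales with the active experts rather than the total parameter count. The toy sketch below shows top-k routing with illustrative numbers; it is not DBRX's implementation (DBRX uses 16 experts with 4 active per token).

```python
import math

# Toy sketch of Mixture-of-Experts routing: only the top-k experts are
# activated per token, so inference cost scales with k, not with the
# total expert count. Logits here are made up for illustration.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, k=2):
    """Return indices and renormalized weights of the top-k experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# 8 experts, but only 2 run for this token:
selected = route([0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2], k=2)
print(selected)  # experts 1 and 4 are selected
```

Only the selected experts' feed-forward weights are touched for this token, which is why MoE models can hold a large total parameter count while keeping per-query latency closer to that of a much smaller dense model.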
The production pipeline follows this flow: an incoming natural language query is first classified for intent; relevant organizational schema and context are retrieved; the industry-specific fine-tuned adapter is composed with the base DBRX model to generate a response; and the output is routed to the appropriate format, whether a report, a forecast, or an ML model. This multi-stage pipeline demonstrates several key LLMOps patterns including intent classification, retrieval augmentation, model composition with adapters, and flexible output routing.
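The pattern of intent classification, retrieval, adapter-based generation, and output routing can be sketched end to end. Everything below is a hypothetical stand-in (keyword rules instead of an LLM classifier, string placeholders instead of real retrieval and serving calls) meant only to show how the stages compose.

```python
# Hedged sketch of the multi-stage pipeline pattern: intent classification,
# retrieval augmentation, generation with an industry adapter, and output
# routing. Every function and name here is hypothetical.

def classify_intent(query: str) -> str:
    # A real system would use an LLM or trained classifier;
    # keyword rules keep the sketch short.
    if "forecast" in query.lower():
        return "forecast"
    if "model" in query.lower():
        return "ml_model"
    return "report"

def handle_request(query: str, industry: str) -> dict:
    intent = classify_intent(query)
    context = f"(retrieved schema context for {industry})"  # RAG step
    adapter = f"{industry}-adapter"                         # fine-tuned adapter
    answer = f"[{adapter}] {intent} generated using {context}"
    # Flexible output routing: the same answer can be delivered back over
    # Slack, Teams, or email depending on where the request originated.
    return {"intent": intent, "output": answer}

result = handle_request("Forecast next quarter's shipment volume", "logistics")
print(result["intent"])  # forecast
```

Structuring the pipeline as discrete stages also makes each one independently observable and testable, which matters once the system is in production.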
The solution leverages several Databricks components for production operations. Model Serving endpoints handle inference requests, while Unity Catalog provides federated security and governance for both data and models. The Mosaic AI Agent Framework is used for RAG implementation, and Mosaic AI Training handles fine-tuning workflows. This consolidated approach within a single platform addresses common LLMOps challenges around security, data governance, and operational complexity.
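For the inference side, Databricks Model Serving endpoints expose a REST `/invocations` interface. The sketch below builds such a request using only the standard library; the workspace URL, endpoint name, and token are placeholders, and the payload shape is the common chat-style format rather than anything confirmed about SuperInsight's deployment.

```python
import json
import urllib.request

# Hedged sketch: building a call to a Databricks Model Serving endpoint.
# Workspace URL, endpoint name, and token are placeholders; the URL shape
# follows the standard /serving-endpoints/<name>/invocations pattern.

WORKSPACE = "https://example-workspace.cloud.databricks.com"  # placeholder
ENDPOINT = "superinsight-dbrx"                                # hypothetical name
TOKEN = "dapi-..."                                            # placeholder token

def build_invocation(prompt: str) -> urllib.request.Request:
    url = f"{WORKSPACE}/serving-endpoints/{ENDPOINT}/invocations"
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        url,
        data=body.encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_invocation("Summarize delayed shipments this week")
print(req.full_url)
# To actually send it: urllib.request.urlopen(req)  (needs a real workspace/token)
```

Because the endpoint is governed by Unity Catalog, the same token-based access controls that apply to the underlying data also gate who may invoke the model.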
The use of Unity Catalog for securing and federating both data and model assets is noteworthy from an enterprise governance perspective. Organizations deploying LLM systems often struggle with ensuring that the AI system respects existing data access controls and audit requirements. By building on Unity Catalog, SuperInsight inherits these governance capabilities.
A key architectural decision was the development of canonical data models aligned with four different industries (logistics/supply chain, retail, fintech, and health/life sciences). This approach allows zeb to deploy SuperInsight across different customer contexts while maintaining industry-specific understanding. The fine-tuned adapters enable industry customization without requiring complete model retraining for each deployment.
The solution’s integration with existing workflow tools (Slack, Teams, ServiceNow, Jira) is critical for production adoption. By meeting users where they already work rather than requiring them to adopt new interfaces, the system reduces friction and increases adoption. This is a common pattern in successful enterprise AI deployments.
The case study mentions that zeb is “constantly trying to improve as we do more implementations,” gathering data understanding and knowledge that feeds back into their canonical data models. This suggests an ongoing MLOps/LLMOps process where production data and feedback inform model improvements over time.
The claimed results include an 80-90% reduction in data analyst workload, a 72% increase in report generation requests, a roughly 40% reduction in development time, and roughly 40% cost savings.
The 72% increase in report requests is particularly interesting from an LLMOps perspective, as it demonstrates the common phenomenon where making a capability easier to access leads to increased demand. This has implications for capacity planning and cost management in production LLM systems.
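A quick back-of-the-envelope check shows why the demand increase still nets out favorably. Using the case study's own figures (72% more requests, 80-90% of per-request analyst work automated) and an assumed round-number baseline:

```python
# Worked example of the capacity effect, using the case study's figures.
# The baseline volume of 100 requests/month is an assumption for illustration.

baseline_requests = 100                    # assumed baseline monthly volume
new_requests = baseline_requests * 1.72    # 72% increase in demand
for automated in (0.80, 0.90):
    manual_equiv = new_requests * (1 - automated)
    print(f"{automated:.0%} automated -> {manual_equiv:.1f} manual-equivalent requests")
# 17.2-34.4 manual-equivalent requests vs the original 100, despite 72% more demand.
```

Analysts end up handling far fewer requests manually even as total insight output rises, but the flip side is that inference volume (and thus serving cost) grows with the induced demand, which is the capacity-planning implication noted above.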
While the case study presents impressive results, several points merit consideration. The quantitative claims (40% development time reduction, 40% cost savings, 72% increase in requests, 80-90% workload reduction) come from zeb and Databricks, who have commercial interests in presenting favorable outcomes. Independent verification of these metrics is not provided.
The development timeline of “just a few months” for the initial version is plausible given the use of a unified platform and pre-built components, though the scope and complexity of that initial version is not detailed.
The case study also demonstrates the vendor lock-in tradeoff common in LLMOps: while using a unified platform like Databricks simplifies development and operations, it creates dependency on that ecosystem. The comparison to needing “30 different solutions” without Databricks may be somewhat exaggerated for rhetorical effect.
Overall, this case study illustrates a well-architected compound AI system for enterprise self-service analytics, with thoughtful application of both RAG and fine-tuning techniques, strong governance integration, and attention to user experience through familiar communication channels. The architecture patterns demonstrated here are broadly applicable to similar enterprise LLM deployments.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Stripe, processing approximately 1.3% of global GDP, has evolved from traditional ML-based fraud detection to deploying transformer-based foundation models for payments that process every transaction in under 100ms. The company built a domain-specific foundation model treating charges as tokens and behavior sequences as context windows, ingesting tens of billions of transactions to power fraud detection, improving card-testing detection from 59% to 97% accuracy for large merchants. Stripe also launched the Agentic Commerce Protocol (ACP) jointly with OpenAI to standardize how agents discover and purchase from merchant catalogs, complemented by internal AI adoption reaching 8,500 employees daily using LLM tools, with 65-70% of engineers using AI coding assistants and achieving significant productivity gains like reducing payment method integrations from 2 months to 2 weeks.
This lecture transcript from Yangqing Jia, VP at NVIDIA and founder of Lepton AI (acquired by NVIDIA), explores the evolution of AI system design from an engineer's perspective. The talk covers the progression from research frameworks (Caffe, TensorFlow, PyTorch) to production AI infrastructure, examining how LLM applications are built and deployed at scale. Jia discusses the emergence of "neocloud" infrastructure designed specifically for AI workloads, the challenges of GPU cluster management, and practical considerations for building consumer and enterprise LLM applications. Key insights include the trade-offs between open-source and closed-source models, the importance of RAG and agentic AI patterns, infrastructure design differences between conventional cloud and AI-specific platforms, and the practical challenges of operating LLMs in production, including supply chain management for GPUs and cost optimization strategies.