## Overview
This case study describes Databricks' internal use of its own platform to build a GenAI-powered sales assistant called the "Field AI Assistant." The project represents a "Databricks on Databricks" approach where the company uses its own technology stack to solve an internal business challenge. The core problem addressed is the overwhelming volume of information that sales teams must navigate across multiple siloed applications during their daily work. Sellers need access to data from CRM systems, account intelligence, sales enablement content, customer insights, and various collaboration platforms—all of which traditionally required manual effort to retrieve and synthesize.
The Field AI Assistant aims to augment the seller experience by providing a conversational AI interface that can retrieve information, distill insights, and perform automated actions. This is positioned as a digital transformation initiative with the broader automation vision of making business processes "better, faster, and cheaper." Because this is Databricks describing its own internal implementation built on its own products, claims should be read with the understanding that the project serves both as a genuine internal tool and as a showcase for the platform's capabilities.
## Technical Architecture and LLMOps Components
The solution is built entirely on the Databricks technology stack using what they call a "compound AI agentic framework." This architecture consists of several key components that work together to process natural language queries and deliver actionable responses.
### Data Foundation
The system integrates with multiple data sources to provide comprehensive information access. These include the internal Databricks Lakehouse containing account intelligence, sales enablement content, and sales playbooks; the CRM platform (Salesforce) for opportunity and account data; and collaboration platforms that collate and index unstructured data. This multi-source integration is fundamental to the solution's ability to provide a 360-degree customer view and represents a common pattern in enterprise RAG implementations where data fragmentation is a significant challenge.
### Agent and Tool Framework
The Field AI Assistant employs a single driver agent architecture with multiple tools and functions for deterministic processing. This design pattern acknowledges the inherent ambiguity in human inputs and uses the LLM's contextual understanding to interpret intent and route requests to appropriate processing pipelines. The tools can include Python functions, SQL queries, or API integrations with external applications such as Glean, Perplexity, and Aha.
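The single-driver-agent pattern described above can be sketched as a registry of deterministic tools behind one routing step. The sketch below is illustrative, not Databricks' implementation: the tool names and the keyword-based `route` method are assumptions standing in for the LLM's contextual intent classification.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Tool:
    name: str
    description: str
    fn: Callable[[str], str]  # deterministic processing step

class DriverAgent:
    """Single driver agent: interprets intent, routes to one deterministic tool."""
    def __init__(self) -> None:
        self.tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def route(self, query: str) -> str:
        # Stand-in for the LLM's intent recognition: keyword matching here,
        # contextual understanding in the real system.
        q = query.lower()
        if "opportunity" in q or "account" in q:
            return "crm_lookup"
        if "playbook" in q or "collateral" in q:
            return "content_search"
        return "web_research"

    def run(self, query: str) -> str:
        return self.tools[self.route(query)].fn(query)

agent = DriverAgent()
agent.register(Tool("crm_lookup", "Query Salesforce opportunity/account data",
                    lambda q: f"[CRM] results for: {q}"))
agent.register(Tool("content_search", "Retrieve playbooks and sales collateral",
                    lambda q: f"[Docs] results for: {q}"))
agent.register(Tool("web_research", "External research via Glean/Perplexity",
                    lambda q: f"[Web] results for: {q}"))

print(agent.run("Show open opportunities for Acme Corp"))
# → [CRM] results for: Show open opportunities for Acme Corp
```

The key property is that each tool is an ordinary function (or SQL query, or API call) that behaves predictably once selected; only the selection step involves the LLM.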
The Mosaic AI Function Calling capability enables the LLM to extract fields from queries and pass them to standard function calls. This is a critical LLMOps pattern that combines the flexibility of natural language understanding with the reliability of deterministic code execution. By separating concerns between intent recognition (handled by the LLM) and data retrieval/processing (handled by structured functions), the system can deliver more accurate and consistent results.
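The separation of concerns described above can be illustrated with a minimal dispatch sketch. The function name, arguments, and lookup data are hypothetical; the point is that once the model emits a structured tool call, execution is plain deterministic code.

```python
import json

# Deterministic function the model can call once arguments are extracted.
def get_account_usage(account_name: str, product_line: str) -> dict:
    # Hypothetical lookup; a real implementation would query the Lakehouse.
    usage = {("Acme Corp", "SQL"): 1200, ("Acme Corp", "ML"): 300}
    return {"account": account_name, "product_line": product_line,
            "dbu_usage": usage.get((account_name, product_line), 0)}

REGISTRY = {"get_account_usage": get_account_usage}

def dispatch(tool_call_json: str) -> dict:
    """Execute a structured tool call emitted by the model."""
    call = json.loads(tool_call_json)
    return REGISTRY[call["name"]](**call["arguments"])

# Simulated model output for: "How much SQL usage does Acme Corp have?"
model_tool_call = json.dumps({
    "name": "get_account_usage",
    "arguments": {"account_name": "Acme Corp", "product_line": "SQL"},
})
print(dispatch(model_tool_call))
# → {'account': 'Acme Corp', 'product_line': 'SQL', 'dbu_usage': 1200}
```

The LLM's only job is producing the JSON payload; everything after `dispatch` is testable, auditable code.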
### Model Selection and Management
The solution uses Azure OpenAI GPT-4 as the foundational model, chosen after evaluation against various open-source alternatives. The selection criteria included groundedness (factual accuracy), ability to generate relevant content, correct tool selection for processing prompts, and adherence to output formatting requirements. Importantly, the architecture is designed for model flexibility, allowing adoption of new models as they become available in the Mosaic AI agent framework.
This multi-model evaluation approach represents a mature LLMOps practice where model selection is based on empirical performance rather than assumption. The acknowledgment that different models may be better suited for specific use cases suggests an understanding that production GenAI systems often benefit from model specialization.
### Vector Search and Embeddings
The solution leverages Mosaic AI Vector Search for handling unstructured data retrieval. While the case study doesn't provide extensive detail on the embedding strategy, the presence of vector databases is essential for the RAG-based retrieval of sales collateral, playbooks, competitive materials, and other document-based content. This enables the semantic search capabilities that allow sellers to query information using natural language rather than exact keyword matches.
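The semantic-search mechanism behind this retrieval can be reduced to cosine similarity over embedding vectors. The toy 3-dimensional vectors and document names below are invented for illustration; production systems use a real embedding model and an index such as Mosaic AI Vector Search.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Toy "embeddings" of sales content; real vectors have hundreds of dimensions.
docs = {
    "competitive battlecard: Snowflake": [0.9, 0.1, 0.0],
    "onboarding playbook":               [0.1, 0.8, 0.2],
    "security whitepaper":               [0.0, 0.2, 0.9],
}

def search(query_vec, k=1):
    ranked = sorted(docs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# A query like "how do we position against Snowflake?" embeds near the battlecard
# even though it shares no exact keywords with the stored title.
print(search([0.85, 0.15, 0.05]))
# → ['competitive battlecard: Snowflake']
```

This is why sellers can phrase requests naturally: matching happens in embedding space rather than on literal keywords.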
### Governance and Security
Unity Catalog provides the governance layer for discoverability, access control, and cataloging of the underlying datasets, agents, and tools. The Mosaic AI Gateway adds additional operational controls including rate limiting, payload logging, access controls, and guardrails for filtering system inputs and outputs. This governance infrastructure enables continuous monitoring for safety, bias, and quality—critical requirements for any enterprise LLM deployment.
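The gateway-style controls listed above (rate limiting, payload logging, input guardrails) compose naturally into a single checkpoint in front of the model. The following is a minimal stand-in sketch, not the AI Gateway's actual interface; the class name, limits, and blocked-term filter are assumptions.

```python
import time
from collections import deque

class GatewaySketch:
    """Minimal stand-in for gateway controls: rate limiting, payload
    logging, and a keyword guardrail on inputs."""
    def __init__(self, max_requests: int, window_s: float, blocked_terms):
        self.max_requests = max_requests
        self.window_s = window_s
        self.blocked_terms = [t.lower() for t in blocked_terms]
        self.calls = deque()   # timestamps inside the sliding window
        self.log = []          # payload log for monitoring/audit

    def allow(self, user: str, prompt: str) -> bool:
        now = time.monotonic()
        while self.calls and now - self.calls[0] > self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_requests:
            return False  # rate limit exceeded
        if any(term in prompt.lower() for term in self.blocked_terms):
            self.log.append((user, "blocked", prompt))
            return False  # input guardrail tripped
        self.calls.append(now)
        self.log.append((user, "allowed", prompt))
        return True

gw = GatewaySketch(max_requests=2, window_s=60, blocked_terms=["ssn"])
print(gw.allow("seller1", "Summarize the Acme account"))   # → True
print(gw.allow("seller1", "What is the customer's SSN?"))  # → False
```

Real guardrails use classifiers rather than keyword lists, but the placement is the same: every input and output passes through one governed chokepoint where it can be filtered and logged.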
The case study emphasizes that engaging early with Enterprise Security, Privacy, and Legal teams is essential, and that building a strong governance model is a "MUST." This reflects the real-world challenges of deploying GenAI in enterprise environments where data sensitivity and compliance requirements are paramount.
## Evaluation and Quality Assurance
The solution leverages the Mosaic AI Agent Framework's built-in evaluation capabilities. Defined evaluation criteria are used with an "LLM-as-a-judge" approach to score application responses. This automated evaluation methodology is increasingly common in LLMOps for scaling quality assessment beyond what human review alone can accomplish.
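The LLM-as-a-judge pattern can be sketched as a grading prompt plus a parser over the judge's structured reply. The criteria, prompt wording, and pass threshold below are illustrative assumptions, and the judge model is stubbed out so the sketch runs standalone.

```python
import json

JUDGE_PROMPT = """Rate the assistant answer on each criterion from 1-5.
Criteria: groundedness, relevance. Reply with JSON only.
Question: {q}
Context: {ctx}
Answer: {ans}"""

def judge(question: str, context: str, answer: str, call_llm) -> dict:
    """Score a response with an LLM judge; call_llm is injected so the
    grading model can be swapped without changing the harness."""
    raw = call_llm(JUDGE_PROMPT.format(q=question, ctx=context, ans=answer))
    scores = json.loads(raw)
    scores["pass"] = all(v >= 4 for v in scores.values())
    return scores

# Stubbed judge for illustration; production would call a real model endpoint.
fake_llm = lambda prompt: '{"groundedness": 5, "relevance": 4}'
result = judge("What is Acme's top use case?",
               "Acme's top use case is fraud detection.",
               "Acme's top use case is fraud detection.",
               fake_llm)
print(result)
# → {'groundedness': 5, 'relevance': 4, 'pass': True}
```

Running such a judge over a fixed evaluation set turns subjective response quality into trackable per-criterion scores, which is what makes the approach scale beyond manual review.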
However, the case study acknowledges that measuring ROI is difficult and that building evaluation datasets for measuring model effectiveness "is hard and requires focused effort and a strategy that supports rapid experimentation." This honest assessment highlights a common challenge in LLMOps where establishing meaningful success metrics and baselines requires significant investment.
## Capabilities and Use Cases
The Field AI Assistant offers several functional categories. For customer insights, it provides a 360-degree account view including financial news, competitive landscape data, product consumption by product line and cloud, customer support cases, and top revenue-driving use cases. It also recommends use cases that have been adopted by similar customers.
Data hygiene alerts surface information about use cases going live in upcoming periods, top use case blockers, and use cases lacking key information such as executive business sponsors. This proactive alerting capability represents an interesting application of GenAI for data quality management.
Sales collateral features include access to playbooks, competitive materials, meeting summarization, and pitch decks. The action orchestration capabilities allow the system to update CRM fields, draft prospecting emails, and create customer-facing proposals. These action-oriented features move beyond passive information retrieval into active workflow automation.
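A common way to keep such action-oriented features safe is to have the assistant draft actions that are applied only after review. The sketch below illustrates that draft-then-approve pattern; the class names, the `OPP-123` identifier, and the dict-backed CRM store are all hypothetical, and the case study does not specify whether Databricks gates actions this way.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    """An action drafted by the assistant but applied only after review."""
    kind: str
    payload: dict
    approved: bool = False

class ActionQueue:
    def __init__(self):
        self.pending = []

    def draft_crm_update(self, opportunity_id: str, fields: dict) -> ProposedAction:
        action = ProposedAction("crm_update",
                                {"opportunity_id": opportunity_id, "fields": fields})
        self.pending.append(action)
        return action

    def approve_and_apply(self, action: ProposedAction, crm: dict) -> None:
        # Only an approved action mutates the system of record.
        action.approved = True
        crm.setdefault(action.payload["opportunity_id"], {}).update(
            action.payload["fields"])

crm_store = {}  # stand-in for the Salesforce record
queue = ActionQueue()
a = queue.draft_crm_update("OPP-123",
                           {"stage": "Negotiation", "exec_sponsor": "J. Doe"})
queue.approve_and_apply(a, crm_store)
print(crm_store)
# → {'OPP-123': {'stage': 'Negotiation', 'exec_sponsor': 'J. Doe'}}
```

Separating the draft from the write keeps the LLM's output inspectable before it touches a system of record, which matters once the assistant moves from retrieval into workflow automation.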
## Key Learnings and Challenges
The case study surfaces several practical learnings from the implementation. The observation that "data is messy" led to an iterative approach focused on data-engineered pipelines and building clean "GOLD Single Source of Truth" datasets. This reinforces that even with sophisticated GenAI capabilities, data quality remains foundational to system effectiveness.
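The kind of consolidation behind a "GOLD Single Source of Truth" dataset can be sketched as source-prioritized deduplication. The record fields, source names, and priority ordering below are invented for illustration; real pipelines would run as governed Delta/SQL jobs rather than in-memory Python.

```python
# Raw account records from multiple sources, with duplicates and gaps.
raw = [
    {"account_id": "A1", "name": "Acme Corp",  "arr": 120000, "source": "crm"},
    {"account_id": "A1", "name": "ACME Corp.", "arr": None,   "source": "spreadsheet"},
    {"account_id": "A2", "name": "Globex",     "arr": 45000,  "source": "crm"},
]

# Lower number = more trusted source.
SOURCE_PRIORITY = {"crm": 0, "spreadsheet": 1}

def build_gold(records):
    """Collapse duplicates per account, preferring the trusted source and
    filling missing fields from lower-priority copies."""
    gold = {}
    for rec in sorted(records, key=lambda r: SOURCE_PRIORITY[r["source"]]):
        cur = gold.setdefault(rec["account_id"], {})
        for key, value in rec.items():
            if cur.get(key) is None and value is not None:
                cur[key] = value
    return gold

gold = build_gold(raw)
print(gold["A1"]["name"], gold["A1"]["arr"])
# → Acme Corp 120000
```

However messy the inputs, the agent's tools then query only the consolidated gold table, which is what keeps retrieval answers consistent across sources.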
The difficulty in measuring ROI prompted a strategy of experimenting with small focus groups in pilot phases. This incremental approach to deployment and validation is a prudent LLMOps practice that manages risk while building evidence of value.
## Critical Assessment
While the case study provides useful architectural insights, several limitations should be noted. As an internal implementation by Databricks using Databricks technology, it inherently serves as a product showcase. Quantitative results are notably absent—there are no specific metrics on time savings, accuracy rates, user adoption, or business impact. The claim that the solution "empowers sellers to focus on strategic, high-value activities" is aspirational rather than evidenced.
Additionally, the case study mentions customization capabilities including fine-tuning (via DSPy on Databricks and Mosaic AI Fine-tuning) but doesn't clarify whether fine-tuning was actually performed for this implementation or whether GPT-4 was used with prompt engineering alone. The mention of MLflow suggests that experiment tracking and model management practices may be in place, but details are sparse.
The solution represents a fairly standard enterprise RAG + agents pattern, combining document retrieval, structured data access, and LLM orchestration. While the implementation appears well-architected within the Databricks ecosystem, organizations considering similar approaches should evaluate whether the same patterns could be achieved with alternative technology stacks and at what relative cost and complexity.