## Overview
This case study comes from a presentation by Yav from NICE Actimize's data science and AI team, discussing how they embedded generative AI into their Excite platform for financial crime detection. The presentation provides an interesting perspective on applying LLMs in a domain traditionally dominated by tabular data rather than text, which presents unique challenges for generative AI applications.
NICE Actimize operates in the financial crime detection space, covering fraud detection, money laundering prevention, and financial market abuse detection (such as market manipulation). Their work is characterized by several challenging constraints: high transaction volumes, extremely rare events (highly imbalanced datasets), the need for real-time detection, and mission-critical reliability requirements. The speaker candidly acknowledges that this domain presents particular challenges for LLMs, including accuracy and reliability concerns due to hallucinations, and the fact that generative AI typically excels at textual or visual data rather than structured tabular data.
## The Problem Space
The Excite platform is described as a cloud-native system for financial crime detection and prevention that provides analytics agility and self-service capabilities. A key feature is that the platform's libraries enable analysts to deliver analytical artifacts straight to production without requiring R&D as a middleman. However, despite this agility, creating analytical objects remains a non-trivial task that requires:
- Knowledge of all relevant definitions and dimensions
- Familiarity with the data model and schemas
- Ability to write SpEL (Spring Expression Language) expressions for filtering
- Understanding of complex data mapping
This complexity creates a barrier between the intent of the analyst and the actual implementation, slowing down the creation of fraud detection capabilities.
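As a small illustration of what that barrier looks like in practice, even a single filter requires knowing both the expression syntax and the exact field names. The snippet below is a hypothetical example and does not reflect the actual Excite data model:

```python
# Hypothetical SpEL-style filter an analyst would otherwise write by hand; the
# field names are illustrative placeholders, not the real Excite schema.
spel_filter = "transaction.transactionAmount > 10000 and transaction.channel == 'WIRE'"

# Beyond the expression itself, each field must also be mapped correctly to the
# platform's data model before the artifact is usable.
field_mapping = {"transactionAmount": "transactions.amount", "channel": "transactions.channel_code"}
```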
## Solution Architecture: Text-to-Analytics
The primary use case presented is a "text-to-analytics" system that allows analysts to create analytical artifacts using natural language. The implementation uses a GPT-like agent architecture with extensive prompt engineering to understand the system's analytical artifacts, data model, schemas, and services.
### Key Technical Components
The solution incorporates several important LLMOps practices:
**Prompt Engineering and Context**: The system uses extensive pre-prompting to give the LLM context about the proprietary Excite system, including understanding of analytical artifacts, data models, and schemas. This represents a significant investment in prompt engineering to adapt a general-purpose LLM to a domain-specific task.
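A minimal sketch of this pre-prompting pattern is shown below, assuming an OpenAI-style chat API (the presentation only describes a "GPT-like" agent, so the provider and model are assumptions). The schema snippet and instructions are illustrative placeholders, not NICE Actimize's actual prompts:

```python
# Minimal sketch of pre-prompting a general-purpose LLM with proprietary domain
# context. Assumes the openai Python SDK; prompt contents are illustrative only.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are an assistant that creates analytical artifacts for the
Excite platform. You know the following entities and fields:
- transactions: accountId, transactionAmount, transactionType, localTime
Artifacts are expressed as JSON aggregation configurations with SpEL filter
expressions. Only use fields that appear in the schema above."""

def draft_artifact(user_request: str) -> str:
    """Send the analyst's natural-language request along with the domain context."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_request},
        ],
    )
    return response.choices[0].message.content
```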
**Constraints and Guardrails**: The team explicitly implemented constraints and guardrails around the generative AI system. This is a critical LLMOps practice, especially in mission-critical financial applications where incorrect outputs could have serious consequences.
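The presentation does not detail the guardrails themselves, but one common pattern for this kind of system is an allow-list check on generated configurations; the sketch below is hypothetical and uses made-up field names:

```python
# Illustrative output-side guardrail: reject any generated configuration that
# references fields or aggregation functions outside an allow-list.
ALLOWED_FIELDS = {"accountId", "transactionAmount", "transactionType", "localTime"}
ALLOWED_FUNCTIONS = {"SUM", "COUNT", "AVG", "MAX", "MIN"}

def passes_guardrails(artifact: dict) -> bool:
    fields_ok = set(artifact.get("groupBy", [])) <= ALLOWED_FIELDS
    measure = artifact.get("measure", {})
    measure_ok = (
        measure.get("field") in ALLOWED_FIELDS
        and measure.get("function") in ALLOWED_FUNCTIONS
    )
    return fields_ok and measure_ok
```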
**Structured Output Generation**: The system generates structured JSON configurations rather than free-form text. This is an important design choice that makes the outputs verifiable and compatible with existing validation pipelines.
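A sketch of why this matters: structured output can be parsed and validated against a schema before it goes anywhere. Pydantic is used here as an assumed (not confirmed) implementation choice, and the artifact fields are illustrative:

```python
# Parse and validate LLM output against a schema; anything malformed is rejected
# up front rather than propagating downstream. Field names are hypothetical.
import json
from pydantic import BaseModel, ValidationError

class Measure(BaseModel):
    field: str
    function: str

class AggregationArtifact(BaseModel):
    name: str
    groupBy: list[str]
    measure: Measure
    filter: str | None = None   # optional SpEL-style filter expression

def parse_artifact(llm_output: str) -> AggregationArtifact | None:
    """Return a validated artifact, or None if the LLM output doesn't conform."""
    try:
        return AggregationArtifact(**json.loads(llm_output))
    except (json.JSONDecodeError, ValidationError):
        return None
```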
**Validation Pipeline Integration**: Recognizing that LLM outputs are "not 100% proof," the team designed the system so that generated artifacts go through testing pipelines and processes before being published. The speaker emphasizes this is a "safe zone" to implement generative AI because if the configurations are wrong, they simply don't work rather than causing runtime errors in production.
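A minimal sketch of this publish gate is shown below, with `run_test_pipeline` and `publish` standing in for the platform's own services (their real interfaces are not described in the presentation):

```python
# "Safe zone" pattern: a generated artifact is published only if it passes the
# existing validation pipeline; a bad configuration simply never ships.
def promote_artifact(artifact: dict, run_test_pipeline, publish) -> bool:
    """Publish a generated artifact only when the platform's tests pass."""
    report = run_test_pipeline(artifact)   # e.g., dry-run against sample data
    if not report["passed"]:
        return False                       # rejected artifacts never reach production
    publish(artifact)
    return True
```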
### Demonstrated Workflow
The presentation includes a demo showing the progressive refinement of an analytical query:
- Initial request: "Create an aggregation giving the sum of transaction amount by account using local time"
- Refinement: "Filter only transactions with amount above 10K"
- Further refinement: "Group by transaction type"
The system correctly interprets these natural language requests and generates the appropriate configurations, including proper SpEL expressions for filtering. This demonstrates the model's ability to understand both the business intent and the technical implementation requirements.
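For illustration, the final artifact from this flow might look something like the sketch below, expressed as a Python dict; the field names and structure are hypothetical, since the actual Excite configuration format is not shown in detail:

```python
# Hypothetical configuration after the three refinements in the demo: sum of
# transaction amount by account and type, local time, amounts above 10K only.
generated_artifact = {
    "name": "sum_amount_by_account_and_type",
    "type": "aggregation",
    "groupBy": ["accountId", "transactionType"],   # "group by transaction type" added last
    "measure": {"field": "transactionAmount", "function": "SUM"},
    "timeField": "localTime",
    "filter": "transactionAmount > 10000",         # SpEL-style filter from the second request
}
```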
## Agentic Architecture for Model Factory
The vision extends beyond simple text-to-analytics to a full "AutoML High-Tech Factory" using multiple agents:
- **Feature Creation Agents**: Agents that continuously generate candidate features and models
- **Simulation and Backtesting**: Agents can use the platform's simulator with backtesting to verify the predictive power of created artifacts
- **User Suggestions**: The system can suggest artifacts to users only after validating their benefit through backtesting
This agentic approach represents a more autonomous system where AI agents are not just responding to user queries but proactively exploring the feature space and identifying potentially valuable additions.
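A minimal sketch of what such a loop could look like is shown below, with `propose_features`, `backtest`, and `suggest_to_user` as placeholder functions for the platform services the presentation describes; the lift threshold is an assumption for illustration:

```python
# Rough sketch of the described agent loop: propose candidate features, backtest
# them with the platform's simulator, and surface only those that show real lift.
MIN_LIFT = 0.02   # hypothetical threshold on predictive-power improvement

def model_factory_cycle(propose_features, backtest, suggest_to_user):
    for candidate in propose_features():       # output of the feature-creation agents
        result = backtest(candidate)           # simulator run over historical data
        if result["lift"] >= MIN_LIFT:         # only validated artifacts are suggested
            suggest_to_user(candidate, result)
```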
## MLOps Integration
A second use case briefly mentioned involves using generative AI for MLOps support:
**Explanation Co-pilot**: An AI co-pilot that helps operations teams understand and explain ML features. This addresses the practical challenge that operations personnel may not have deep ML expertise but need to understand and manage production models.
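A minimal sketch of such a co-pilot call might look like the following, again assuming an OpenAI-style chat client; the prompt wording and parameters are assumptions, not the team's actual implementation:

```python
# Hypothetical explanation co-pilot: given a production feature's definition,
# ask the model for a plain-language explanation an operations analyst can use.
def explain_feature(client, feature_name: str, feature_definition: str) -> str:
    prompt = (
        f"Explain the ML feature '{feature_name}' to a fraud-operations analyst "
        f"with no data-science background. Definition: {feature_definition}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```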
**Autonomous Diagnostics**: More ambitiously, the team discusses agents that can automatically discover, identify, diagnose, and fix issues in production ML models. This represents a significant step toward autonomous MLOps, though the speaker doesn't go into detail about the implementation.
## Practical Considerations and Limitations
The speaker is refreshingly honest about the current limitations:
**Reliability**: LLM outputs are acknowledged as not 100% reliable, which is why the validation pipeline is essential. This is a mature approach to LLM integration that doesn't over-promise.
**Cost Concerns**: The presentation emphasizes cost as a major consideration ("cost, cost, and cost"), noting that training foundation models from scratch is prohibitively expensive, fine-tuning is costly, and even API calls can be "overkill in cost to value." This suggests the team is carefully evaluating the ROI of LLM integration.
**Domain Mismatch**: The challenge of applying text-oriented LLMs to tabular data is acknowledged upfront. The creative insight was to apply LLMs not to the data itself but to the meta-task of creating analytical configurations.
**Co-piloting Over Automation**: The speaker notes that co-piloting remains the primary mode of operation, suggesting that fully autonomous AI decision-making is not yet the goal. This represents a pragmatic approach to LLM integration where humans remain in the loop for validation and final decisions.
## Key Takeaways for LLMOps
Several important lessons emerge from this case study:
**Think Beyond Traditional NLP**: The speaker encourages thinking about how LLMs can simplify complex tasks beyond just text processing. The insight that LLMs can generate structured configurations and code even when the underlying domain is tabular data is valuable.
**Embed in Existing Systems**: Rather than creating standalone AI products, the approach was to embed generative AI into existing solutions to break UI limitations and enhance capabilities. This integration approach may be more practical and lower-risk than greenfield AI applications.
**Safety Through Structure**: By having LLMs generate structured outputs that pass through existing validation pipelines, the system maintains safety properties even when LLM outputs are imperfect. This is a key pattern for mission-critical applications.
**Pragmatic Expectations**: The team doesn't claim revolutionary results but focuses on efficiency gains, reduced time-to-market, and enabling non-expert users to create complex analytics. These are realistic benefits that can be achieved even with current LLM limitations.
## Impact Assessment
The claimed benefits include:
- Quick analytical artifact creation
- Frictionless self-service system for creating analytics straight to production
- Foundation for an automated ML model factory
While these claims are reasonable given the demonstrated capabilities, it's worth noting that this appears to be a relatively early-stage implementation (the speaker mentions a prototype). The full production impact would depend on factors like adoption rates among analysts, the quality of generated artifacts compared to human-created ones, and the actual cost savings realized.
The presentation represents a thoughtful approach to integrating LLMs in a challenging domain, with appropriate acknowledgment of limitations and focus on practical value rather than overhyped claims.