ZenML

Automated CVE Analysis and Remediation Using Event-Driven RAG and AI Agents

Nvidia 2024
View original source

NVIDIA developed Agent Morpheus, an AI-powered system that automates the analysis of software vulnerabilities (CVEs) at enterprise scale. The system combines retrieval-augmented generation (RAG) with multiple specialized LLMs and AI agents in an event-driven workflow to analyze CVE exploitability, generate remediation plans, and produce standardized security documentation. The solution reduced CVE analysis time from hours/days to seconds and achieved a 9.3x speedup through parallel processing.

Industry

Tech

Technologies

Summary

Nvidia presents Agent Morpheus, an internal production system designed to address the growing challenge of software vulnerability management at enterprise scale. With the CVE database hitting record highs (over 200,000 cumulative vulnerabilities reported by end of 2023), traditional approaches to scanning and patching have become unmanageable. The solution demonstrates a sophisticated LLMOps implementation that combines multiple LLMs, RAG, and AI agents in an event-driven architecture to automate the labor-intensive process of CVE analysis and exploitability determination.

The core innovation here is distinguishing between a vulnerability being present (a CVE signature detected) versus being exploitable (the vulnerability can actually be executed and abused). This nuanced analysis previously required security analysts to manually synthesize information from multiple sources—a process that could take hours or days per container. Agent Morpheus reduces this to seconds while maintaining the quality of analysis through intelligent automation and human-in-the-loop oversight.

Technical Architecture and LLM Configuration

The system employs four distinct Llama3 large language models, with three of them being LoRA (Low-Rank Adaptation) fine-tuned for specific tasks within the workflow:

This multi-model architecture represents a thoughtful LLMOps design decision—rather than using a single general-purpose model for all tasks, Nvidia chose to specialize models through fine-tuning for their specific roles, likely improving accuracy and reliability for each stage of the pipeline.

Inference Infrastructure with NVIDIA NIM

The deployment leverages NVIDIA NIM inference microservices, which serves as the core inference infrastructure. A key architectural decision was hosting all four model variants (three LoRA adapters plus base model) using a single NIM container that dynamically loads LoRA adapters as needed. This approach optimizes resource utilization while maintaining the flexibility to serve different specialized models.

The choice of NIM was driven by several production requirements:

Event-Driven Pipeline Architecture

The system is fully integrated into Nvidia’s container registry and security toolchain using the Morpheus cybersecurity framework. The workflow is triggered automatically when containers are uploaded to the registry, making it truly event-driven rather than batch-processed.

The pipeline flow operates as follows: A container upload event triggers a traditional CVE scan (using Anchore or similar tools). The scan results are passed to Agent Morpheus, which retrieves current vulnerability and threat intelligence for the detected CVEs. The planning LLM generates investigation checklists, the AI agent executes these autonomously, the summarization LLM consolidates findings, and finally results are presented to human analysts through a security dashboard.

One notable aspect of this architecture is that the AI agent operates autonomously without requiring human prompting during its analysis. The agent “talks to itself” by working through the generated checklist, retrieving necessary information, and making decisions. Human analysts are only engaged when sufficient information is available for them to make final decisions—a design that optimizes analyst time and attention.

Agent Tooling and LLM Limitations Mitigation

The case study reveals practical approaches to overcoming known LLM limitations in production. The AI agent has access to multiple tools beyond just data retrieval:

This pragmatic approach—using tools to handle tasks LLMs are poor at rather than trying to force LLMs to do everything—represents mature LLMOps thinking.

Parallel Processing and Performance Optimization

Using the Morpheus framework, the team built a pipeline that orchestrates the high volume of LLM requests asynchronously and in parallel. The key insight is that both the checklist items for each CVE and the CVEs themselves are completely independent, making them ideal candidates for parallelization.

The performance results are significant: processing a container with 20 CVEs takes 2842.35 seconds when run serially, but only 304.72 seconds when parallelized using Morpheus—a 9.3x speedup. This transforms the practical utility of the system from something that might take nearly an hour per container to completing in about 5 minutes.

The pipeline is exposed as a microservice using HttpServerSourceStage from Morpheus, enabling seamless integration with the container registry and security dashboard services.

Continuous Learning and Human-in-the-Loop

The system implements a continuous improvement loop that leverages human analyst output. After Agent Morpheus generates its analysis, human analysts review the findings and may make corrections or additions. These human-approved patching exemptions and changes to the Agent Morpheus summaries are fed back into LLM fine-tuning datasets.

This creates a virtuous cycle where the models are continually retrained using analyst output, theoretically improving system accuracy over time based on real-world corrections. This approach addresses a common LLMOps challenge: how to maintain and improve model performance in production when ground truth labels are expensive to obtain.

Production Integration and Workflow

The complete production workflow demonstrates enterprise-grade integration:

This end-to-end automation, from container upload to VEX document publication, represents a mature production deployment rather than a proof-of-concept.

Critical Assessment

While the case study presents impressive results, it’s worth noting several caveats:

Nevertheless, the technical architecture demonstrates sophisticated LLMOps practices including multi-model orchestration, LoRA fine-tuning for task specialization, tool augmentation for LLM limitations, parallel inference optimization, event-driven microservices architecture, and continuous learning from human feedback—all running in a production environment at enterprise scale.

More Like This

Enterprise AI Platform Integration for Secure Production Deployment

Rubrik 2025

Predibase, a fine-tuning and model serving platform, announced its acquisition by Rubrik, a data security and governance company, with the goal of combining Predibase's generative AI capabilities with Rubrik's secure data infrastructure. The integration aims to address the critical challenge that over 50% of AI pilots never reach production due to issues with security, model quality, latency, and cost. By combining Predibase's post-training and inference capabilities with Rubrik's data security posture management, the merged platform seeks to provide an end-to-end solution that enables enterprises to deploy generative AI applications securely and efficiently at scale.

customer_support content_moderation chatbot +53

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.

healthcare fraud_detection customer_support +90

Advanced Fine-Tuning Techniques for Multi-Agent Orchestration at Scale

Amazon 2026

Amazon teams faced challenges in deploying high-stakes LLM applications across healthcare, engineering, and e-commerce domains where basic prompt engineering and RAG approaches proved insufficient. Through systematic application of advanced fine-tuning techniques including Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and cutting-edge reasoning optimizations like Group-based Reinforcement Learning from Policy Optimization (GRPO) and Direct Advantage Policy Optimization (DAPO), three Amazon business units achieved production-grade results: Amazon Pharmacy reduced dangerous medication errors by 33%, Amazon Global Engineering Services achieved 80% human effort reduction in inspection reviews, and Amazon A+ Content improved quality assessment accuracy from 77% to 96%. These outcomes demonstrate that approximately one in four high-stakes enterprise applications require advanced fine-tuning beyond standard techniques to achieve necessary performance levels in production environments.

healthcare customer_support content_moderation +43