Nvidia: Automated CVE Analysis and Remediation Using Event-Driven RAG and AI Agents

Summary

Nvidia presents Agent Morpheus, an internal production system designed to address the growing challenge of software vulnerability management at enterprise scale. With the CVE database hitting record highs (over 200,000 cumulative vulnerabilities reported by end of 2023), traditional approaches to scanning and patching have become unmanageable. The solution demonstrates a sophisticated LLMOps implementation that combines multiple LLMs, RAG, and AI agents in an event-driven architecture to automate the labor-intensive process of CVE analysis and exploitability determination.

The core innovation here is distinguishing between a vulnerability being present (a CVE signature detected) versus being exploitable (the vulnerability can actually be executed and abused). This nuanced analysis previously required security analysts to manually synthesize information from multiple sources—a process that could take hours or days per container. Agent Morpheus reduces this to seconds while maintaining the quality of analysis through intelligent automation and human-in-the-loop oversight.

Technical Architecture and LLM Configuration

The system employs four distinct Llama3 large language models, with three of them being LoRA (Low-Rank Adaptation) fine-tuned for specific tasks within the workflow:

Planning LLM: A LoRA fine-tuned model specifically trained to generate unique investigation checklists based on the CVE context. This model takes vulnerability and threat intelligence data and produces actionable task lists tailored to each specific CVE.
AI Agent LLM: Another LoRA fine-tuned model that executes checklist items within the context of a specific software project. This agent can autonomously retrieve information and make decisions by accessing project assets including source code, SBOMs (Software Bill of Materials), documentation, and internet search tools.
Summarization LLM: A LoRA fine-tuned model that combines all findings from the agent’s investigation into coherent summaries for human analysts.
VEX Formatting LLM: The base Llama3 model that standardizes justifications for non-exploitable CVEs into the common machine-readable VEX (Vulnerability Exploitability eXchange) format for distribution.

This multi-model architecture represents a thoughtful LLMOps design decision—rather than using a single general-purpose model for all tasks, Nvidia chose to specialize models through fine-tuning for their specific roles, likely improving accuracy and reliability for each stage of the pipeline.

Inference Infrastructure with NVIDIA NIM

The deployment leverages NVIDIA NIM inference microservices, which serves as the core inference infrastructure. A key architectural decision was hosting all four model variants (three LoRA adapters plus base model) using a single NIM container that dynamically loads LoRA adapters as needed. This approach optimizes resource utilization while maintaining the flexibility to serve different specialized models.

The choice of NIM was driven by several production requirements:

OpenAI API compatibility: NIM provides an API specification compatible with OpenAI’s interface, simplifying integration with existing tooling and agent frameworks.
Dynamic LoRA loading: The ability to serve multiple LoRA-customized models from a single container reduces infrastructure complexity and costs.
Variable workload handling: Agent Morpheus generates approximately 41 LLM queries per CVE on average. With container scans potentially generating dozens of CVEs per container, the system can produce thousands of outstanding LLM requests for a single container scan. NIM is designed to handle this bursty, variable workload pattern that would be challenging for custom LLM services.

Event-Driven Pipeline Architecture

The system is fully integrated into Nvidia’s container registry and security toolchain using the Morpheus cybersecurity framework. The workflow is triggered automatically when containers are uploaded to the registry, making it truly event-driven rather than batch-processed.

The pipeline flow operates as follows: A container upload event triggers a traditional CVE scan (using Anchore or similar tools). The scan results are passed to Agent Morpheus, which retrieves current vulnerability and threat intelligence for the detected CVEs. The planning LLM generates investigation checklists, the AI agent executes these autonomously, the summarization LLM consolidates findings, and finally results are presented to human analysts through a security dashboard.

One notable aspect of this architecture is that the AI agent operates autonomously without requiring human prompting during its analysis. The agent “talks to itself” by working through the generated checklist, retrieving necessary information, and making decisions. Human analysts are only engaged when sufficient information is available for them to make final decisions—a design that optimizes analyst time and attention.

Agent Tooling and LLM Limitations Mitigation

The case study reveals practical approaches to overcoming known LLM limitations in production. The AI agent has access to multiple tools beyond just data retrieval:

Version comparison tool: The team discovered that LLMs struggle to correctly compare software version numbers (e.g., determining that version 1.9.1 comes before 1.10). Rather than attempting to solve this through prompting or fine-tuning, they built a dedicated version comparison tool that the agent can invoke when needed.
Calculator tools: A well-known weakness of LLMs is mathematical calculations. The system provides calculator access to overcome this limitation.

This pragmatic approach—using tools to handle tasks LLMs are poor at rather than trying to force LLMs to do everything—represents mature LLMOps thinking.

Parallel Processing and Performance Optimization

Using the Morpheus framework, the team built a pipeline that orchestrates the high volume of LLM requests asynchronously and in parallel. The key insight is that both the checklist items for each CVE and the CVEs themselves are completely independent, making them ideal candidates for parallelization.

The performance results are significant: processing a container with 20 CVEs takes 2842.35 seconds when run serially, but only 304.72 seconds when parallelized using Morpheus—a 9.3x speedup. This transforms the practical utility of the system from something that might take nearly an hour per container to completing in about 5 minutes.

The pipeline is exposed as a microservice using HttpServerSourceStage from Morpheus, enabling seamless integration with the container registry and security dashboard services.

Continuous Learning and Human-in-the-Loop

The system implements a continuous improvement loop that leverages human analyst output. After Agent Morpheus generates its analysis, human analysts review the findings and may make corrections or additions. These human-approved patching exemptions and changes to the Agent Morpheus summaries are fed back into LLM fine-tuning datasets.

This creates a virtuous cycle where the models are continually retrained using analyst output, theoretically improving system accuracy over time based on real-world corrections. This approach addresses a common LLMOps challenge: how to maintain and improve model performance in production when ground truth labels are expensive to obtain.

Production Integration and Workflow

The complete production workflow demonstrates enterprise-grade integration:

Container upload triggers automatic CVE scanning
Scan results flow automatically to Agent Morpheus
Agent Morpheus retrieves intelligence and runs its analysis pipeline
Results are surfaced to a security analyst dashboard
Analysts review and make final recommendations
Recommendations undergo peer review
Final VEX documents are published and distributed with containers
Analyst corrections feed back into training datasets

This end-to-end automation, from container upload to VEX document publication, represents a mature production deployment rather than a proof-of-concept.

Critical Assessment

While the case study presents impressive results, it’s worth noting several caveats:

The 9.3x speedup comparison is between their own serial and parallel implementations, not against any baseline or competitive approach.
The “hours or days to seconds” claim for triage time improvement lacks specific baseline measurements or methodology.
This is effectively a first-party case study from Nvidia promoting their own NIM and Morpheus products, so claims should be evaluated with appropriate skepticism.
The system still requires human analyst review, so “fully automated” should be understood as “automated analysis with human oversight” rather than completely autonomous operation.

Nevertheless, the technical architecture demonstrates sophisticated LLMOps practices including multi-model orchestration, LoRA fine-tuning for task specialization, tool augmentation for LLM limitations, parallel inference optimization, event-driven microservices architecture, and continuous learning from human feedback—all running in a production environment at enterprise scale.

Automated CVE Analysis and Remediation Using Event-Driven RAG and AI Agents

Industry

Technologies

Summary

Technical Architecture and LLM Configuration

Inference Infrastructure with NVIDIA NIM

Event-Driven Pipeline Architecture

Agent Tooling and LLM Limitations Mitigation

Parallel Processing and Performance Optimization

Continuous Learning and Human-in-the-Loop

Production Integration and Workflow

Critical Assessment

More Like This

Enterprise AI Platform Integration for Secure Production Deployment

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Advanced Fine-Tuning Techniques for Multi-Agent Orchestration at Scale