
AI Agent Automation of Security Operations Center Analysis

Doppel 2025

Doppel implemented an AI agent using OpenAI's o1 model to automate the analysis of potential security threats in their Security Operations Center (SOC). The system processes over 10 million websites, social media accounts, and mobile apps daily to identify phishing attacks. Through a combination of initial expert knowledge transfer and training on historical decisions, the AI agent achieved human-level performance, reducing SOC workloads by 30% within 30 days while maintaining lower false-positive rates than human analysts.

Industry

Tech

Overview

Doppel is a cybersecurity company that provides an AI-powered social engineering defense platform. Their services include brand protection, executive protection, phishing simulation, and connecting customer-detected scams to takedown actions. The company recently announced a $70M Series C funding round, indicating significant traction in the market. This case study, published in January 2025, describes how Doppel deployed an AI agent using OpenAI’s o1 model to automate portions of their Security Operations Center (SOC) workload, claiming a 30% reduction in manual work within 30 days of deployment.

The case study is presented from the perspective of a company promoting its own technology and success, so the claims should be viewed with appropriate skepticism. However, the technical approaches described offer useful insights into production LLM deployment for security operations.

The Problem: Scale and Complexity in Security Operations

Doppel’s platform ingests more than 10 million data points daily, including websites, social media accounts, and mobile applications, to identify phishing attacks worldwide. While traditional machine learning models can effectively filter out obvious false positives at this scale, the company identified that nuanced decision-making remained a significant bottleneck requiring human analyst intervention.

The decisions that required human judgment were complex and multi-faceted. Analysts had to determine whether a detected site warranted a takedown, identify which platforms should be targeted, and decide under which policies the takedown should be requested. These decisions required interpreting unstructured data such as screenshots, time-series activity patterns, and customer-specific policies. Critically, analysts also needed to explain their rationale for each decision, adding another layer of complexity.
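
These multi-part judgments can be pictured as a single structured record. The sketch below is purely illustrative; the field names (`detection_id`, `target_platforms`, and so on) are assumptions, not Doppel's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class TakedownDecision:
    """One analyst (or agent) judgment on a flagged detection (hypothetical schema)."""
    detection_id: str
    is_threat: bool                      # does this warrant a takedown at all?
    target_platforms: list = field(default_factory=list)  # e.g. registrar, host
    policy: str = ""                     # policy the takedown request cites
    rationale: str = ""                  # required explanation of the judgment

decision = TakedownDecision(
    detection_id="det-001",
    is_threat=True,
    target_platforms=["registrar", "hosting_provider"],
    policy="brand-impersonation",
    rationale="Cloned login page on a look-alike domain using the customer's logo.",
)
```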

The stakes were high: incorrect decisions could either miss genuine threats or disrupt legitimate activity. Even for trained human analysts, achieving consistent accuracy required extensive knowledge and ongoing training.

The Solution: Training an AI Agent as a Cybersecurity Expert

Doppel’s approach to building their AI agent involved several key LLMOps practices that are worth examining in detail.

Knowledge Transfer from Human Training Materials

The initial phase of development involved what Doppel describes as “knowledge transfer.” They took the same training methods and materials used to train human analysts—covering topics like phishing, malware, and brand abuse—and applied them directly to training the AI model. This approach mirrors the concept of prompt engineering and fine-tuning, where domain expertise is encoded into the model’s behavior through carefully curated inputs. The case study notes this produced a “noticeable jump in performance” but that the AI still struggled with non-obvious scenarios.
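
One plausible way to implement this kind of knowledge transfer, shown here purely as an illustration (the function and topic names are assumptions, not Doppel's actual prompts), is to fold the analyst training material into the model's system prompt:

```python
def build_system_prompt(training_sections: dict) -> str:
    """Fold human analyst training material into one system prompt (hypothetical)."""
    parts = ["You are a SOC analyst deciding whether detections warrant takedowns."]
    for topic, material in training_sections.items():
        parts.append(f"## {topic}\n{material}")
    parts.append("Always explain your rationale for each decision.")
    return "\n\n".join(parts)

prompt = build_system_prompt({
    "Phishing": "Look for credential-harvesting forms and look-alike domains.",
    "Brand abuse": "Flag unauthorized use of customer logos and trademarks.",
})
```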

Incorporating Historical Decisions

The breakthrough, according to Doppel, came when they incorporated “thousands of well-curated historical decisions” into the model. This approach effectively distilled years of analyst experience into the AI agent. This technique resembles few-shot learning or fine-tuning approaches where high-quality labeled data from previous human decisions is used to teach the model the nuances of expert judgment. The emphasis on “well-curated” suggests significant effort went into data quality assurance for this training dataset.
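
In few-shot terms, each curated historical decision becomes an example pair in the model's context. A minimal sketch with invented field names, not Doppel's format:

```python
def to_few_shot_messages(cases: list) -> list:
    """Render curated historical decisions as user/assistant message pairs."""
    messages = []
    for case in cases:
        messages.append({"role": "user", "content": case["evidence"]})
        messages.append({"role": "assistant", "content": case["decision"]})
    return messages

history = [
    {"evidence": "Domain acme-logln.example serving a cloned Acme sign-in page.",
     "decision": "Takedown via registrar and host under brand-impersonation policy; "
                 "rationale: credential-harvesting form plus trademark misuse."},
]
msgs = to_few_shot_messages(history)
```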

This approach is notable for production LLM systems because it addresses a common challenge: getting models to handle edge cases that require domain expertise beyond what general pre-training provides.

Continuous Learning and Feedback Loops

The case study emphasizes that the agent is “constantly learning as it sees new examples.” This represents a critical LLMOps consideration: maintaining model performance over time as the threat landscape evolves. Phishing attacks are described as “a high-speed cat-and-mouse game,” making continuous adaptation essential.

However, the text does not provide specific technical details about how this continuous learning is implemented. Key questions that remain unanswered include: How frequently is the model retrained or updated? What mechanisms ensure new learning doesn’t degrade performance on previously mastered scenarios? How are new examples curated and validated before being used for training? The lack of detail here suggests this may be an area where the implementation is still evolving.
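
Whatever the actual implementation, a common pattern is to gate new examples through validation before they reach the training pool. The check below is a hypothetical curation gate, not Doppel's pipeline:

```python
def curate_example(example: dict, reviewed_by_human: bool, agreement: float,
                   min_agreement: float = 0.9) -> bool:
    """Admit a new example into the training pool only if it passes validation."""
    required = {"evidence", "decision", "rationale"}
    if not required.issubset(example):
        return False          # incomplete records never enter the pool
    if not reviewed_by_human:
        return False          # require human sign-off before the agent learns from it
    return agreement >= min_agreement  # require cross-reviewer agreement

ok = curate_example(
    {"evidence": "cloned login page", "decision": "takedown", "rationale": "phishing"},
    reviewed_by_human=True,
    agreement=0.95,
)
```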

Model Selection: OpenAI’s o1 Model

Doppel selected OpenAI’s o1 model for their AI agent, which was showcased at OpenAI DevDay. The o1 model family is notable for its enhanced reasoning capabilities compared to earlier GPT models, using chain-of-thought processing to work through complex problems. This choice aligns with the described use case: the decisions being automated require judgment and reasoning, not just pattern recognition.

The selection of a reasoning-focused model suggests that Doppel’s prompting strategy likely leverages the model’s ability to think through multi-step decisions, though specific prompting techniques are not disclosed.

Production Deployment and Results

Claimed Performance

Doppel claims their AI agent “exceeded human-level benchmarks” with a lower false-positive rate and higher detection of genuine threats compared to human analysts. These are significant claims, though the text provides no specific metrics, sample sizes, or details about how the benchmarks were established and measured.

From an LLMOps perspective, the evaluation methodology is a critical gap in this case study. Questions such as how the comparison was conducted, whether it was on a holdout test set or live production traffic, and how statistical significance was determined remain unanswered. The claim of “human-level” performance is compelling marketing but would benefit from more rigorous documentation.
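
As a point of reference, comparing false-positive rates on a labeled holdout set is one standard way such a benchmark could be run. This toy calculation is illustrative only, not Doppel's methodology or data:

```python
def fp_rate(preds, labels):
    """False-positive rate: benign cases incorrectly flagged as threats."""
    false_positives = sum(1 for p, y in zip(preds, labels) if p and not y)
    negatives = sum(1 for y in labels if not y)
    return false_positives / negatives if negatives else 0.0

# Invented holdout labels (True = genuine threat) and agent predictions.
labels = [True, False, False, True, False]
agent  = [True, False, False, True, True]   # one benign case flagged in error
rate = fp_rate(agent, labels)               # 1 false positive / 3 benign cases
```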

Workload Reduction

The headline claim is a 30% reduction in SOC workload within 30 days. This suggests the deployment was rapid and the impact was measurable quickly. However, the text does not clarify what metrics were used to measure “workload”—whether it’s analyst hours, number of cases processed, or some other operational metric.

Operational Benefits

For Doppel’s own operations, the deployment allowed human analysts to focus on complex threat patterns while AI handled routine decisions at scale. For customers, the benefits are described as faster response times and more threats eliminated. These benefits are plausible outcomes of successful automation but again lack specific quantification.

LLMOps Considerations and Lessons

Hybrid Architecture

The described system represents a hybrid approach where traditional ML handles initial filtering and LLM-based agents handle nuanced decision-making. This is a pragmatic architecture that leverages each technology’s strengths: traditional ML for high-throughput, low-latency pattern matching, and LLMs for complex reasoning tasks.
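
A routing layer for such a hybrid system can be sketched in a few lines; the thresholds and labels here are invented for illustration, not Doppel's values:

```python
def route(ml_score: float,
          clear_benign: float = 0.05, clear_threat: float = 0.98) -> str:
    """Traditional ML filters the obvious ends; the LLM agent gets the middle."""
    if ml_score < clear_benign:
        return "auto_dismiss"        # cheap model is confident: false positive
    if ml_score > clear_threat:
        return "auto_takedown"       # cheap model is confident: clear threat
    return "llm_agent"               # nuanced case: escalate to the reasoning agent
```

Routing on a cheap classifier score keeps the high-volume path fast and reserves the expensive reasoning model for the ambiguous middle band.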

Explainability

The case study notes that decisions require explanation of rationale. LLMs, particularly reasoning-focused models like o1, are well-suited for generating explanations alongside decisions. This is an important consideration for security operations where audit trails and decision justification may be required for compliance or post-incident analysis.

Risk Management

The text acknowledges the high stakes of these decisions: missing threats or disrupting legitimate activity. However, it does not describe what safeguards are in place for the AI agent’s decisions. Questions about human oversight, confidence thresholds for automated action, and rollback mechanisms are not addressed.
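
One common safeguard pattern, shown here as a hypothetical sketch rather than anything Doppel describes, is confidence-thresholded dispatch with a human-review fallback:

```python
def dispatch(decision: dict, confidence: float, threshold: float = 0.9) -> str:
    """Act automatically only above a confidence bar; otherwise queue for a human."""
    if confidence >= threshold and decision.get("is_threat") is False:
        return "close"                # confident dismissal: no analyst needed
    if confidence >= threshold and decision.get("is_threat"):
        return "file_takedown"        # confident threat: act, keep an audit trail
    return "human_review"             # uncertain either way: human in the loop
```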

Data Handling

Processing screenshots, time-series data, and unstructured content requires multi-modal capabilities or sophisticated preprocessing. The text does not detail how these data types are prepared for the LLM or whether multi-modal models are involved.

Critical Assessment

While this case study describes an interesting application of LLMs in security operations, several aspects warrant skepticism. The account comes from Doppel itself, so the headline numbers serve a promotional purpose. No concrete metrics, sample sizes, or benchmark methodology support the claim of exceeding human-level performance, and the 30% "workload reduction" is never defined in terms of a measurable quantity. The continuous-learning pipeline, the safeguards around automated takedowns, and the handling of multi-modal inputs are all left undescribed.

That said, the general approach described—using domain expertise to train agents, incorporating historical decisions, and maintaining continuous learning loops—represents sound LLMOps practices that are applicable across industries.

Future Directions

Doppel indicates they are “just getting started” and their engineering team is “re-imagining what’s possible in the SOC using AI agents from the ground up.” This suggests ongoing investment in LLM-based automation for security operations, with potential expansion of automated capabilities beyond the initial 30% of workload currently addressed.
