## Overview
LinkedIn, a professional networking platform serving over one billion members, developed an AI-enhanced Security Posture Platform (SPP) to streamline vulnerability management and security operations at scale. The SPP AI component represents a production deployment of Large Language Models designed to democratize access to complex security data, enabling security analysts, system owners, and organizational leaders to query security information using natural language rather than requiring technical expertise in graph queries or API navigation.
The core challenge LinkedIn faced was that traditional vulnerability management approaches were not keeping pace with the expanding digital infrastructure and increasing variety of threats. Security teams needed to quickly answer questions like "Are we affected by vulnerability X?" or "What is the highest risk vulnerability on my devices?" but the existing tools—even with GraphQL playgrounds and flexible APIs—were not accessible enough for all users who needed timely security insights.
## Architecture and Technical Implementation
The SPP AI architecture consists of several interconnected components that work together to transform natural language queries into accurate data retrieval and summarization. The system is built on top of a Security Knowledge Graph that serves as a comprehensive repository of all digital assets and their relationships, containing several hundred gigabytes of data aggregated from over two dozen security data sources.
### Context Generation Pipeline
The context generation process is foundational to enabling LLMs to respond accurately to security queries. This involves several key steps:
The system starts with seed data—a carefully curated set of predefined queries and responses that serve as initial learning material for the LLMs. Automated scripts regularly update this seed data to incorporate the latest changes in the dataset. This represents a continuous maintenance challenge typical of production LLM systems.
Synthetic data generation is employed to extend the AI's capabilities beyond the current scope of the database. By leveraging seed data, LLMs generate synthetic datasets that simulate various potential scenarios, enhancing the system's adaptability. This approach is notable because it addressed the limited model capacity LinkedIn faced during early experimentation, when the team had to innovate within significant resource constraints.
Real-time metadata from user interactions and system updates is synthesized alongside synthetic data, adding contextual depth that helps LLMs understand queries more completely. The enriched synthetic datasets are then embedded into a vector index, creating a centralized repository that enables efficient data retrieval during query processing.
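The last step of this pipeline can be sketched in miniature: seed and synthetic question/query pairs are embedded and written into an index that is searched at query time. This is an illustrative sketch only, with a toy bag-of-words embedding standing in for a real embedding model, and all node names and example queries invented for demonstration:

```python
import math

def embed(text: str) -> dict[str, float]:
    # Toy bag-of-words embedding; a production pipeline would call a real
    # embedding model before writing vectors into the index.
    counts: dict[str, int] = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0) + 1
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    return sum(v * b.get(t, 0.0) for t, v in a.items())

class VectorIndex:
    """In-memory stand-in for the centralized vector repository."""
    def __init__(self) -> None:
        self.entries: list[tuple[dict[str, float], dict]] = []

    def add(self, question: str, payload: dict) -> None:
        self.entries.append((embed(question), payload))

    def search(self, query: str, k: int = 2) -> list[dict]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: -cosine(e[0], q))
        return [payload for _, payload in ranked[:k]]

# Seed and synthetic pairs embedded into the index for later retrieval.
index = VectorIndex()
index.add("are we affected by CVE-2024-1234",
          {"cypher": "MATCH (v:Vulnerability {cve: 'CVE-2024-1234'})"
                     "-[:AFFECTS]->(h:Host) RETURN h.name"})
index.add("highest risk vulnerability on my devices",
          {"cypher": "MATCH (h:Host)<-[:AFFECTS]-(v:Vulnerability) "
                     "RETURN v ORDER BY v.riskScore DESC LIMIT 1"})

hits = index.search("which hosts are affected by CVE-2024-1234", k=1)
```

At query time, the retrieved pairs become few-shot context for the LLM rather than final answers.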
### Multi-Query Generation
One of the most technically interesting aspects of SPP AI is how it handles query generation. A primary challenge is mapping natural language queries to the appropriate nodes, properties, and relationships within the knowledge graph. LinkedIn developed a function-based query mapping approach to address the flexibility challenges posed by GraphQL.
Because GraphQL queries can vary significantly between use cases (unlike RESTful APIs with fixed endpoints), LinkedIn maps functions to node types in the knowledge graph rather than to predefined API endpoints. This allows the function calling feature to help the model choose the most relevant node types from the available nodes, simplifying identification and selection.
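The mapping described above might be sketched as follows. The function specs are written in the style of OpenAI-style function calling, with one function per node type; the function names, node types, and selection heuristic are all invented for illustration (in production the LLM itself makes the choice):

```python
# Hypothetical function specs, one per knowledge-graph node type.
NODE_FUNCTIONS = [
    {
        "name": "query_vulnerability_nodes",
        "description": "Look up Vulnerability nodes by CVE id, severity, or risk score.",
        "parameters": {"type": "object",
                       "properties": {"cve": {"type": "string"}}},
    },
    {
        "name": "query_host_nodes",
        "description": "Look up Host nodes by hostname, owner, or environment.",
        "parameters": {"type": "object",
                       "properties": {"owner": {"type": "string"}}},
    },
]

def choose_node_function(question: str) -> str:
    # Stand-in for the model's function-calling choice: in production the LLM
    # selects among NODE_FUNCTIONS; a keyword heuristic keeps this sketch runnable.
    q = question.lower()
    if "cve" in q or "vulnerab" in q:
        return "query_vulnerability_nodes"
    return "query_host_nodes"
```

Mapping functions to node types rather than endpoints keeps the choice space small even as the set of possible GraphQL queries stays open-ended.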
After identifying relevant nodes and properties, SPP AI constructs a comprehensive prompt that includes the user's question along with selected properties and related query examples. This prompt is then processed by the LLM to generate Cypher queries for data retrieval. The system also includes a lightweight GraphQL query generation implementation alongside the Cypher query capability.
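The prompt-assembly step might look like the following sketch, where the node type, property names, and few-shot examples are illustrative placeholders for whatever the earlier selection and retrieval stages produced:

```python
def build_cypher_prompt(question: str, node_type: str,
                        properties: list[str],
                        examples: list[tuple[str, str]]) -> str:
    # Combine the user's question with the selected node type, its properties,
    # and retrieved example queries into a single generation prompt.
    example_text = "\n".join(f"Q: {q}\nCypher: {c}" for q, c in examples)
    return (
        "You translate security questions into Cypher.\n"
        f"Target node type: {node_type}\n"
        f"Available properties: {', '.join(properties)}\n"
        f"Examples:\n{example_text}\n\n"
        f"Q: {question}\nCypher:"
    )

prompt = build_cypher_prompt(
    "What is the highest risk vulnerability on host web-01?",
    "Vulnerability",
    ["cve", "severity", "riskScore"],
    [("Are we affected by CVE-2024-1234?",
      "MATCH (v:Vulnerability {cve: 'CVE-2024-1234'})-[:AFFECTS]->(h:Host) RETURN h.name")],
)
```

The LLM's completion of the trailing `Cypher:` line is then executed against the graph backend.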
The architecture includes fallback mechanisms that activate when primary queries fail to yield sufficient results, with secondary queries prepared to provide continuity through iterative refinement. Dynamic prompt generation and error handling using semantic search techniques help refine context when inaccuracies arise.
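A minimal sketch of that fallback loop, assuming a list of prepared secondary queries and a pluggable executor (the semantic-search refinement of context on errors is elided here):

```python
def run_with_fallback(primary: str, fallbacks: list[str], execute,
                      min_results: int = 1):
    # Iterative refinement: run the primary query first; if it errors or
    # returns too few rows, fall through to prepared secondary queries.
    for query in [primary, *fallbacks]:
        try:
            rows = execute(query)
        except Exception:
            continue  # treat a failed query like an empty result
        if len(rows) >= min_results:
            return query, rows
    return None, []

# Fake executor standing in for the graph backend.
CANNED = {
    "MATCH (v:Vuln {cve: 'CVE-X'}) RETURN v": [],
    "MATCH (v:Vulnerability {cve: 'CVE-X'}) RETURN v": [{"cve": "CVE-X"}],
}
query, rows = run_with_fallback(
    "MATCH (v:Vuln {cve: 'CVE-X'}) RETURN v",
    ["MATCH (v:Vulnerability {cve: 'CVE-X'}) RETURN v"],
    lambda q: CANNED[q],
)
```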
### Query Routing and Output Summarization
The query routing component directs queries to the most efficient knowledge graph or GraphQL backend with low latency and high accuracy. The architecture is designed to support multiple data backends, enabling questions to be asked from various data stores without requiring all data to be aggregated first—an important consideration for production systems dealing with distributed data sources.
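In its simplest form, such a router is a mapping from resolved node types to backends. The table below is purely illustrative (the real router also weighs latency and accuracy, which this sketch omits):

```python
# Hypothetical routing table: which backend serves which node types.
BACKEND_FOR_NODE = {
    "Vulnerability": "knowledge_graph",
    "Host": "knowledge_graph",
    "Finding": "graphql_api",
}

def route(node_type: str) -> str:
    # Default to the knowledge graph when the node type is unrecognized.
    return BACKEND_FOR_NODE.get(node_type, "knowledge_graph")
```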
For output summarization, the system leverages LLMs' natural strength in summarizing data. The user's query, the retrieved results, and the surrounding context are combined to generate a comprehensive answer. Chat history is kept in a temporary memory store and appended to subsequent questions in the same conversation. The team acknowledges that managing follow-up queries with changing intents remains an ongoing challenge.
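The memory-plus-summarization pattern can be sketched as below, with a bounded store of recent turns replayed into the summarization prompt (class and field names are illustrative):

```python
from collections import deque

class ChatMemory:
    # Temporary per-conversation store; recent turns are replayed into the
    # summarization prompt so follow-up questions keep context.
    def __init__(self, max_turns: int = 5) -> None:
        self.turns: deque = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def render(self) -> str:
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)

def build_summary_prompt(question: str, results: list[dict],
                         memory: ChatMemory) -> str:
    return (f"Conversation so far:\n{memory.render()}\n\n"
            f"Query results: {results}\n\n"
            f"Using the results, answer: {question}")

memory = ChatMemory()
memory.add("Are we affected by CVE-2024-1234?", "Yes, 3 hosts are affected.")
prompt = build_summary_prompt("Which of those hosts is highest risk?",
                              [{"host": "web-01", "riskScore": 9.1}], memory)
```

Bounding the deque is one simple way to keep prompt size stable; it does not by itself solve the changing-intent problem the team describes.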
## Evaluation and Testing Framework
LinkedIn implemented a robust accuracy testing framework that is crucial for maintaining production quality. The framework includes several components:
Seed data and validation datasets are used to assess query accuracy by comparing AI-generated queries against ground-truth or expert-curated queries. The validation dataset specifically tests the system against tangential scenarios, assessing its ability to handle edge cases and less common queries. This separation of training scenarios from validation scenarios is essential for honest evaluation.
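A minimal harness for that comparison might look like the following sketch. String normalization is a crude stand-in for grading; a stronger grader would execute both queries and compare result sets. The stub generator and cases are invented for illustration:

```python
def normalize(query: str) -> str:
    # Crude canonicalization before string comparison.
    return " ".join(query.split()).lower()

def query_accuracy(cases, generate) -> float:
    # Fraction of validation cases where the generated query matches the
    # expert-curated ground truth.
    hits = sum(normalize(generate(q)) == normalize(gold) for q, gold in cases)
    return hits / len(cases)

# Stub generator standing in for the system's query generation.
def fake_generate(question: str) -> str:
    return {"are we affected by CVE-X?":
                "MATCH (v:Vulnerability {cve: 'CVE-X'}) RETURN v"}.get(question, "")

cases = [
    ("are we affected by CVE-X?",
     "match (v:vulnerability {cve: 'cve-x'}) return v"),
    ("highest risk vuln?",
     "match (v:vulnerability) return v order by v.riskScore desc"),
]
score = query_accuracy(cases, fake_generate)  # one of two cases matches
```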
The testing framework is designed for continuous iteration and refinement, with system adjustments made based on testing outcomes to improve prompt generation, data synthesis, and overall query handling.
Importantly, human-powered validation supplements automated testing. Human experts review a subset of queries and responses to gauge the system's effectiveness in real-world scenarios, validating that outputs are accurate, practical, and understandable to end-users. This human-in-the-loop approach is critical for security applications where errors can have significant consequences.
The team notes that their "blind test cases" are designed so that each one pertains to a different part of the graph, accessing specific sets of nodes, relationships, and properties with challenging questions not directly in the training set. They acknowledge that crafting correct queries for these tests can be non-trivial even for engineers who understand the system well.
## Model Evolution and Performance
A notable aspect of this case study is the longitudinal view of model evolution. LinkedIn started SPP AI development three generations of GPT models ago with Davinci, achieving only 40-50% accuracy in blind tests. With the current generation of GPT-4 models, accuracy improved to 85-90%. This represents a significant improvement but also highlights an ongoing challenge: each generation of models requires tuning of prompts and system components to perform effectively.
The team emphasizes that technology evolves swiftly and recommends approaching such development in a more model-agnostic way to better leverage new capabilities as they emerge. Each model upgrade came with adaptation and rework requirements.
Addressing hallucinations was identified as a crucial tuning concern. By refining context and prompts, the team reduced hallucinations in predicting node/relationship labels and properties, though they acknowledge that eliminating them entirely remains an ongoing process.
## Production Results and Impact
The reported results are significant: SPP enhanced vulnerability response speed by approximately 150% and increased coverage of digital infrastructure by approximately 155%. The unified platform with dynamic risk assessments and automated decision-making capabilities minimizes manual intervention while providing comprehensive visibility across the security landscape.
The system enables new team members to use the platform effectively from their first day, removing language barriers and reducing the learning curve associated with complex security data systems. This democratization of security data access is a key value proposition.
## Lessons Learned and Future Considerations
LinkedIn shares several candid lessons from the development process. Having a pre-existing normalized graph with data from over two dozen security sources was extremely helpful: adding an AI layer provided instant insights without needing to integrate with each individual source separately. Using private Azure OpenAI model deployments, tested safely with internal tools, enabled rapid iteration.
However, the AI interface was not initially part of the development plan for SPP, so adapting the graph and dealing with overlapping naming conventions became complex. This suggests that organizations planning to add AI layers should consider this from the beginning of their data architecture design.
Future considerations include exploring agents and smaller language models for specific tasks, with fine-tuning expected to provide more precise and efficient results. The team is also interested in proactive mitigation—not just identifying vulnerabilities but actively guiding analysts through remediation. The knowledge graph approach enables a holistic security context that can integrate with decision systems for preventive threat management.
## Critical Assessment
While the results reported are impressive, it's worth noting that the 150% and 155% improvement figures are somewhat vague in terms of baseline and measurement methodology. The accuracy metrics (40-50% to 85-90%) provide more concrete evidence of improvement over time, though the definition of accuracy in their blind tests could benefit from more detailed explanation.
The case study provides valuable transparency about ongoing challenges, including managing queries with changing intents, eliminating hallucinations entirely, and the continuous effort required to tune each new model generation. This honesty about limitations adds credibility to the reported successes.