## Overview
This case study presents a comprehensive look at how Deloitte built their Cybersecurity Intelligence Center using AWS's Graph RAG Toolkit, an open-source Python library developed by the Amazon Neptune team. The presentation features two speakers: Ian Robinson, a graph architect from the Amazon Neptune Service team, and Evan Irwy, AVP for cyber operations at Deloitte. The collaboration showcases a sophisticated LLMOps implementation that addresses the critical challenge of security alert overload in cloud environments.
The fundamental problem Deloitte addresses is what happens when organizations first enable cloud security platforms like Wiz or CrowdStrike—they are immediately flooded with hundreds or thousands of security alerts and non-compliance notifications. SecOps engineers must then triage and prioritize these issues based on two key factors: understanding each issue's significance within the organization's cybersecurity policies, and assessing the potential impact of remediation on production systems. This requires extensive organization-specific knowledge that cannot be addressed through one-size-fits-all solutions.
## The Graph RAG Toolkit Foundation
The solution builds on AWS's Graph RAG Toolkit, which represents a sophisticated approach to improving retrieval-augmented generation systems. The toolkit was designed with two high-level goals: making it easy to build graphs from unstructured or semi-structured data with minimal information architecture overhead, and helping users find relevant but non-obvious or distant content without writing complex graph queries.
The core innovation is the hierarchical lexical graph model, which serves as a repository of statements: short, well-formed, standalone propositions extracted from source data that form the primary unit of context passed to language models. The graph structure comprises three tiers:

- **Lineage tier:** source nodes containing metadata for filtering and versioning, plus chunk nodes representing chunked content with associated embeddings that serve as vector-based entry points.
- **Summarization tier:** the statements themselves, grouped thematically by topics and supported by discrete facts.
- **Entity relationship tier:** entities and relations extracted from source data, providing domain semantics and helping find structurally relevant but potentially dissimilar information.
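To make the model concrete, here is a minimal sketch of the three tiers as plain Python dataclasses. The class and field names are illustrative assumptions, not the toolkit's actual schema.

```python
from dataclasses import dataclass, field

# Lineage tier: where content came from and how it was chunked.
@dataclass
class Source:
    source_id: str
    metadata: dict  # used for filtering and versioning at query time

@dataclass
class Chunk:
    chunk_id: str
    source_id: str
    text: str
    embedding: list[float]  # vector-based entry point into the graph

# Summarization tier: statements are the unit of context passed to the LLM.
@dataclass
class Statement:
    statement_id: str
    text: str   # short, well-formed, standalone proposition
    topic: str  # thematic grouping within a single source
    facts: list[str] = field(default_factory=list)      # supporting facts
    chunk_ids: list[str] = field(default_factory=list)  # provenance

# Entity relationship tier: domain semantics extracted from source data.
@dataclass
class Entity:
    name: str
    entity_type: str

@dataclass
class Relation:
    subject: str    # Entity name
    predicate: str
    obj: str        # Entity name
```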
This responsibility-based approach to graph modeling is particularly noteworthy. Rather than simply representing domain entities, the model assigns each node type a specific job at retrieval time: helping find relevant statements. Topics provide local connectivity for deep investigation within single documents, while facts enable global connectivity for broad investigations across the corpus.
## Hybrid RAG and Entity Network Contexts
The toolkit implements a hybrid RAG approach that addresses a fundamental challenge: improving recall by retrieving not only content similar to the question but also content similar to something other than the question. This is achieved through entity network contexts: one- or two-hop networks surrounding key entities and keywords extracted from the user's question.
The process for generating entity network contexts is methodical:

1. Significant entities and keywords from the question are looked up in the entity relationship tier, producing candidate entity nodes.
2. The candidates are re-ranked against the question to identify the most important entity, and its degree centrality is noted.
3. Path expansion then occurs, typically one or two hops, with filtering based on a threshold derived from the most important entity's degree centrality. This filtering eliminates "whales and minnows": nodes that might dominate the conversation or are potentially irrelevant.
4. The surviving paths are re-scored, ordered by mean score, and the top paths are selected for creating textual transcriptions.
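A sketch of this retrieval step is below, using networkx and assuming the entity relationship tier has already been loaded as a graph. The re-ranking callable and the degree threshold bounds are illustrative stand-ins for the toolkit's internals.

```python
import networkx as nx

def entity_network_contexts(graph: nx.Graph, candidates: list[str],
                            rerank, hops: int = 2, top_k: int = 5) -> list[str]:
    """Build textual entity network contexts around the most important entity.

    `rerank` is an assumed callable scoring a node name against the question.
    """
    # Re-rank candidate entity nodes against the question; keep the best one.
    ranked = sorted(candidates, key=rerank, reverse=True)
    anchor = ranked[0]
    anchor_degree = graph.degree(anchor)

    # Filter "whales and minnows" with a threshold derived from the anchor's
    # degree centrality (the exact bounds here are assumptions).
    lo, hi = 0.1 * anchor_degree, 10 * anchor_degree
    def keep(node):
        return node == anchor or lo <= graph.degree(node) <= hi

    # Expand paths one or two hops out from the anchor entity.
    paths = nx.single_source_shortest_path(graph, anchor, cutoff=hops)
    scored = []
    for path in paths.values():
        if len(path) < 2 or not all(keep(n) for n in path):
            continue
        mean_score = sum(rerank(n) for n in path) / len(path)
        scored.append((mean_score, path))

    # Keep the top paths and transcribe them as text for downstream use.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [" -> ".join(map(str, path)) for _, path in scored[:top_k]]
```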
These entity network contexts are used in three ways throughout querying. They seed dissimilarity searches via vector similarity search for each path, with results used to find chunk nodes and traverse the graph. They re-rank results using weighted term frequency analysis rather than re-ranking models, with match items comprising the original question plus entity network contexts in descending weighted order. Finally, they enrich the prompt to guide the LLM to pay attention to statements it might otherwise overlook.
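The weighted term-frequency re-ranking can be sketched as follows. This is an illustrative reconstruction assuming simple whitespace tokenization and linearly decaying context weights; the actual weighting scheme is not specified in the presentation.

```python
from collections import Counter

def rerank_statements(statements: list[str], question: str,
                      contexts: list[str]) -> list[str]:
    """Re-rank retrieved statements by weighted term frequency.

    Match items are the original question plus the entity network contexts,
    in descending weighted order (the weights here are an assumption).
    """
    # The question gets the highest weight; each context weighs a bit less.
    match_items = [(question, 1.0)] + [
        (ctx, 1.0 / (i + 2)) for i, ctx in enumerate(contexts)
    ]

    def score(statement: str) -> float:
        tokens = Counter(statement.lower().split())
        total = 0.0
        for text, weight in match_items:
            # Count how often each match item's terms occur in the statement.
            overlap = sum(tokens[t] for t in set(text.lower().split()))
            total += weight * overlap
        return total

    return sorted(statements, key=score, reverse=True)
```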
The practical example provided demonstrates this effectively: when asking about Example Corp's UK sales prospects, traditional vector search might return only optimistic information about widget demand and improved logistics. However, the Graph RAG approach also retrieves structurally relevant but dissimilar content about a cybersecurity incident affecting the Turquoise Canal used by their logistics partner, leading to a more nuanced assessment acknowledging potential supply chain disruptions.
## Deloitte's Production Implementation: AI for Triage
Deloitte's implementation, called "AI for Triage," extends the Graph RAG Toolkit with additional production-oriented capabilities. The system architecture distinguishes between long-term memory (the lexical graph containing curated organizational experiences) and short-term memory (document graphs containing current operational data from security tools). This conceptual framework mirrors human cognition, where long-term and short-term memory work together without segregation.
A critical design principle is the immutable separation between AI-generated and human-generated content. Triage records generated by AI remain distinct from human annotations, ensuring defensibility of decisions. The system positions humans in the middle—not as mere observers but as augmented decision-makers wearing the metaphorical "Iron Man suit" that enhances their capabilities.
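One way to enforce that boundary in code is to make AI-generated triage records immutable and keep human annotations as separate records that reference them. A minimal sketch, with all field names assumed for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: AI output cannot be mutated after creation
class AITriageRecord:
    record_id: str
    issue_id: str
    summary: str
    remediation_steps: tuple[str, ...]  # immutable sequence
    model_id: str                       # which model produced this record
    created_at: datetime

@dataclass
class HumanAnnotation:
    record_id: str  # points at the AI record; never rewrites it
    analyst: str
    verdict: str    # e.g. "approved", "rejected", "needs-review"
    notes: str = ""
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```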
The document graph component, which Deloitte built as an extension to the Graph RAG Toolkit, handles short-term operational memory. Document graphs are organized into logical domains, allowing domain-specific derived vocabularies and moderate entity resolution. This approach recognizes that tools like Wiz or Prisma Cloud have their own vocabulary and inherent relationships that can be leveraged without excessive processing.
## The Processing Pipeline and Cognitive Substrate
A sophisticated pipeline converts signals from various security tools into the graph structure. The pipeline integrates Amazon Bedrock and is designed to turn any signal, log, or CAAS data into JSONL format and then, rapidly, into the short-term memory graph. An important feature is the pipeline's intelligence in enriching data as it flows: for example, converting IP addresses into ASNs and actual geographic locations, because that contextual information is what analysts actually care about.
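A simplified sketch of that normalization and enrichment step follows: raw tool findings become enriched JSONL records before graph ingestion. The lookup functions are stubs standing in for real ASN and geolocation services, and the field names are assumptions.

```python
import json

def lookup_asn(ip: str) -> str:
    """Stub: in production this would query an ASN database."""
    return "AS16509"  # placeholder value

def lookup_geo(ip: str) -> str:
    """Stub: in production this would query a geolocation service."""
    return "Ashburn, Virginia, US"  # placeholder value

def enrich_signal(raw: dict) -> dict:
    """Convert a raw security finding into an enriched record."""
    record = dict(raw)
    if ip := raw.get("source_ip"):
        # Analysts care about who and where, not the bare IP address.
        record["source_asn"] = lookup_asn(ip)
        record["source_location"] = lookup_geo(ip)
    return record

def to_jsonl(signals: list[dict]) -> str:
    """Serialize enriched signals as JSONL for graph ingestion."""
    return "\n".join(json.dumps(enrich_signal(s)) for s in signals)

print(to_jsonl([{"tool": "wiz", "issue": "public S3 bucket",
                 "source_ip": "203.0.113.7"}]))
```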
The pipeline architecture deliberately separates read and write engines to prevent graph pollution—the injection of phantom data designed to derail analysis. This security consideration is crucial in cybersecurity applications where adversaries might attempt to manipulate the knowledge base itself.
Deloitte created what they call a "cognitive substrate" or "AI-enabled factory" that shields applications from direct exposure to the rapidly evolving AI landscape. The factory interface encapsulates the triage protocol (GenAI prompt sets), document graph (short-term experiences), and lexical graph (long-term memory), providing stability while allowing internal components to evolve. This abstraction layer is stored on DynamoDB and S3, with S3 providing journaling capabilities to replay factory operations if needed.
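The factory interface can be pictured as a thin abstraction over those three components. The following is a hypothetical sketch, since the actual interface is not public; method names are illustrative.

```python
from abc import ABC, abstractmethod

class TriageFactory(ABC):
    """Stable interface that shields applications from the internals.

    Encapsulates the triage protocol (GenAI prompt sets), the document
    graph (short-term experiences), and the lexical graph (long-term
    memory). Implementations may swap models, stores, and prompts
    underneath without breaking callers.
    """

    @abstractmethod
    def triage(self, issue: dict) -> dict:
        """Run the triage protocol for one issue; returns a triage record."""

    @abstractmethod
    def recall(self, question: str) -> list[str]:
        """Query long-term memory (lexical graph) for relevant statements."""

    @abstractmethod
    def observe(self, signal: dict) -> None:
        """Write an enriched signal into short-term memory (document graph)."""

    @abstractmethod
    def journal(self) -> list[dict]:
        """Return the operation log (backed by S3) so runs can be replayed."""
```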
## Infrastructure and Technology Stack
The production deployment runs on Amazon EKS (Elastic Kubernetes Service), achieving 99.999949% uptime. The graph database is Amazon Neptune with OpenSearch for vector search, though the toolkit supports other backends including Neo4j, Neptune Analytics, and PostgreSQL with the pgvector extension. For LLM operations, the system standardizes on Amazon Nova, with Amazon SageMaker available for potential future model development using the curated organizational data.
AWS Lambda and API Gateway with CloudFront handle document uploads and downloads. The multi-modal embedding capability was extended to support various data types including video and audio, not just text-based content. This infrastructure choice reflects a deliberate strategy of narrowing platform options for stability while maintaining flexibility through the Graph RAG Toolkit's backend-agnostic design.
## Real-World Results and Operational Impact
The quantitative results demonstrate significant operational impact. Across seven AWS domains, the system processed 50,000 security issues within approximately four weeks. The pipeline distilled these to just over 1,300 usable issues requiring further investigation—a 97% reduction in noise. These were automatically converted into over 6,500 nodes and 19,000 relationships in the graph structure.
The "Wiz playbook," an instance of a triage record within the factory, generates evidence, remediation steps, and other information in JSON format. This structured output feeds back into the system and can be consumed by downstream automation. Analysts can review, annotate, or reject AI-generated triage records, maintaining human oversight while benefiting from AI assistance.
## Automation Strategy and Recipe-Based Approach
Deloitte's approach to automation is particularly thoughtful. Rather than generating executable code directly, the system generates "recipes" for automation—higher-level descriptions that are more durable than code, which can become brittle as libraries change and vulnerabilities emerge. A human-written interpreter executes these AI-generated recipes, maintaining trust boundaries appropriately.
The system implements a "check-do-check" pattern, recognizing that short-term memory is inherently historical. Even when automation is triggered, the system verifies current state before taking action, since conditions may have changed between detection and remediation. These recipes are stored in a central repository and fed back into the lexical graph, enriching the long-term organizational memory over time.
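Putting the two ideas together: a recipe can be a declarative description that a human-written interpreter executes, checking state before acting and verifying state afterward. The recipe format, checks, and actions below are all assumptions used to illustrate the pattern.

```python
# Hypothetical recipe: a durable, declarative description rather than code.
recipe = {
    "name": "close-public-s3-bucket",
    "steps": [
        {"check": "bucket_is_public", "do": "apply_block_public_access"},
    ],
}

# Human-written checks and actions; the AI never emits executable code.
CHECKS = {"bucket_is_public": lambda ctx: ctx.get("public", False)}
ACTIONS = {"apply_block_public_access": lambda ctx: ctx.update(public=False)}

def run_recipe(recipe: dict, ctx: dict) -> None:
    """Execute a recipe using the check-do-check pattern."""
    for step in recipe["steps"]:
        check = CHECKS[step["check"]]
        if not check(ctx):
            # Short-term memory is historical: conditions may have changed
            # between detection and remediation, so skip stale actions.
            continue
        ACTIONS[step["do"]](ctx)
        # Check again: verify the action actually produced the desired state.
        assert not check(ctx), f"step {step['do']} did not converge"

state = {"bucket": "example-logs", "public": True}
run_recipe(recipe, state)
print(state)  # {'bucket': 'example-logs', 'public': False}
```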
## Knowledge Management and Organizational Learning
The system enables tracking across both individual incidents and classes of incidents. Reports serve as classes of experience, with individual issues as instantiations, allowing traversal through organizational memory either on particular incidents or across all incidents of a type. This structure supports understanding both specific events and broader patterns.
Over time, this creates a feedback loop from operational reality to policy intent. Rather than writing policies divorced from actual operations, organizations can now base policy on real operational experiences captured in the knowledge graph. This represents a significant shift from checkbox compliance and optics-driven security to evidence-based security posture management.
## Critical Production Considerations
Several LLMOps considerations emerge from this implementation. The system addresses the challenge of context window optimization by using statements as the primary unit of context rather than raw chunks, with thematic grouping and source attribution. The hybrid approach of vector and graph search proves mutually beneficial—vector search smooths quality issues in questions and content, while graph search finds structurally relevant information.
Multi-tenancy support allows discrete lexical graphs in the same infrastructure, critical for consulting organizations serving multiple clients. Document versioning enables querying either the current state or historical states, which is important for incident investigation and compliance. The system's design for surfacing domain-specific agentic tools through MCP servers positions it for the emerging agentic AI paradigm.
Human-in-the-loop design is fundamental rather than supplementary. The system augments rather than replaces human expertise, with clear boundaries between AI-generated and human-verified content. This approach acknowledges that in cybersecurity contexts, accountability and defensibility of decisions are paramount.
## Limitations and Balanced Assessment
While the presentation emphasizes successes, several considerations warrant balanced assessment. The claim of 99.999949% uptime is remarkably high and might represent a specific measurement period rather than long-term sustained performance. The system's complexity—spanning multiple AWS services, custom pipelines, document graphs, and the Graph RAG Toolkit—suggests significant operational overhead and expertise requirements for deployment and maintenance.
The approach assumes organizations have sufficient historical triage data and documentation to populate meaningful long-term memory. Organizations without this foundation would need to build it over time, potentially limiting initial value. The separation of read and write engines to prevent graph pollution, while security-conscious, adds complexity and potential performance overhead.
The decision to generate recipes rather than code for automation, while conceptually appealing, introduces an additional layer of abstraction and requires maintaining the recipe interpreter. The effectiveness of this approach at scale across diverse automation scenarios remains to be validated through broader deployment.
## Future Directions and Community Contribution
The Graph RAG Toolkit is open source on GitHub, with ongoing development incorporating customer feedback and community contributions. Deloitte has already contributed features, with plans to upstream more capabilities like the document graph module. Planned enhancements include additional vector store backends, expanded multi-modal support, and improved agentic tool integration.
The BYOKG (Bring Your Own Knowledge Graph) module allows integration of existing graphs in Neptune or Neptune Analytics, extending the toolkit's applicability beyond greenfield deployments. This flexibility acknowledges that many organizations have existing graph investments they want to leverage.
## Conclusion
This case study represents a sophisticated production deployment of LLM technology in a high-stakes operational environment. The integration of graph-based knowledge management with retrieval-augmented generation addresses real limitations of pure vector-based approaches, particularly for finding non-obvious connections and maintaining organizational context. The separation of concerns between short-term operational data and long-term organizational memory provides a principled architecture for managing different types of knowledge at different lifecycle stages.
The emphasis on human augmentation rather than replacement, combined with strong accountability boundaries between AI and human contributions, demonstrates mature thinking about LLM deployment in production contexts where decisions have significant consequences. While the implementation complexity is substantial, the architectural patterns and design principles offer valuable insights for organizations building similar production LLM systems, particularly in domains requiring deep contextual understanding, compliance tracking, and defensible decision-making.