Company: Writer
Title: Evolution from Vector Search to Graph-Based RAG for Enterprise Knowledge Systems
Industry: Tech
Year: 2025

Summary: Writer, an enterprise AI platform company, evolved its retrieval-augmented generation (RAG) system from traditional vector search to a graph-based approach to address limitations in handling dense, specialized enterprise data. Starting with keyword search and progressing through vector embeddings, the team encountered accuracy issues caused by chunking and struggled with concentrated enterprise data in which documents share similar terminology. Their solution combined knowledge graphs with fusion-in-decoder techniques, using a specialized model for graph structure conversion and storing graph data as JSON in a Lucene-based search engine. The approach improved accuracy, reduced hallucinations, and outperformed seven different vector search systems in benchmarking tests.
## Overview

Writer is an end-to-end agentic AI platform for enterprises that builds proprietary models, develops graph-based RAG systems, and provides software tools for enterprises to build AI agents and applications. The company serves Fortune 500 and Global 2000 companies, particularly in highly regulated industries like healthcare and finance, where accuracy and low hallucination rates are critical requirements. This case study presents Writer's multi-year journey from traditional vector search to a sophisticated graph-based RAG system, demonstrating the real-world evolution of a production LLM system at enterprise scale.

The company's research team focuses on four main areas: enterprise models (including their Palmyra X5 model), practical evaluations (such as their FailSafe QA finance benchmark), domain-specific specialization (Palmyra Med and Palmyra Fin models), and retrieval and knowledge integration. This customer-driven approach distinguishes their work from purely theoretical research: the team focuses on solving practical problems for enterprise clients dealing with terabytes of dense, specialized data.

## Technical Evolution and Production Challenges

### Initial Approaches and Limitations

Writer's RAG evolution represents a classic case of production LLM system maturation. They began with basic keyword search over knowledge bases, which quickly proved insufficient for advanced similarity search requirements. Like many organizations, they then moved to vector embeddings with chunking, similarity search, and LLM integration. However, this approach revealed two critical production issues that many enterprises face.

The first major limitation was accuracy degradation due to chunking and nearest neighbor search. The presenter illustrated this with an example from Apple's early product timeline: naive chunking caused the system to incorrectly associate the Macintosh with 1983 instead of 1984, due to proximity bias in chunks containing information about the Lisa computer. This demonstrates a fundamental challenge in production RAG systems, where context boundaries significantly impact retrieval accuracy.
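As a minimal sketch of this failure mode (the passage, chunk size, and query below are illustrative, not Writer's actual data or pipeline), fixed-size chunking can strand a product name in one chunk while its date lands in another:

```python
# Naive fixed-size chunking of a short passage about Apple's product timeline.
passage = (
    "Apple released the Lisa in 1983, one of the first personal computers "
    "sold with a graphical user interface. The Macintosh followed and "
    "became the company's flagship product. It was introduced in 1984."
)

def chunk(text: str, size: int = 120) -> list[str]:
    """Split text into fixed-size character windows, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

for i, c in enumerate(chunk(passage)):
    print(f"chunk {i}: {c!r}")

# chunk 0 now pairs "The Macintosh" with the only date it contains, 1983,
# while 1984 lands in chunk 1. A nearest-neighbor search for
# "When was the Macintosh introduced?" favors chunk 0 (it names the
# Macintosh), so the LLM sees 1983 and answers incorrectly.
```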
The second critical issue was the failure of vector retrieval on concentrated enterprise data. Unlike diverse document collections covering varied topics, enterprise data often exhibits high similarity in terminology and concepts. For instance, a mobile phone company's document corpus would contain thousands of documents using similar technical terms like "megapixels," "cameras," and "battery life." When users requested comparisons between phone models, vector similarity search would return many relevant documents but provide insufficient disambiguation for the LLM to generate accurate comparative analyses.

### Graph-Based RAG Implementation

To address these limitations, Writer implemented graph-based RAG, querying graph databases to retrieve relevant documents via relationships and keys rather than pure similarity metrics. This approach preserved textual relationships and provided richer context to the language model, significantly improving accuracy. However, the early implementation encountered several production challenges that illustrate common enterprise deployment issues. Converting enterprise data into structured graph format proved difficult and costly at scale, and as graph databases grew with enterprise data volumes, the team ran into both technical limitations and cost constraints. Additionally, Cypher queries struggled to express the advanced similarity matching the system required, and the team observed that LLMs performed better with text-based queries than with complex graph structures.

### Innovative Solutions and Leveraging Team Expertise

Rather than abandoning the graph approach, Writer's team demonstrated pragmatic problem-solving by building on its core competencies. For graph structure conversion, they built a specialized model, capable of running on CPUs or smaller GPUs, trained specifically to map enterprise data into graph structures of nodes and edges. This solution predated the availability of suitable fine-tunable small models, demonstrating proactive model development for production needs.

To address graph database scaling and cost issues, they implemented a creative hybrid approach, storing graph data points as JSON in a Lucene-based search engine. This preserved the structural benefits of the graph while leveraging the team's existing search engine expertise and avoiding performance degradation at enterprise scale. The solution exemplifies production-system pragmatism: balancing theoretical ideals against operational realities.
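The case study doesn't specify Writer's document schema, but a hybrid of this kind might flatten each node and edge into a JSON document that any Lucene-based engine (Elasticsearch, OpenSearch, Solr) can index; the field names below are illustrative assumptions:

```python
import json

# Hypothetical graph records flattened into JSON documents. ID fields act as
# keyword fields for exact joins; source_text stays available for full-text
# and similarity search, which native graph query languages handled poorly.
node_doc = {
    "doc_type": "node",
    "node_id": "phone-x200",
    "labels": ["Product", "Phone"],
    "properties": {"name": "X200", "camera_mp": 48, "battery_mah": 4500},
    "source_text": "The X200 ships with a 48-megapixel camera and a 4500 mAh battery.",
}
edge_doc = {
    "doc_type": "edge",
    "relation": "SUCCESSOR_OF",
    "from_node": "phone-x200",
    "to_node": "phone-x100",
    "source_doc": "spec-sheet.pdf",
}

# Indexing becomes an ordinary document write, and a one-hop traversal becomes
# two cheap keyword lookups: edges filtered by from_node, then nodes by node_id.
print(json.dumps(node_doc, indent=2))
print(json.dumps(edge_doc, indent=2))
```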
### Advanced Retrieval Architecture

Writer's most sophisticated innovation involved implementing fusion-in-decoder techniques, drawing on research that revisited the original RAG paper's proposals. The original RAG paper never actually discussed the prompt-context-question approach that became standard practice; instead, it proposed a two-component architecture with retriever and generator components built on pre-trained sequence-to-sequence models.

Fusion-in-decoder encodes each retrieved passage independently, so encoder compute scales linearly rather than quadratically with the number of passages, and then attends over all passages jointly in the decoder for improved evidence aggregation. This delivers significant efficiency improvements while maintaining state-of-the-art performance. Writer extended the technique with knowledge-graph fusion-in-decoder, using knowledge graphs to capture relationships between retrieved passages, addressing efficiency bottlenecks while reducing costs. Because Writer develops proprietary models, the team built its own fusion-in-decoder system, a significant engineering investment in production RAG infrastructure. The approach combined multiple research-backed techniques to optimize customer outcomes rather than chasing trending technologies.
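Writer's implementation is proprietary, but the published fusion-in-decoder pattern can be sketched with a generic T5-style model from Hugging Face Transformers. Here `t5-small` is an untuned stand-in (a real FiD reader is fine-tuned for QA), so this shows the mechanics rather than a working system:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "When was the Macintosh introduced?"
passages = [
    "Apple released the Lisa in 1983.",
    "The Macintosh was introduced in 1984.",
]

# Encode each (question, passage) pair independently: encoder cost grows
# linearly with the number of retrieved passages.
enc = tokenizer(
    [f"question: {question} context: {p}" for p in passages],
    return_tensors="pt",
    padding=True,
)
with torch.no_grad():
    hidden = model.encoder(**enc).last_hidden_state  # (n_passages, seq, dim)

# Fuse in the decoder: concatenate all encoder states along the sequence axis
# so cross-attention sees every passage jointly while generating the answer.
fused = BaseModelOutput(last_hidden_state=hidden.reshape(1, -1, hidden.size(-1)))
mask = enc.attention_mask.reshape(1, -1)

out = model.generate(encoder_outputs=fused, attention_mask=mask, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The key property is that no single forward pass ever attends across two passages in the encoder; joint reasoning over the evidence happens only in the decoder's cross-attention.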
## Production Performance and Validation

### Benchmarking and Evaluation

Writer conducted comprehensive benchmarking on Amazon's RobustQA dataset, comparing their knowledge graph plus fusion-in-decoder system against seven different vector search systems. Their system achieved the highest accuracy and the fastest response times, validating the multi-year development investment. This represents proper production system evaluation: standardized datasets used for comparative analysis. Testing against multiple established vector search solutions, rather than reporting isolated performance metrics, provides credible evidence for their architectural decisions and helps justify the development investment their approach required.

### Production Features and Capabilities

The graph-based architecture enables several production features that enhance enterprise utility. The system can expose its reasoning process by showing the snippets, subqueries, and sources behind a RAG answer, providing transparency for enterprise users who need to understand and validate AI-generated responses. This capability is particularly crucial in regulated industries where audit trails and explainability are mandatory.

Multi-hop questioning allows reasoning across multiple documents and topics without performance degradation, addressing complex enterprise information needs that require synthesizing information from diverse sources. This is a significant advance over simpler RAG systems, which struggle with complex, interconnected queries.

The system also handles the data layouts where vector retrieval typically fails, such as answers split across multiple pages or queries involving similar but not identical terminology. The graph structure combined with fusion-in-decoder supplies the additional context and relationships needed to formulate correct answers in these challenging scenarios.

## LLMOps Insights and Production Lessons

### Customer-Driven Development Philosophy

Writer's approach emphasizes solving customer problems rather than implementing trendy solutions, reflecting mature LLMOps thinking. Their research team focuses on practical customer insights rather than theoretical exploration, ensuring development effort addresses real enterprise pain points. This customer-centric approach helped them identify and solve fundamental vector search limitations before those limitations became widely acknowledged industry challenges.

### Flexibility and Expertise Utilization

The team's willingness to stay flexible and build on its own expertise, rather than rigidly following architectural patterns, demonstrates sophisticated production-system thinking. When they hit graph database limitations, they leveraged their search engine expertise instead of forcing unsuitable solutions, maintaining the benefits of the graph while working within their operational capabilities.

### Research-Informed Innovation

Writer's integration of academic research, particularly fusion-in-decoder, with practical production requirements exemplifies mature LLMOps practice. They did not simply implement existing solutions; they combined multiple research approaches into a system optimized for their specific enterprise requirements, enabling substantial performance improvements while maintaining production reliability.

### Scalability and Performance Considerations

The journey illustrates critical scalability considerations for enterprise RAG. Vector search limitations become more pronounced with concentrated enterprise data, demanding more sophisticated retrieval strategies. Writer's solutions address both computational efficiency (linear rather than quadratic encoder scaling) and operational efficiency (building on existing team expertise), demonstrating comprehensive production-system thinking.

The case study shows that successful enterprise RAG systems require multiple complementary techniques rather than a single-solution approach. The final architecture combines specialized models, graph structures, advanced search techniques, and fusion-in-decoder processing, illustrating the complexity required for production-grade enterprise AI.

This comprehensive approach to RAG system evolution offers valuable insights for organizations developing production LLM systems, particularly those dealing with dense, specialized enterprise data where accuracy and explainability are paramount.
