Infosys, in collaboration with AWS, developed a sophisticated multimodal RAG solution specifically designed to handle the complex technical documentation challenges faced by the oil and gas industry. This case study represents a comprehensive exploration of production-ready LLMOps practices, demonstrating how enterprise organizations can iteratively develop and deploy advanced AI systems to solve real-world business problems.
The core business challenge centered around the oil and gas industry's generation of vast amounts of complex technical data through drilling operations, creating significant bottlenecks in data processing and knowledge extraction. Traditional document processing methods were failing to handle the unique characteristics of enterprise documents in this domain, which include highly technical terminology, complex multimodal data formats combining text, images, charts, and technical diagrams, and interconnected information spread across various document types. This resulted in inefficient data extraction, missed insights, and time-consuming manual processing that hindered organizational productivity and critical decision-making processes.
The production architecture uses a stack of AWS services: Amazon Nova Pro, accessed through Amazon Bedrock, serves as the primary large language model; Amazon Bedrock Knowledge Bases provides managed RAG capabilities; and Amazon OpenSearch Serverless functions as the vector database. The embedding strategy uses both Amazon Titan Text Embeddings and Cohere Embed English models, while a BGE reranker improves search result relevance. Notably, the team also employed Amazon Q Developer as an AI-powered assistant for both frontend and backend development, demonstrating the integration of AI throughout the development lifecycle.
The iterative development process reveals sophisticated LLMOps practices through multiple experimental phases. The initial RAG exploration involved processing over a thousand technical images using Amazon Nova Pro with iterative prompting strategies. This multimodal approach generated comprehensive descriptions in three steps: initial image analysis to extract basic technical elements, refined prompting with domain-specific context to capture specialized terminology, and multiple inference iterations to ensure completeness and accuracy. However, this approach revealed limitations in handling image-related queries due to the lack of a proper chunking strategy for visual content.
The team's exploration of multi-vector embeddings with ColBERT demonstrated advanced embedding techniques for fine-grained text representations. They implemented tensor-based storage for complex ColBERT embeddings and developed similarity scoring mechanisms between query and document embeddings. While this approach showed potential for enhanced document understanding and achieved fine-grained representation of visual and textual content, it also highlighted practical challenges in storing and managing complex embeddings in production vector stores, with debugging and document analysis becoming cumbersome.
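The similarity scoring between query and document embeddings mentioned above can be illustrated with the late-interaction (MaxSim) score that ColBERT is known for. This is a minimal sketch under stated assumptions: the two-dimensional token vectors are made up for illustration, and the source does not describe the team's actual scoring or tensor storage code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token vector, take its best
    match among the document token vectors, then sum over query tokens."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)

# Toy per-token embeddings (illustrative values, not real model output).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [-1.0, 0.0]]             # covers only the first

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Because every document is stored as one vector per token rather than one vector overall, the storage and debugging overhead described above follows directly from this representation.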
A significant breakthrough came with the implementation of parent-child hierarchy chunking using Cohere Embeddings. This approach balanced the need for context preservation with precise information retrieval through parent chunks of 1,500 tokens maintaining document-level context and child chunks of 512 tokens containing detailed technical information. The careful structuring of content significantly enhanced the performance of both embedding and question-answering models, with the hierarchical structure proving particularly effective for handling the complex, nested nature of oil and gas documentation.
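A minimal sketch of parent-child chunking with the sizes quoted above (1,500-token parents, 512-token children). It assumes whitespace-separated words as a stand-in for model tokens, and the function names are hypothetical; the source does not show the team's implementation.

```python
def split_tokens(tokens, size):
    """Split a token list into consecutive windows of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def build_hierarchy(text, parent_size=1500, child_size=512):
    """Return (parents, children). Each child keeps a pointer to its parent
    so a match on a small chunk can be expanded to the larger context."""
    tokens = text.split()  # assumption: whitespace tokens approximate model tokens
    parents, children = {}, []
    for parent_id, parent in enumerate(split_tokens(tokens, parent_size)):
        parents[parent_id] = " ".join(parent)
        for child in split_tokens(parent, child_size):
            children.append({"parent_id": parent_id, "text": " ".join(child)})
    return parents, children

# Synthetic 3,200-token document for demonstration.
doc = " ".join(f"tok{i}" for i in range(3200))
parents, children = build_hierarchy(doc)
print(len(parents), len(children))
```

In this arrangement the child chunks are the ones embedded and searched, while the parent text is what gets handed to the language model once a child matches.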
The final production solution represents a highly evolved RAG system implementing hybrid search capabilities that combine semantic vector search with traditional keyword search. The refined chunking strategy uses parent chunks of 1,200 tokens and child chunks of 512 tokens, maintaining the hierarchical approach while optimizing for performance. The integration of BGE reranker provides sophisticated result refinement, ensuring that retrieved documents are properly ordered based on semantic similarity to queries.
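The source does not say how the semantic and keyword result lists are combined before reranking. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the team's method; the document IDs and the constant k=60 are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs. Each list contributes
    1 / (k + rank) to a document's score, with rank starting at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from the two retrievers for the same query.
semantic_hits = ["doc_b", "doc_a", "doc_c"]
keyword_hits = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that appear in both lists rise to the top, and a reranker such as the BGE model named above would then reorder this fused candidate set.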
The multimodal processing capabilities demonstrate advanced production-ready AI system design. The solution handles diverse information types found in oil and gas documents, processing both textual content including technical jargon, well logs, and production figures, and visual elements such as well schematics, seismic charts, and lithology graphs while maintaining contextual relationships between them. For instance, when processing a well completion report, the system can extract key parameters from text such as total depth and casing sizes, analyze accompanying well schematics, and link textual descriptions of formations to their visual representations in lithology charts.
Domain-specific vocabulary handling represents a critical production consideration for specialized industries. The system incorporates a comprehensive dictionary of industry terms and acronyms specific to oil and gas operations, addressing the challenge that standard natural language processing models often misinterpret technical terminology. The system interprets complex queries like "fish left in hole at 5000 ft MD" by recognizing that "fish" refers to lost equipment rather than an actual fish and that "MD" means measured depth, and by understanding the operational relevance of this information for drilling operations and potential remediation steps.
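A minimal sketch of how a glossary lookup could annotate a query before retrieval. The two glossary entries come from the example above; the dictionary structure and the `annotate_query` function are hypothetical, and the production dictionary is presumably far larger.

```python
import re

# Two entries taken from the example query; a real glossary would be much larger.
GLOSSARY = {
    "fish": "lost equipment",
    "md": "measured depth",
}

def annotate_query(query, glossary=GLOSSARY):
    """Append plain-language meanings for any glossary terms found in the query."""
    notes = []
    for token in re.findall(r"[A-Za-z]+", query):
        meaning = glossary.get(token.lower())
        if meaning and (token, meaning) not in notes:
            notes.append((token, meaning))
    if not notes:
        return query
    gloss = "; ".join(f"{term} = {meaning}" for term, meaning in notes)
    return f"{query} [{gloss}]"

expanded = annotate_query("fish left in hole at 5000 ft MD")
print(expanded)
```

The annotated query can then be embedded or passed to the language model, so that the domain meaning travels with the original wording.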
The multi-vector retrieval implementation showcases sophisticated production architecture design for handling diverse content types. The system creates separate embedding spaces for text, diagrams, and numerical data, implementing both dense vector search for semantic similarity and sparse vector search for exact technical terminology matches. Cross-modal retrieval connects information across different content types, while contextual query expansion automatically includes relevant industry-specific terms. This hybrid approach delivers comprehensive retrieval whether users search for conceptual information or specific technical parameters.
Temporal and spatial awareness capabilities demonstrate advanced context understanding crucial for production systems. The system incorporates understanding of well locations and operational timelines, enabling queries that consider geographical and chronological contexts. For example, searching for "recent gas shows in Permian Basin wells" leverages both temporal filtering and spatial awareness to provide relevant, location-specific results, ensuring retrieved information matches the operational context of user needs.
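A query such as the Permian Basin example can be thought of as a metadata filter applied alongside semantic search. The sketch below assumes each indexed record carries a basin name and an observation date; the records, field names, and the 90-day definition of "recent" are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical indexed records; values are illustrative only.
RECORDS = [
    {"id": "r1", "basin": "Permian", "observed": date(2024, 5, 20), "event": "gas show"},
    {"id": "r2", "basin": "Permian", "observed": date(2023, 1, 10), "event": "gas show"},
    {"id": "r3", "basin": "Williston", "observed": date(2024, 5, 25), "event": "gas show"},
    {"id": "r4", "basin": "Permian", "observed": date(2024, 5, 28), "event": "lost circulation"},
]

def filter_records(records, basin, event, as_of, window_days=90):
    """Keep records for one basin and event type observed within the window
    ending at `as_of`, newest first."""
    cutoff = as_of - timedelta(days=window_days)
    hits = [
        r for r in records
        if r["basin"] == basin
        and r["event"] == event
        and cutoff <= r["observed"] <= as_of
    ]
    return sorted(hits, key=lambda r: r["observed"], reverse=True)

recent = filter_records(RECORDS, "Permian", "gas show", as_of=date(2024, 6, 1))
print([r["id"] for r in recent])
```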
Reflective response generation implements critical quality assurance mechanisms essential for production AI systems where accuracy is paramount. The system uses reflective prompting mechanisms that prompt the language model to critically evaluate its own responses against source documents and industry standards. Response reranking utilizes scoring models that evaluate technical accuracy, contextual relevance, and adherence to industry best practices. This multi-layered validation approach ensures generated responses meet the high accuracy standards required for technical decision-making in drilling operations.
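The reflective loop described above can be sketched as draft, critique, and revise steps around any text-generation function. This is an illustration under assumptions: `llm` is a hypothetical callable that takes a prompt and returns text, the prompts and the SUPPORTED convention are invented for the example, and a scripted stand-in replaces a real model so the flow can be followed end to end.

```python
def reflective_answer(question, sources, llm, max_rounds=2):
    """Draft an answer, ask the model to check it against the sources,
    and revise until the check passes or max_rounds is reached."""
    context = "\n".join(sources)
    answer = llm(f"Answer using only these sources.\n{context}\nQuestion: {question}")
    for _ in range(max_rounds):
        verdict = llm(
            "Check the answer against the sources. Reply SUPPORTED, or list "
            f"the unsupported claims.\n{context}\nAnswer: {answer}"
        )
        if verdict.strip().upper().startswith("SUPPORTED"):
            break
        answer = llm(
            f"Revise the answer to fix these issues: {verdict}\n{context}\n"
            f"Question: {question}\nPrevious answer: {answer}"
        )
    return answer

def scripted_llm(replies):
    """Stand-in for a real model: returns canned replies in order."""
    pending = iter(replies)
    calls = []
    def llm(prompt):
        calls.append(prompt)
        return next(pending)
    return llm, calls

llm, calls = scripted_llm([
    "Casing is 9 inch.",                         # first draft (wrong)
    "Unsupported: casing size is 9 5/8 inch.",   # critique
    "Casing is 9 5/8 inch.",                     # revision
    "SUPPORTED",                                 # second check passes
])
final = reflective_answer(
    "What is the casing size?", ["Casing size: 9 5/8 inch."], llm
)
print(final)
```

Capping the number of rounds bounds latency and cost, which matters when the check itself is another model call.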
Several advanced RAG strategies are implemented in production. Hypothetical document embeddings generate synthetic questions based on document content and create embeddings for these hypothetical questions, which improves retrieval for complex, multi-part queries and is particularly effective for what-if scenarios in drilling operations. Recursive retrieval implements multi-hop information gathering, allowing the system to follow chains of related information across multiple documents, which is essential for answering complex queries that require synthesis from various sources. Semantic routing directs queries to the appropriate knowledge bases or document subsets, optimizing search efficiency by focusing on the most relevant data sources. Query transformation automatically refines and reformulates user queries for optimal retrieval, applying industry-specific knowledge to interpret ambiguous terms and breaking complex queries down into a series of simpler, more targeted searches.
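Of these strategies, recursive retrieval is the easiest to show in a few lines. The sketch below assumes each document lists the IDs of related documents in a `refs` field and follows those links breadth-first up to a hop limit. The corpus, document IDs, and field names are hypothetical; in a real system each hop would more likely be a follow-up retrieval query than a stored link.

```python
from collections import deque

# Hypothetical corpus in which each document points to related documents.
CORPUS = {
    "daily_report": {"text": "Stuck pipe at 5000 ft MD.", "refs": ["fishing_job"]},
    "fishing_job": {"text": "Fish recovered after jarring.", "refs": ["bha_spec"]},
    "bha_spec": {"text": "Bottom hole assembly details.", "refs": ["vendor_sheet"]},
    "vendor_sheet": {"text": "Vendor tool tolerances.", "refs": []},
}

def recursive_retrieve(seed_ids, corpus, max_hops=2):
    """Collect the seed documents plus everything reachable within max_hops
    reference hops, in breadth-first order and without duplicates."""
    collected = []
    queue = deque((doc_id, 0) for doc_id in seed_ids)
    while queue:
        doc_id, hop = queue.popleft()
        if doc_id in collected or doc_id not in corpus:
            continue
        collected.append(doc_id)
        if hop < max_hops:
            queue.extend((ref, hop + 1) for ref in corpus[doc_id]["refs"])
    return collected

chain = recursive_retrieve(["daily_report"], CORPUS, max_hops=2)
print(chain)
```

The hop limit keeps the gathered context bounded, so a long reference chain cannot flood the prompt with loosely related material.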
The production deployment addresses scalability through distributed processing of large data volumes, so the system can serve high request volumes without compromising performance. Real-time indexing incorporates new documents as soon as they become available, keeping information access up to date. The reported performance metrics are average query response times under 2 seconds and a consistent 92% retrieval accuracy measured against human expert baselines.
Business outcomes validate the production value of the LLMOps implementation: a 40-50% decrease in manual document processing costs through automated information extraction, a 60% reduction in the time field engineers and geologists spend searching for technical information, and significant risk mitigation through reliable access to critical technical knowledge. User satisfaction ratings of 4.7/5, based on feedback from field engineers and geologists, indicate successful user adoption and system effectiveness.
The case study illustrates sophisticated LLMOps practices including iterative development methodologies, comprehensive evaluation frameworks, production deployment strategies, and continuous monitoring and optimization. The evolution from initial approaches through multiple experimental phases to final production implementation showcases best practices in enterprise AI development, emphasizing the importance of domain expertise, systematic experimentation, and robust validation mechanisms in building production-ready AI systems.