Company
Hexagon
Title
Building a Secure Enterprise AI Assistant with RAG and Custom Infrastructure
Industry
Tech
Year
2025
Summary (short)
Hexagon's Asset Lifecycle Intelligence division developed HxGN Alix, an AI-powered digital worker to enhance user interaction with their Enterprise Asset Management products. They implemented a secure solution using AWS services, custom infrastructure, and RAG techniques. The solution successfully balanced security requirements with AI capabilities, deploying models on Amazon EKS with private subnets, implementing robust guardrails, and solving various RAG-related challenges to provide accurate, context-aware responses while maintaining strict data privacy standards.
## Summary

Hexagon's Asset Lifecycle Intelligence division embarked on a journey to develop HxGN Alix, an AI-powered digital worker designed to revolutionize how users interact with their Enterprise Asset Management (EAM) products. This case study represents a comprehensive example of enterprise LLMOps, covering strategy formulation, technology selection, implementation challenges, and operational considerations for deploying LLMs in production environments with stringent security requirements.

The primary motivation was to address the difficulty users faced when navigating extensive PDF manuals to find information about EAM products. The solution needed to operate within high-security environments, maintain data privacy, support multiple languages, and provide accurate, grounded responses to user queries.

## Strategic Approach: Crawl, Walk, Run

Hexagon adopted a phased approach to their generative AI implementation, which is a sensible strategy for organizations new to production LLM deployments. The three phases were structured as follows:

The **Crawl phase** focused on establishing foundational infrastructure with emphasis on data privacy and security. This included implementing guardrails around security, compliance, and data privacy, setting up capacity management and cost governance, and creating the necessary policies, monitoring mechanisms, and architectural patterns for long-term scalability. This foundation-first approach is critical for enterprise LLMOps, as retrofitting security and compliance controls after deployment is significantly more difficult.

The **Walk phase** transitioned from proof of concept to production-grade implementations. The team deepened their technical expertise, refined operational processes, and gained real-world experience with generative AI models. They integrated domain-specific data to improve relevance while reinforcing tenant-level security for proper data segregation. This phase validated AI-driven solutions in real-world scenarios through iterative improvements.

The **Run phase** focused on scaling development across multiple teams in a structured and repeatable manner. By standardizing best practices and development frameworks, they enabled different products to adopt AI capabilities efficiently while focusing on high-value use cases.

## Technology Stack Selection

Hexagon's technology selection criteria reflected the balance between control, customization, cost, and compliance that enterprise LLMOps demands.

### LLM Selection: Open Source vs. Commercial

The team evaluated multiple criteria for choosing between commercial and open source LLMs. They ultimately selected **Mistral NeMo**, a 12-billion-parameter open source LLM built in collaboration with NVIDIA and released under the Apache 2.0 license. Key factors in this decision included:

- **Cost management** to avoid unpredictable expenses
- **Customization** capabilities for domain-specific terminology
- **Intellectual property and licensing** control
- **Data privacy** adherence to strict confidentiality requirements
- **Model lifecycle control** allowing updates and maintenance aligned with business objectives without third-party dependency

Mistral NeMo offered a large context window of up to 128,000 tokens, optimization for function calling, and strong multilingual capabilities across English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
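To make the function-calling and multilingual points concrete, the sketch below sends a tool-enabled, non-English request to a self-hosted Mistral NeMo instance. It assumes the model sits behind an OpenAI-compatible endpoint (for example, a vLLM deployment); the endpoint URL, model name, system prompt, and `get_work_order` tool are hypothetical illustrations, not details disclosed in the case study.

```python
# Minimal sketch: tool-enabled request against a self-hosted Mistral NeMo.
# The endpoint URL, model name, and tool schema are hypothetical placeholders
# for whatever OpenAI-compatible serving layer is actually used.
from openai import OpenAI

client = OpenAI(base_url="http://mistral-nemo.internal:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_work_order",  # hypothetical EAM lookup tool
        "description": "Fetch a work order from the EAM system by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"work_order_id": {"type": "string"}},
            "required": ["work_order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistral-nemo",
    messages=[
        {"role": "system", "content": "You are an assistant for EAM product documentation."},
        {"role": "user", "content": "Zeig mir den Status von Arbeitsauftrag 4711."},  # German query
    ],
    tools=tools,
)

# If the model decides to call the tool, the structured arguments appear here.
print(response.choices[0].message.tool_calls)
```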
### Infrastructure Choices

For compute and deployment, the team leveraged **Amazon EKS** (Elastic Kubernetes Service), utilizing their existing production cluster, which already had the required safety, manageability, and DevOps integration. This approach allowed them to build on existing investments in infrastructure and tooling while maintaining high availability and scalability. They selected **Amazon EC2 G6e.48xlarge instances** powered by NVIDIA L40S GPUs, described as the most cost-efficient GPU instances for deploying generative AI models under 12 billion parameters.

**Amazon S3** provided secure storage for product documentation and user data, while **Amazon Bedrock** served as a fallback using the Mistral 7B model with multi-Region endpoints to handle Regional failures and maintain service availability.

## RAG Implementation and Challenges

A significant portion of the case study details the implementation of Retrieval Augmented Generation (RAG), which was essential for grounding the model's responses in accurate documentation and reducing hallucinations.

### Chunking Challenges

The team encountered the common problem of context destruction when chunking documents. Applying standard chunking methods to tables or complex structures risks losing relational data, which can result in critical information not being retrieved. To address this, they used the **hierarchical chunking capability of Amazon Bedrock Knowledge Bases**, which helped preserve context in the final chunks.

### Document Format Handling

Hexagon's product documentation, accumulated over decades, varied greatly in format and contained many non-textual elements such as tables. Tables are particularly difficult to interpret when directly queried from PDFs or Word documents. The team used the **foundation model (FM) parsing capability of Amazon Bedrock Knowledge Bases**, which processes raw documents with an LLM before creating the final chunks, ensuring data from non-textual elements was correctly interpreted.

### Handling LLM Boundaries

User queries sometimes exceeded system capabilities, such as requests for comprehensive lists of product features. Because documentation is split into multiple chunks, the retrieval system might not return all necessary documents. The team created custom documents containing FAQs and special instructions for these edge cases, adding them to the knowledge base as few-shot examples to help the model produce more accurate and complete responses.

### Grounding and Hallucination Mitigation

To address the inherent tendency of LLMs to produce potentially inaccurate outputs, the team used a combination of specialized prompts along with **contextual grounding checks from Amazon Bedrock Guardrails**. This dual approach helps ensure responses are factually grounded in the retrieved documentation.

### Conversational Context Management

Users often ask brief follow-up questions like "Can you elaborate?" or "Tell me more." When processed in isolation by the RAG system, these queries yield no results. The team tested two approaches:

**Prompt-based search reformulation** has the LLM first identify user intent and generate a more complete query for the knowledge base. While this requires an additional LLM call, it yields highly relevant results and keeps the final prompt concise.

**Context-based retrieval with chat history** sends the last five messages from the chat history to the knowledge base, allowing broader results with faster response times because only one LLM round trip is needed.

The first method worked better with large document sets by focusing on highly relevant results, while the second approach was more effective with smaller, focused document sets.
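The retrieval flow can be made concrete with a short sketch. The snippet below shows the first approach (prompt-based search reformulation) followed by a knowledge base lookup and a contextual grounding check. It assumes the self-hosted model is exposed behind an OpenAI-compatible endpoint, and the endpoint URL, knowledge base ID, and guardrail ID are hypothetical placeholders rather than values from the case study.

```python
"""Sketch: prompt-based search reformulation with a contextual grounding check.

Assumptions (not from the case study): the self-hosted Mistral NeMo model sits
behind an OpenAI-compatible endpoint, and the knowledge base / guardrail IDs
are hypothetical placeholders.
"""
import boto3
from openai import OpenAI

llm = OpenAI(base_url="http://mistral-nemo.internal:8000/v1", api_key="unused")
kb_client = boto3.client("bedrock-agent-runtime")
bedrock_runtime = boto3.client("bedrock-runtime")

KB_ID = "EXAMPLEKB123"         # hypothetical Bedrock knowledge base ID
GUARDRAIL_ID = "EXAMPLEGR123"  # hypothetical Bedrock guardrail ID


def reformulate(question: str, history: list[dict]) -> str:
    """Turn a terse follow-up ("Tell me more") into a standalone search query."""
    response = llm.chat.completions.create(
        model="mistral-nemo",
        messages=history
        + [{"role": "user",
            "content": f"Rewrite this follow-up as a self-contained search query: {question}"}],
    )
    return response.choices[0].message.content


def retrieve(query: str) -> list[str]:
    """Fetch the most relevant documentation chunks from the knowledge base."""
    response = kb_client.retrieve(
        knowledgeBaseId=KB_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]


def passes_grounding_check(answer: str, query: str, sources: list[str]) -> bool:
    """Run the Bedrock Guardrails contextual grounding check on a draft answer."""
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion="1",
        source="OUTPUT",
        content=[
            {"text": {"text": "\n\n".join(sources), "qualifiers": ["grounding_source"]}},
            {"text": {"text": query, "qualifiers": ["query"]}},
            {"text": {"text": answer, "qualifiers": ["guardrail_content"]}},
        ],
    )
    return response["action"] != "GUARDRAIL_INTERVENED"
```

The second approach would skip `reformulate` and pass the recent chat turns directly as the retrieval query, trading some retrieval precision for one fewer LLM round trip.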
## Security and Compliance

Security was paramount throughout the implementation. The team used isolated private subnets to ensure code interacting with the models wasn't connected to the internet, enhancing information protection for users. Critically, because user interactions are in free-text format and might include personally identifiable information (PII), the team designed the system to not store any user interactions in any format. This approach provides complete confidentiality of AI use, adhering to strict data privacy standards.

**Amazon Bedrock Guardrails** provided the framework for enforcing safety and compliance, enabling customization of filtering policies to ensure AI-generated responses align with organizational standards and regulatory requirements. The guardrails include capabilities to detect and mitigate harmful content, define content moderation rules, restrict sensitive topics, and establish enterprise-level security for generative AI interactions.

## Development Lifecycle Adjustments

The case study highlights important considerations for adapting traditional software development practices to generative AI systems:

**Testing challenges** are significant because generative AI systems cannot rely solely on unit tests. Prompts can return different results each time, making verification more complex. The team had to develop new testing and QA methodologies to ensure consistent and reliable responses.

**Performance variability** is another concern: LLM response latency varied from 1 to 60 seconds depending on the user's query, unlike traditional APIs with predictable response times. Continuous monitoring was implemented to track performance metrics and user interactions, allowing for ongoing optimization of the AI system.

**Amazon Bedrock Prompt Management** simplified the creation, evaluation, versioning, and sharing of prompts within the engineering team to optimize responses from foundation models.

## Critical Assessment

While the case study presents a comprehensive approach to enterprise LLMOps, some considerations are worth noting:

The content is published on AWS's blog and co-authored by AWS solutions architects, so it naturally emphasizes AWS services. Organizations should evaluate whether equivalent capabilities exist in other cloud providers or open-source alternatives.

The quantitative results are limited: the case study describes qualitative improvements in user experience and workflow efficiency but doesn't provide specific metrics on accuracy improvements, cost savings, or productivity gains. This makes it difficult to objectively assess the ROI of the implementation.

The selection of Mistral NeMo, a 12B-parameter model, is interesting, as it sits in a middle ground between larger commercial models and smaller, more easily deployable open-source options. The trade-offs between model size, cost, and capability are important considerations that could benefit from more detailed analysis.

Overall, this case study provides valuable insights into the practical challenges of deploying LLMs in enterprise environments, particularly around RAG implementation, security considerations, and the need to adapt development practices for AI systems.
The phased approach and emphasis on foundational security before scaling represent sound practices for organizations embarking on similar initiatives.
