Benchling built a RAG-based Slackbot on Amazon Bedrock to help engineers navigate their complex Terraform Cloud infrastructure. The solution combines internal Confluence documentation, public Terraform docs, and past Slack conversations to provide instant, relevant answers to infrastructure questions, sparing engineers from digging through lengthy FAQs or old Slack threads. The system demonstrates a practical application of LLMs in production for internal developer support.
Benchling's case study presents a practical implementation of LLMs in production for internal developer tooling, specifically focused on supporting their infrastructure team's use of Terraform Cloud. The company manages approximately 160,000 Terraform resources across five data centers, with about 50 engineers making infrastructure changes monthly. This creates a significant knowledge management challenge, as engineers need quick access to documentation and troubleshooting information.
The technical solution they developed combines several modern LLMOps practices and architectural patterns. At its core, the system uses a Retrieval-Augmented Generation (RAG) approach implemented through Amazon Bedrock, which helps ground the LLM's responses in authoritative sources and reduce hallucination risks. The knowledge base incorporates multiple data sources (an ingestion sketch follows the list):
* Internal Confluence documentation (converted from PDF)
* Public Terraform Cloud documentation
* Terraform language documentation
* Historical Slack thread solutions
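As a rough sketch of how those sources could be kept in sync with the knowledge base (the case study notes this is currently a manual step), the snippet below uploads an exported document to S3 and triggers a Bedrock ingestion job. The bucket name, knowledge base ID, and data source ID are placeholders, not values from the case study.

```python
import boto3

# Hypothetical identifiers -- the case study does not publish these values.
KNOWLEDGE_BASE_ID = "KB12345"
DATA_SOURCE_ID = "DS12345"
DOCS_BUCKET = "infra-faq-docs"

s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent")


def upload_and_sync(local_path: str, key: str) -> str:
    """Upload an exported document to S3, then trigger a knowledge base sync."""
    # Exported Confluence PDFs, Terraform docs, and curated Slack answers all
    # land in the S3 prefix that backs the Bedrock data source.
    s3.upload_file(local_path, DOCS_BUCKET, key)

    # The manual synchronization step: a new ingestion job re-chunks and
    # re-embeds the documents into the vector store.
    job = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        dataSourceId=DATA_SOURCE_ID,
        description="Refresh Terraform Cloud FAQ content",
    )
    return job["ingestionJob"]["ingestionJobId"]
```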
The architecture demonstrates a well-thought-out production deployment using AWS services (a handler sketch follows the list):
* Amazon Bedrock for the core LLM functionality
* OpenSearch Serverless as the vector database
* AWS Lambda for serverless compute
* API Gateway for handling Slack interactions
* Two distinct models: Amazon Titan Text Embeddings v2 for embedding generation and Claude 3.5 Sonnet v2 for inference
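The retrieval path can be sketched roughly as follows: an API Gateway-backed Lambda receives the Slack event and answers it with Bedrock's RetrieveAndGenerate API against the knowledge base. The environment variable, knowledge base ID, and model ARN below are illustrative assumptions; Benchling's actual handler and Slack plumbing are not published.

```python
import json
import os

import boto3

# Illustrative placeholders, not values from the case study.
KNOWLEDGE_BASE_ID = os.environ.get("KNOWLEDGE_BASE_ID", "KB12345")
MODEL_ARN = (
    "arn:aws:bedrock:us-east-1::foundation-model/"
    "anthropic.claude-3-5-sonnet-20241022-v2:0"
)

bedrock_runtime = boto3.client("bedrock-agent-runtime")


def answer_question(question: str) -> str:
    """Ground the answer in the knowledge base via RetrieveAndGenerate."""
    response = bedrock_runtime.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KNOWLEDGE_BASE_ID,
                "modelArn": MODEL_ARN,
            },
        },
    )
    return response["output"]["text"]


def lambda_handler(event, context):
    """API Gateway -> Lambda entry point for Slack events (simplified)."""
    body = json.loads(event.get("body", "{}"))
    # Slack URL verification handshake.
    if body.get("type") == "url_verification":
        return {"statusCode": 200, "body": body["challenge"]}

    question = body.get("event", {}).get("text", "")
    answer = answer_question(question)
    # In production the answer would be posted back to the Slack thread.
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```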
One of the most interesting aspects of their LLMOps implementation is the careful consideration given to chunking strategies. They discovered that the default 300-token chunks were insufficient for their use case, as many FAQ answers included multi-step solutions spanning several paragraphs. Through experimentation, they found that hierarchical chunking with 1500-token parent chunks provided optimal results. This highlights the importance of tuning RAG systems based on the specific characteristics of the knowledge base.
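In Bedrock Knowledge Bases, that finding maps to the data source's chunking configuration. The sketch below shows roughly what a hierarchical chunking setup with 1500-token parent chunks might look like; the bucket ARN, role, child chunk size, and overlap values are assumptions, since only the parent size is reported in the case study.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Approximate shape of the data source configuration.
bedrock_agent.create_data_source(
    knowledgeBaseId="KB12345",
    name="terraform-faq-docs",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::infra-faq-docs"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                # Parent chunks large enough to hold multi-step FAQ answers.
                "levelConfigurations": [
                    {"maxTokens": 1500},  # parent
                    {"maxTokens": 300},   # child (retrieval unit)
                ],
                "overlapTokens": 60,
            },
        },
    },
)
```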
The team's approach to production deployment shows strong DevOps practices, with infrastructure defined as code using Terraform (though they note some limitations with current Terraform provider support for Bedrock). They've implemented proper security controls through IAM roles and policies, and use AWS Secrets Manager for sensitive credentials.
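For example, the Slack signing secret and bot token can be pulled from Secrets Manager at Lambda cold start rather than baked into environment variables. The secret name and JSON layout below are hypothetical.

```python
import json

import boto3

secrets = boto3.client("secretsmanager")


def load_slack_credentials(secret_id: str = "slackbot/credentials") -> dict:
    """Fetch the Slack bot token and signing secret from Secrets Manager."""
    value = secrets.get_secret_value(SecretId=secret_id)
    return json.loads(value["SecretString"])


# Usage: creds = load_slack_credentials(); creds["signing_secret"], creds["bot_token"]
```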
The system's limitations are well documented, including:
* Inability to process images in documentation or queries
* Current lack of citation links in responses
* Manual data synchronization requirements
* Limited conversation context (stateless implementation)
Their implementation demonstrates several LLMOps best practices:
* Clear separation of concerns between different system components
* Use of managed services to reduce operational overhead
* Proper error handling and logging (see the sketch after this list)
* Security considerations for sensitive data
* Monitoring and observability through CloudWatch
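A minimal sketch of what that error handling and CloudWatch observability might look like around the RAG call is shown below; the metric namespace and names are illustrative, not taken from the case study.

```python
import logging
from typing import Callable

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

cloudwatch = boto3.client("cloudwatch")


def _emit_metric(name: str) -> None:
    # Namespace and metric names are illustrative.
    cloudwatch.put_metric_data(
        Namespace="TerraformSlackbot",
        MetricData=[{"MetricName": name, "Value": 1.0, "Unit": "Count"}],
    )


def answer_with_observability(question: str, answer_fn: Callable[[str], str]) -> str:
    """Wrap the RAG call (e.g. the handler sketch above) with logging and metrics."""
    try:
        answer = answer_fn(question)
        _emit_metric("AnswersGenerated")
        return answer
    except ClientError:
        # Lambda output flows to CloudWatch Logs automatically; keep the stack
        # trace for debugging and return a friendly message to Slack.
        logger.exception("Bedrock retrieve_and_generate failed")
        _emit_metric("AnswerErrors")
        return "Sorry, I couldn't fetch an answer right now. Please try again."
```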
Future enhancements they're considering show a mature approach to iterative improvement:
* Automated data source updates
* Direct Confluence API integration
* Multi-turn conversation support
* Improved citation handling
The case study provides valuable insights into the practical considerations of deploying LLMs in production. Their chunking strategy findings are particularly valuable for others implementing RAG systems. The architecture choices demonstrate a balance between functionality, maintainability, and operational overhead.
From an LLMOps perspective, the system shows how modern cloud infrastructure can be leveraged to create practical AI-powered tools. The use of Amazon Bedrock significantly simplified the deployment process, though it's worth noting this creates vendor lock-in. The team's approach to knowledge base management and content chunking shows the importance of considering content structure when implementing RAG systems.
The implementation is particularly noteworthy for its focus on a specific, well-defined use case rather than attempting to create a general-purpose assistant. This focused approach likely contributed to its success and provides a template for other organizations looking to implement similar systems.
Cost considerations aren't explicitly discussed in the case study, but the architecture suggests careful attention to resource utilization through serverless components and appropriate model selection. The use of Claude 3.5 Sonnet rather than a larger model indicates practical consideration of the performance/cost trade-off.
The system's success demonstrates the viability of LLM-powered internal tools when properly implemented with attention to data quality, architecture, and user experience. It serves as a reference implementation for future AI projects at Benchling, showing how companies can practically leverage LLMs in production environments while maintaining control over their knowledge base and ensuring responses are grounded in authoritative sources.