Elastic's Field Engineering team developed a customer support chatbot, focusing on crucial UI/UX design considerations for production deployment. The case study details how they tackled challenges including streaming response handling, timeout management, context awareness, and user engagement through carefully designed animations. The team created a custom chat interface using their EUI component library, implementing innovative solutions for handling long-running LLM requests and managing multiple types of contextual information in a user-friendly way.
This case study from Elastic provides valuable insights into the practical challenges and solutions involved in deploying LLM-powered chatbots in production, with a specific focus on the often-overlooked UI/UX aspects of such systems. The study is particularly interesting as it highlights the intersection between traditional web development practices and the unique requirements that come with LLM-based applications.
Elastic's Field Engineering team developed a Support Assistant chatbot, and this case study details the UI/UX considerations and technical work required to make the system production-ready. Its focus on the front end is notable, since that side of deploying LLMs in production tends to receive less attention in technical discussions about LLMOps.
The team identified and addressed several key technical challenges:
**Response Latency and User Experience**
The system faced significant latency challenges, with total response times ranging from 5.1 to 11.5 seconds. This breaks down into several components:
* Initial request (100-500ms)
* RAG search (1-2.5s)
* LLM call (1-2.5s)
* First streamed byte (3-6s)
To manage these latencies, they implemented several technical solutions:
*Custom Loading Animation Implementation*
The team developed a custom loading animation system that adhered to their brand guidelines while keeping users engaged during long-running requests. Notably, they used their own LLM system to help generate the animation code, integrating LLMs into the development workflow itself rather than only into the end product.
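The article does not reproduce the generated animation code. A minimal sketch of this kind of component, assuming a simple three-dot pulse with a placeholder brand color (the component name, keyframe name, and color value below are illustrative, not Elastic's implementation), might look like:

```tsx
// Hypothetical loading indicator: three dots pulsing in sequence while the
// assistant waits for the first streamed byte. Illustrative sketch only.
import React from 'react';

export const ChatLoadingDots: React.FC = () => (
  <span role="status" aria-label="Assistant is thinking">
    <style>{`
      @keyframes chat-dot-pulse {
        0%, 80%, 100% { opacity: 0.2; }
        40% { opacity: 1; }
      }
    `}</style>
    {[0, 1, 2].map((i) => (
      <span
        key={i}
        style={{
          display: 'inline-block',
          width: 8,
          height: 8,
          margin: '0 3px',
          borderRadius: '50%',
          backgroundColor: '#0077cc', // stand-in for a brand color token
          // stagger each dot so the pulse travels left to right
          animation: `chat-dot-pulse 1.2s ease-in-out ${i * 0.2}s infinite`,
        }}
      />
    ))}
  </span>
);
```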
*Sophisticated Timeout Handling*
They took an unconventional approach to handling timeouts in streaming LLM responses. Traditional timeout mechanisms proved inadequate for LLM streaming: the request often receives a 200 OK quickly, but the actual data stream can then stall or be interrupted. The team therefore built a custom "killswitch" using AbortController signals and setTimeout, automatically terminating requests after 10 seconds of inactivity. Testing showed this to be a good threshold: long enough to avoid premature cancellation, short enough to preserve a responsive user experience.
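The article describes this pattern but does not include the code. A minimal sketch of an inactivity killswitch over a fetch-based streaming endpoint might look like the following; the function name, constant, and callback shape are illustrative assumptions, not Elastic's actual implementation:

```typescript
// Abort a streaming request if no new chunk arrives within the idle window.
// Hypothetical sketch; names and structure are illustrative only.
const IDLE_TIMEOUT_MS = 10_000;

async function readStreamWithKillswitch(
  url: string,
  body: unknown,
  onChunk: (text: string) => void,
): Promise<void> {
  const controller = new AbortController();
  let killswitch: ReturnType<typeof setTimeout> | undefined;

  // Reset the timer every time data arrives; abort after 10s of silence.
  const armKillswitch = () => {
    if (killswitch !== undefined) clearTimeout(killswitch);
    killswitch = setTimeout(() => controller.abort(), IDLE_TIMEOUT_MS);
  };

  armKillswitch();
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
    signal: controller.signal,
  });

  if (!response.ok || response.body === null) {
    throw new Error(`Request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      armKillswitch(); // data arrived, push the deadline out
      onChunk(decoder.decode(value, { stream: true }));
    }
  } finally {
    if (killswitch !== undefined) clearTimeout(killswitch);
  }
}
```

The key difference from a plain request timeout is that the timer is re-armed on every chunk, so a slow but steady stream is never cut off, while a stalled one is terminated promptly.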
*Context Management System*
One of the more sophisticated aspects of the implementation was their approach to context management. The system handles multiple types of context:
* Conversation history
* Support case details
* Knowledge base search results
The team developed a novel UI solution for managing these different contexts, implementing a "prepended" element to the text input area that allows users to see and modify the current context. This solution emerged after evaluating several alternatives including breadcrumbs, alert bars, and badges. The final implementation allows power users to combine different types of context (e.g., case history and knowledge base search) for more complex queries.
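The component code is not shown in the article. As a rough illustration of how a prepended context indicator could be composed from EUI primitives, the sketch below attaches removable context badges to the input's prepend slot; the props, data shape, and composition are assumptions rather than Elastic's actual Support Assistant code:

```tsx
// Illustrative sketch of a "prepended" context indicator on the chat input.
import React from 'react';
import { EuiFieldText, EuiBadge } from '@elastic/eui';

interface ChatContext {
  label: string;          // e.g. "Case #12345" or "Knowledge base results"
  onRemove: () => void;   // let the user drop this context from the next query
}

export const ChatInput: React.FC<{ contexts: ChatContext[] }> = ({ contexts }) => (
  <EuiFieldText
    fullWidth
    placeholder="Ask the Support Assistant…"
    // Each active context is surfaced as a removable badge before the text field.
    prepend={contexts.map((ctx) => (
      <EuiBadge
        key={ctx.label}
        color="hollow"
        iconType="cross"
        iconSide="right"
        iconOnClick={ctx.onRemove}
        iconOnClickAriaLabel={`Remove ${ctx.label} from context`}
      >
        {ctx.label}
      </EuiBadge>
    ))}
  />
);
```

Surfacing the context directly on the input, rather than in a separate alert bar or breadcrumb, keeps the "what will this question be answered against?" information in the user's line of sight at the moment they type.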
**Technical Implementation Details**
The system was built using Elastic's own UI component library (EUI), demonstrating how existing tools can be adapted for LLM applications. While they didn't build everything from scratch, they had to create custom components and behaviors to handle LLM-specific requirements. The implementation includes:
* Custom streaming response handlers
* Context serialization and storage systems
* Integration with their RAG system
* Error handling specific to LLM streaming responses
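The case study lists these pieces without showing code. As one illustration of how the different context types might be serialized into a single chat request, here is a hypothetical payload shape; every interface and field name below is an assumption for the sketch, not Elastic's schema:

```typescript
// Hypothetical shape of a chat request that bundles the context types the
// Support Assistant tracks; field names are illustrative only.
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

interface SupportCaseContext {
  caseId: string;
  summary: string;
}

interface KnowledgeBaseHit {
  title: string;
  url: string;
  snippet: string;
}

interface ChatRequest {
  conversation: ChatMessage[];        // prior turns, trimmed to fit the prompt budget
  supportCase?: SupportCaseContext;   // attached when the chat is opened from a case
  searchResults?: KnowledgeBaseHit[]; // RAG hits the user has kept in context
  message: string;                    // the new user query
}

// Serialization is then just JSON over the wire to the chat backend.
function buildRequestBody(req: ChatRequest): string {
  return JSON.stringify(req);
}
```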
**Observability and Monitoring**
While not the main focus of this case study, the article mentions integration with observability systems, suggesting that the UI components and their interaction with the backend LLM services are monitored in production.
**Lessons Learned and Best Practices**
Several key insights emerge from this implementation:
* Traditional web development patterns often need modification for LLM applications
* User experience considerations are crucial for successful LLM deployment
* Context management requires careful UI design to remain intuitive
* Timeout and error handling need special consideration in streaming LLM applications
* Brand consistency can be maintained even with novel UI patterns
This case study is particularly valuable as it highlights the practical challenges of implementing LLMs in production systems, specifically from a front-end perspective. It demonstrates that successful LLMOps isn't just about model deployment and performance, but also about creating intuitive and responsive user interfaces that can handle the unique characteristics of LLM interactions.
The implementation shows a sophisticated understanding of both traditional web development best practices and the novel challenges presented by LLM-based applications. The solutions developed, particularly around timeout handling and context management, provide valuable patterns that could be applied to other LLM-based applications.