Company: Quic
Title: Lessons Learned from Deploying 30+ GenAI Agents in Production
Industry: Tech
Year: 2023

Summary (short): Quic shares their experience deploying over 30 AI agents across various industries, focusing on customer experience and e-commerce applications. They developed a comprehensive approach to LLMOps that includes careful planning, persona development, RAG implementation, API integration, and robust testing and monitoring systems. The solution achieved 60% resolution of tier-one support issues with higher quality than human agents, while maintaining human involvement for complex cases.
## Overview

This case study is based on a presentation by Bill O'Neal, co-founder and SVP of Engineering and Product at Quic, delivered at the Argyle CIO Leadership Forum in early 2025. Quic is a company specializing in AI agents for customer experience, and O'Neal shared lessons learned from deploying over 30 AI agents across various industries over a two-year period. The presentation provides a practitioner's perspective on the challenges and best practices for deploying large language model-based systems in production environments, with a particular focus on customer-facing applications in e-commerce and customer care.

It's worth noting that this is a vendor presentation, so some claims should be taken with appropriate skepticism. However, the technical lessons shared appear grounded in real operational experience and align with known challenges in the LLMOps space.

## Strategic Context and Vision

O'Neal frames the evolution of digital customer support from phone (1960s) to web/email (late 1990s) to mobile (2007) to the current AI agent era. He makes a significant observation about a fundamental shift: previously, consumers had to learn how to interact with brands (navigating websites, mobile apps, chatbots), but AI agents invert this responsibility—the agents must now understand customers rather than customers understanding the brand. This has implications for how production systems are designed and operated.

The presentation predicts that 10% of all e-commerce transactions will be assisted by AI in the coming year, and that AI agents will become the primary face of companies within 2-5 years. While these predictions may be optimistic, they highlight the growing importance of robust LLMOps practices as AI agents take on more customer-facing responsibilities.

## Planning Phase: Production Considerations

The planning section emphasizes several LLMOps-relevant principles. First, maintaining realistic scope is critical—O'Neal notes that customers often get so excited by LLM capabilities that they jump into deployment without articulating clear business value, only realizing the gap after launch. This represents a common anti-pattern in LLM deployments.

An important operational insight is that AI will not completely replace humans. O'Neal suggests that AI agents might resolve around 60% of tier-one support with higher satisfaction than human agents, but humans must remain in the loop. This has significant implications for system design, including handoff mechanisms, escalation paths, and the need to measure how AI impacts human agents' productivity and efficiency.

The presentation cautions against treating LLMs as a universal solution. O'Neal specifically mentions using Google Dialogflow for intent analysis and quick routing because it's significantly faster than large language models. This highlights a key LLMOps principle: production systems often require a mix of traditional ML, rule-based systems, and LLMs, each applied where most appropriate. Speed, cost, and determinism requirements may favor simpler approaches for certain tasks.

Another planning consideration is the continued importance of UI/UX elements. While early LLM enthusiasm focused on blank text boxes for conversational interaction, O'Neal recommends maintaining traditional UI components like date pickers and list selectors, but making them dynamic rather than static. This hybrid approach acknowledges both the power of conversational AI and the efficiency of structured interfaces.
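To make the "right tool for each task" point concrete, here is a minimal, illustrative sketch (not Quic's actual implementation) of a hybrid router that sends high-confidence, well-known intents to a fast deterministic handler and only falls back to a slower LLM call for everything else. The `IntentResult`, `classify_intent`, and `call_llm` pieces are placeholders for whatever intent service (such as Dialogflow) and foundation model an organization actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class IntentResult:
    intent: str        # e.g. "cancel_flight", "store_hours"
    confidence: float  # 0.0 to 1.0 from the fast intent classifier

# Canned, deterministic handlers for intents where speed and predictability
# matter more than generative flexibility.
CANNED_HANDLERS: dict[str, Callable[[str], str]] = {
    "store_hours": lambda _msg: "We're open 9am-9pm ET, seven days a week.",
    "reset_password": lambda _msg: "I've sent a password reset link to the email on file.",
}

def route_message(
    message: str,
    classify_intent: Callable[[str], IntentResult],  # e.g. a Dialogflow wrapper
    call_llm: Callable[[str], str],                   # slower generative fallback
    confidence_threshold: float = 0.8,
) -> str:
    """Fast path for known intents, generative fallback for everything else."""
    result = classify_intent(message)
    handler: Optional[Callable[[str], str]] = CANNED_HANDLERS.get(result.intent)
    if handler is not None and result.confidence >= confidence_threshold:
        return handler(message)  # cheap, deterministic, milliseconds
    return call_llm(message)     # expensive, generative, seconds
```

The same structure extends naturally to the dynamic UI idea: instead of returning plain text, the fast path could return a structured payload (a date picker, a list of rebooking options) that the front end renders directly.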
## Agent Persona and Governance

Designing the agent persona involves more than creative branding. O'Neal discusses governance and compliance requirements, using an airline regulated by the Department of Transportation as an example—certain responses must be canned (predetermined) rather than generative to comply with regulations. This has architectural implications: the LLM orchestration must support guardrails that enforce deterministic responses for regulated content.

Bias management and hallucination prevention are identified as critical concerns requiring guardrails. The level of creative freedom given to an agent depends on brand identity and risk tolerance. A financial institution like Chase would have stricter guardrails than a consumer brand like Mountain Dew. This risk-based approach to guardrail configuration is an important LLMOps design pattern.

## Data Architecture and RAG Implementation

The presentation covers Retrieval-Augmented Generation (RAG), which O'Neal describes as combining information retrieval with text generation to provide context-aware responses. However, he positions RAG as "table stakes"—necessary but not sufficient. He characterizes basic RAG as "fancy search" and emphasizes the need to move beyond simple Q&A to enable agents to accomplish tasks for customers.

Several non-functional requirements specific to LLM deployments are highlighted:

- **LLM data privacy**: Contracts with foundational model providers must explicitly prohibit using customer data for model training. This is a critical compliance and security consideration for production deployments.
- **Token quotas**: While GPU availability has improved compared to 18 months prior, production systems must still account for quota limits, particularly during peak traffic periods like Black Friday. Capacity planning for LLM-based systems requires understanding token consumption patterns.
- **Model tuning considerations**: O'Neal notes that Quic has "never needed to fine-tune models"—foundational models are sufficiently powerful for their use cases. However, organizations planning to fine-tune or bring their own models should factor this into their architecture.

## API Integration and Contextual Intelligence

A key differentiator for production AI agents is API integration. APIs enable agents to accomplish meaningful tasks rather than just answering questions. O'Neal uses a flight cancellation example: the AI agent identifies the customer, pulls their current state (impacted by a hurricane), intelligently filters inappropriate options (no vacation packages during a crisis), proactively offers relevant intents, retrieves task-specific knowledge (flight cancellation policies rather than the full knowledge base), and executes rebooking via APIs.

This targeted retrieval approach—pulling only relevant knowledge for the current task rather than the entire knowledge base—improves response quality and reduces token consumption. Authentication and security models must be carefully designed for APIs that perform consequential actions like changing bookings or issuing refunds.

## Organizational and Skills Considerations

O'Neal identifies an emerging role called "AI Engineer" that expands beyond traditional conversational designer skills. AI Engineers need to understand RAG, data ingestion and transformation, prompt engineering, Python coding, and user experience design. Organizations with existing conversational designers will likely face skills gaps and need to invest in upskilling or new hiring.

Budget allocation decisions are also discussed. Many organizations prefer to focus innovation budgets on building rich API tooling that enables AI agents to perform more actions, rather than on LLM orchestration infrastructure itself. This "buy vs. build" decision for the orchestration layer affects both initial deployment and ongoing operational complexity.
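To illustrate the targeted-retrieval and API-integration pattern described in the API Integration section above, the sketch below shows how a single agent turn might scope retrieval to the active task and expose only task-appropriate tools. It is a hypothetical, simplified outline—names like `vector_search`, `rebook_flight`, and the `TASKS` registry are stand-ins for whatever retrieval layer and backend APIs a given deployment exposes, not Quic's actual interfaces.

```python
from typing import Callable

# Hypothetical task registry: each task scopes both the knowledge that gets
# retrieved and the backend actions the agent is allowed to call.
TASKS = {
    "cancel_flight": {
        "knowledge_filter": {"doc_type": "policy", "topic": "flight_cancellation"},
        "allowed_tools": ["lookup_booking", "rebook_flight", "issue_travel_credit"],
    },
    "track_order": {
        "knowledge_filter": {"doc_type": "faq", "topic": "shipping"},
        "allowed_tools": ["lookup_order"],
    },
}

def build_agent_turn(
    task: str,
    customer_context: dict,
    user_message: str,
    vector_search: Callable[[str, dict, int], list[str]],  # query, metadata filter, top_k
) -> dict:
    """Assemble a scoped prompt payload for a single agent turn."""
    spec = TASKS[task]

    # Retrieve only task-specific knowledge (e.g. cancellation policies),
    # not the whole knowledge base -- fewer tokens, more relevant context.
    passages = vector_search(user_message, spec["knowledge_filter"], 5)

    return {
        "system_context": {
            "customer": customer_context,   # identity, current state (e.g. hurricane-impacted)
            "task": task,
        },
        "retrieved_knowledge": passages,
        "tools": spec["allowed_tools"],     # consequential actions stay behind auth checks
        "user_message": user_message,
    }
```

Consequential tools such as `rebook_flight` would additionally sit behind the authentication and authorization checks mentioned above before any call is actually executed.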
## Development Best Practices

### Regression Testing for Generative Systems

Regression testing is fundamentally different for LLM-based systems. Because outputs are generative and non-deterministic, traditional assertion-based testing (if input X, expect output Y) doesn't work. Instead, Quic uses LLM-based evaluation, where a differently-trained model evaluates whether responses are "close enough within threshold" to target responses. This approach, borrowed from autonomous driving testing paradigms, requires building hundreds or thousands of test cases.

A critical operational challenge is model deprecation. When foundational model providers like OpenAI deprecate models, these are "not trivial changes"—they can break production AI agents. Robust regression suites are essential not just for initial development but for safely transitioning to new model versions when deprecations occur. This is a particularly important LLMOps consideration that catches many organizations off guard.

### Observability for Opaque Systems

LLMs are inherently opaque—you provide input and receive output without visibility into the reasoning process. Production observability must enable teams to monitor, understand, and interpret internal states and decision-making processes. This becomes more complex as systems grow more autonomous and run multiple prompts in parallel (O'Neal mentions running nine prompts simultaneously in some cases). When one parallel prompt fails or returns unexpected results, operators need debugging tools to quickly understand what happened.

Key observability metrics include costs, token usage, inference times, and goal-state tracking (defining and monitoring success/failure states). Alerting on threshold violations helps catch drift early.

### LLM-as-Judge Evaluation

A recurring pattern is using large language models to evaluate the outputs of other large language models. This is applied for trend analysis, sentiment analysis, detecting emerging intents, and anticipating drift. The evaluating model is "trained differently" than the generative agent so that it can assess response quality objectively.

### Hallucination Detection and Claim Verification

Hallucination remains a significant production challenge. O'Neal describes Quic's approach: a separate non-generative model outputs a confidence score (0 to 1) based on gathering evidence from the conversation state and comparing the LLM's proposed response against that evidence. Low-confidence responses are rejected, and the agent either asks clarifying questions or escalates to a human agent. This claim verification layer is essential for production deployments where incorrect information could harm customers or the business.
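As a rough illustration of this evaluate-then-gate pattern (not Quic's proprietary verifier), the sketch below treats the claim checker as a pluggable scoring function and rejects or escalates low-confidence responses. The `score_against_evidence` heuristic shown—simple lexical overlap—is a deliberately crude stand-in for a real trained verification model, and the two-tier clarify/escalate split is an illustrative choice.

```python
import re

def score_against_evidence(response: str, evidence: list[str]) -> float:
    """Toy confidence score in [0, 1]: the fraction of words in the proposed
    response that appear somewhere in the gathered evidence. A production
    system would use a trained, non-generative verifier instead."""
    words = set(re.findall(r"[a-z0-9]+", response.lower()))
    if not words:
        return 0.0
    evidence_words = set(re.findall(r"[a-z0-9]+", " ".join(evidence).lower()))
    return len(words & evidence_words) / len(words)

def gate_response(proposed_response: str, evidence: list[str], threshold: float = 0.7) -> dict:
    """Accept, clarify, or escalate based on verification confidence."""
    confidence = score_against_evidence(proposed_response, evidence)
    if confidence >= threshold:
        return {"action": "send", "text": proposed_response, "confidence": confidence}
    if confidence >= threshold / 2:  # illustrative split between the two fallbacks
        return {"action": "ask_clarifying_question", "confidence": confidence}
    return {"action": "escalate_to_human", "confidence": confidence}
```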
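The regression-testing approach described earlier follows a similar shape: instead of exact-match assertions, a judge model scores each generated answer against a target answer, and the suite fails on scores below a threshold. This is a minimal sketch under the assumption that `agent` and `judge` are caller-supplied callables wrapping whatever models are in use; the prompt wording and 0-to-1 scoring convention are illustrative, not Quic's.

```python
from typing import Callable

JUDGE_PROMPT = """You are grading a customer-service agent's answer.
Question: {question}
Target answer: {target}
Agent answer: {answer}
Reply with a single number between 0 and 1 for how close the agent answer
is to the target in meaning and policy compliance."""

def run_regression_suite(
    test_cases: list[dict],           # each: {"question": ..., "target": ...}
    agent: Callable[[str], str],      # system under test (e.g. a new model version)
    judge: Callable[[str], str],      # separately configured evaluation model
    threshold: float = 0.8,
) -> list[dict]:
    """Score every case with the judge model; return the cases below threshold."""
    failures = []
    for case in test_cases:
        answer = agent(case["question"])
        raw = judge(JUDGE_PROMPT.format(question=case["question"],
                                        target=case["target"],
                                        answer=answer))
        try:
            score = float(raw.strip())
        except ValueError:
            score = 0.0  # unparseable judge output counts as a failure
        if score < threshold:
            failures.append({**case, "answer": answer, "score": score})
    return failures

# Re-running the same suite before switching to a new foundation model version
# gives an early warning when a provider deprecation would break the agent.
```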
## Deployment Strategies

### Agent-Facing Before Public-Facing

A recommended de-risking strategy is deploying AI agents internally first, serving company staff before external customers. This provides real-world validation with lower risk exposure and allows staff to become more productive while issues are identified before public launch. When transitioning to public deployment, certain capabilities may need to be removed (like the ability to issue credits or refunds without approval).

### Multi-Modal Integration

Production systems increasingly combine voice and digital modalities. O'Neal gives an example of a voice call where the customer receives appointment options via text message that automatically update the voice conversation. This requires orchestration across modalities and integration with specialized providers (Deepgram is mentioned for voice).

### Graduated Rollout

Rather than a full cutover, O'Neal recommends going live in limited windows—perhaps two hours per day initially—then reverting to traditional systems. This allows teams to identify new requirements and handle unexpected usage patterns without catastrophic impact. He notes that "LLM agents are held to a higher standard than human agents," with some organizations performing 100% conversation review initially. C-level executives will often test the system personally, and their feedback carries significant weight, so the flexibility to adjust quickly is important.

## Post-Launch Operations

Even after a successful launch with improved CSAT scores, resolution rates, and reduced agent handling times, continuous operations are essential. Key activities include:

- **KPI monitoring and drift detection**: Standard operational metrics plus LLM-specific indicators
- **Trend analysis**: Using LLMs to evaluate agent conversations and identify emerging issues or demographic shifts
- **Intent discovery**: Detecting new intents that weren't anticipated in the original design
- **Feedback loop management**: Tooling that allows stakeholders to flag and tag specific conversations for AI Engineer review (O'Neal specifically warns against using Excel for this)

### Knowledge Management

In the Q&A section, O'Neal addresses knowledge management for keeping AI agents accurate over time. The typical approach involves connecting to disparate knowledge sources and ingesting them on a regular cadence—typically every 24 hours, especially during early deployment when there's high churn in knowledge requirements. When observability tools reveal that generative responses are poor due to missing knowledge, that feedback flows back to knowledge base maintainers who update the source, which then flows through ingestion pipelines that transform, index, and store content in vector databases for the next day's use. A simplified sketch of such an ingestion pass appears after the tools overview below.

## Tools and Technologies Mentioned

The presentation references several specific tools and technologies: Google Dialogflow for intent analysis; ChatGPT as a foundational model example; and Hugging Face, LangChain, Haystack, vector databases, prompt flow, and DAG workflows as components organizations might use for LLM orchestration. Quic's own "AI Studio" is presented as an integrated platform that provides observability, prompt replay, conversation review, and management capabilities for production AI agents. Deepgram is mentioned for voice modality support.
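For the knowledge management loop described above, the following is a minimal, illustrative sketch of a nightly ingestion pass—fetch sources, chunk, embed, and upsert into a vector store. The `fetch_documents`, `embed`, and `VectorStore` interfaces are hypothetical placeholders rather than any specific product's API.

```python
from dataclasses import dataclass
from typing import Callable, Protocol

@dataclass
class Chunk:
    source_id: str
    text: str
    embedding: list[float]

class VectorStore(Protocol):
    def upsert(self, chunks: list[Chunk]) -> None: ...

def chunk_text(text: str, max_chars: int = 1200, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with overlap; production pipelines usually
    chunk on semantic or structural boundaries instead."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def nightly_ingest(
    fetch_documents: Callable[[], list[dict]],  # e.g. pull from CMS, help center, policy docs
    embed: Callable[[str], list[float]],        # any embedding model
    vector_store: VectorStore,
) -> int:
    """Transform, index, and store knowledge so the agent uses fresh content tomorrow."""
    total = 0
    for doc in fetch_documents():
        for piece in chunk_text(doc["text"]):
            vector_store.upsert([Chunk(doc["id"], piece, embed(piece))])
            total += 1
    return total

# Typically scheduled every 24 hours (cron, Airflow, etc.), with failures alerting
# the AI Engineers who own the feedback loop with knowledge base maintainers.
```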
## Critical Assessment

While the presentation offers valuable practical insights, it's important to note the vendor context. Claims about specific resolution rates and satisfaction improvements should be validated independently, and the characterization of Quic's platform as solving these challenges may be somewhat promotional. However, the challenges identified—model deprecation, hallucination, regression testing for generative systems, observability for opaque models—are well documented across the industry, and the approaches described align with emerging best practices in LLMOps.
