## Overview
This case study presents FuelEx, an enterprise generative AI platform developed by Telus, a major Canadian telecommunications company. The presentation was delivered at the Toronto Machine Learning Summit by Liz (Developer Advocate and Engineering Manager) and Sarah (Software Developer) from Telus's platform engineering team. FuelEx represents Telus's approach to democratizing generative AI across their organization while maintaining enterprise-grade security, flexibility, and responsible AI practices.
The platform emerged from an engineering productivity team initiative approximately 18 months prior to the presentation. What began as a Slack integration for generative AI grew into a comprehensive platform used by thousands of employees daily. The team eventually merged expertise from Telus Digital Experience (formerly Telus International) and Willow Tree to form a unified vision and roadmap.
## Architecture and Platform Design
FuelEx operates as a centralized management layer sitting atop foundational AI services and cloud infrastructure. The architecture is deliberately positioned in what the presenters describe as the "AI value chain" - above hardware providers (like NVIDIA), cloud hosting layers (AWS, Azure, GCP), and foundational models, but below the application layer that end users interact with.
The platform consists of two main components: FuelEx Core and FuelEx Apps. The Core component handles all centralized management functions, including integrations, observability/monitoring, orchestration across different models, moderation, and validation. The application layer includes a web application, Slack integration, and Google Chat integration, with Microsoft Teams support planned for the future.
A notable architectural decision is the multi-cloud, multi-model approach. The platform is not locked into a single cloud provider - they host OpenAI models on Azure, Claude models on AWS Bedrock, and utilize GCP as well. This is managed through a proxy layer that enables load balancing, fallback mechanisms, and retries across the various model providers.
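The proxy pattern described here can be sketched roughly as follows. This is a minimal illustration of fallback, retries, and naive load balancing across providers, not FuelEx's actual implementation; the provider names and completion callables are placeholders.

```python
import random
import time

# Hypothetical provider-agnostic completion callables; in practice these would wrap
# Azure OpenAI, AWS Bedrock (Claude), and GCP clients respectively.
PROVIDERS = {
    "azure-gpt4": lambda prompt: f"[azure] response to: {prompt}",
    "bedrock-claude": lambda prompt: f"[bedrock] response to: {prompt}",
    "vertex-gemini": lambda prompt: f"[vertex] response to: {prompt}",
}

def complete_with_fallback(prompt: str, max_retries: int = 2) -> str:
    """Try providers in a shuffled order (naive load balancing), retrying transient
    failures with backoff, then falling back to the next provider."""
    order = random.sample(list(PROVIDERS), k=len(PROVIDERS))
    for name in order:
        for attempt in range(max_retries):
            try:
                return PROVIDERS[name](prompt)
            except Exception:
                time.sleep(2 ** attempt)  # exponential backoff before retrying
        # this provider exhausted its retries; fall back to the next one
    raise RuntimeError("All model providers failed")
```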
For Canadian data residency requirements (a critical concern for a Canadian telecom), the platform supports models and infrastructure hosted entirely within Canada. Their vector database solution uses Turbo Puffer, a Canadian company with Canadian hosting. They also leverage Cohere Command R models and other Canada-hosted model options.
## Co-Pilot Architecture and Implementation
The core user-facing concept in FuelEx is the "co-pilot" - customized AI bots created for specific use cases. These are analogous to OpenAI's GPTs but built entirely as Telus's own solution rather than layered on top of OpenAI's Assistants API. Each co-pilot can have its own system prompt, connected knowledge bases, model selection, access controls, and guardrail configurations.
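A co-pilot can be thought of as a bundle of configuration. The sketch below is an illustrative data structure under that assumption; the field names and values are hypothetical, not FuelEx's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class CopilotConfig:
    # Fields mirror the per-co-pilot options described above; names are illustrative.
    name: str
    system_prompt: str
    model: str = "gpt-4o"                                      # which underlying model to route to
    knowledge_bases: list[str] = field(default_factory=list)   # RAG document collections
    allowed_groups: list[str] = field(default_factory=list)    # access control
    guardrails: dict = field(default_factory=dict)             # e.g. content-filter severities

# Example: an internal IT support co-pilot (values are hypothetical)
spock = CopilotConfig(
    name="Spock",
    system_prompt="You are an internal IT support assistant...",
    knowledge_bases=["it-support-kb"],
    allowed_groups=["all-employees"],
    guardrails={"hate": "low", "self_harm": "low"},
)
```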
The platform uses Retrieval Augmented Generation (RAG) architecture for knowledge retrieval. When documents are uploaded, they are processed (including OCR for images), embedded, and stored in a vector database. At query time, similarity search retrieves relevant context which is then provided to the LLM along with the user's question and system prompt.
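A stripped-down version of this ingest-and-retrieve flow might look like the following. The embedding function is a toy placeholder standing in for a real embedding model, and a plain Python list stands in for the vector database (FuelEx uses Turbo Puffer in production).

```python
import math

def embed(text: str) -> list[float]:
    """Placeholder embedding; a real pipeline would call an embedding model."""
    return [text.count(c) / max(len(text), 1) for c in "aeiou"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Ingest: chunk documents (OCR'd if they are images), embed, and store in an index.
index = [(chunk, embed(chunk)) for chunk in [
    "Reset your VPN token in the IT portal.",
    "Expense reports are due on the 5th of each month.",
]]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Query time: similarity search returns the most relevant chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# The retrieved context is combined with the system prompt and user question for the LLM.
context = "\n".join(retrieve("How do I reset my VPN?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How do I reset my VPN?"
```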
For tool use and function calling, FuelEx implements a planner-executor architecture. When a user submits a query, the system sends all available tools to the LLM along with the original question and asks it to devise a plan. The LLM determines which tools to call (e.g., internet search, image generation with DALL-E), the execution layer runs those tools, and then the responses are sent back to the LLM for final summarization and response generation.
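The planner-executor loop can be sketched as below. The tool registry, JSON plan format, and `call_llm` stand-in are assumptions made for illustration; only the overall plan, execute, and summarize shape mirrors the description above.

```python
import json

# Illustrative tool registry; the presenters cite internet search and DALL-E image generation.
TOOLS = {
    "web_search": lambda query: f"Top results for '{query}'",
    "generate_image": lambda prompt: f"https://example.invalid/image-for/{prompt}",
}

def call_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call."""
    if "Tool results" in prompt:
        return "Summary of the tool outputs, phrased as an answer to the user."
    return json.dumps([{"tool": "web_search", "args": {"query": "Telus fibre plans"}}])

def answer(question: str) -> str:
    # 1. Planner: the LLM sees the available tools and proposes which to call.
    plan = json.loads(call_llm(
        f"Tools: {list(TOOLS)}\nQuestion: {question}\nReturn a JSON list of tool calls."
    ))
    # 2. Executor: run each planned tool call and collect the results.
    results = [TOOLS[step["tool"]](**step["args"]) for step in plan]
    # 3. Summarizer: the LLM turns the tool outputs into the final response.
    return call_llm(f"Question: {question}\nTool results: {results}\nSummarize an answer.")
```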
## Production Deployments and Results
The presentation highlighted several production co-pilots deployed within Telus:
**Spock (Single Point of Contact)**: An internal IT support bot used by all team members. After several months of operation, 46% of interactions are resolved by the co-pilot without escalation to human agents. In one month (June), they observed a 21% decrease in agent interactions month-over-month.
**One Source**: A customer service agent co-pilot that helps call center representatives pull information more quickly to respond to customer inquiries.
**Milo**: A co-pilot for retail store representatives to answer questions and process customer requests more efficiently.
**Telus J Co-pilot**: A generic co-pilot available to all Telus employees with access to tools like internet search and image generation for day-to-day queries.
The overall platform metrics demonstrate significant adoption: over 30,000 custom co-pilots have been built on the platform, with more than 35,000 active users. Use cases span meeting summaries, image generation, business requirements documentation, and coding assistance.
## Integration Strategy
A key design philosophy is meeting users "where they work" rather than forcing them to adopt another web application. The platform integrates with Slack, Google Chat, and soon Microsoft Teams. The same co-pilot with identical configurations (system prompt, model, settings) can be accessed across different channels because it's built at the platform level rather than being channel-specific.
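One way to picture this channel-agnostic design is a thin adapter per channel funneling into a single platform-level handler. The payload shapes and function names below are assumptions for illustration, not FuelEx's API.

```python
from dataclasses import dataclass

@dataclass
class Copilot:
    name: str
    system_prompt: str
    model: str

def handle_platform_request(copilot: Copilot, user_id: str, text: str) -> str:
    """Channel-agnostic entry point: the same co-pilot config drives every channel."""
    # ...retrieve context, call copilot.model with copilot.system_prompt, apply guardrails
    return f"[{copilot.name}] reply to {user_id}: {text}"

# Thin adapters translate each channel's payload into the platform-level call.
def on_slack_event(copilot: Copilot, event: dict) -> str:
    return handle_platform_request(copilot, event["user"], event["text"])

def on_google_chat_event(copilot: Copilot, event: dict) -> str:
    msg = event["message"]
    return handle_platform_request(copilot, msg["sender"]["name"], msg["text"])
```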
Future integration plans include VS Code extensions, GitHub Actions, and Chrome extensions. The Chrome extension concept would enable scraping and contextualizing web content directly.
## Model Comparison and Experimentation
FuelEx includes a "Lab" feature for developers and builders to experiment with different models before deployment. The Lab enables:
- Testing across multiple models (GPT-4, Claude Sonnet, Gemini 1.5 Flash, etc.) with the same prompts
- Adjusting system prompts and settings
- Viewing cost and latency metrics
- Building and exporting experiments as code
- Side-by-side comparison panels to evaluate how different models respond to identical inputs
The presenters demonstrated this by showing how the same translation system prompt produced different behaviors across models - with some following instructions precisely while others deviated.
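A rough sketch of such a side-by-side comparison harness is shown below; the per-token prices, model list, and `call_model` stub are illustrative assumptions rather than the Lab's actual internals.

```python
import time

# Illustrative per-1K-token prices and a stand-in completion call; the real Lab
# routes through the platform's proxy layer and records actual provider billing.
PRICE_PER_1K = {"gpt-4o": 0.005, "claude-3-5-sonnet": 0.003, "gemini-1.5-flash": 0.00015}

def call_model(model: str, system_prompt: str, user_prompt: str) -> str:
    return f"{model} output"  # placeholder for a real model call

def compare(system_prompt: str, user_prompt: str) -> list[tuple]:
    """Run the same prompt against each model and record latency and estimated cost."""
    rows = []
    for model, price in PRICE_PER_1K.items():
        start = time.perf_counter()
        output = call_model(model, system_prompt, user_prompt)
        latency = time.perf_counter() - start
        est_tokens = len(output.split()) * 1.3  # rough token estimate
        rows.append((model, round(latency, 4), round(est_tokens / 1000 * price, 6), output))
    return rows

for row in compare("Translate the user's message into French. Output only the translation.",
                   "The store opens at nine."):
    print(row)
```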
## Security and Responsible AI
Telus emphasizes responsible AI as a core differentiator. They have a dedicated Responsible AI Squad, over 500 trained data stewards, and thousands trained in prompt engineering. Their responsible AI framework includes a data enablement plan process and "purple teaming" (combining adversarial and defensive testing) with team members across the organization.
The platform implements configurable guardrails at multiple layers. For streaming responses, output-side guardrails are challenging to implement, so they leverage model-level guardrails (e.g., Azure's content filtering) where configurable severity and category settings can be specified. For RAG-based use cases, guardrails are implemented by constraining the LLM to respond based on provided context rather than generating content independently.
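A minimal sketch combining both approaches, assuming an Azure-style severity configuration and a grounding system prompt, might look like this; the filter keys, prompt wording, and `content_filter` parameter are illustrative assumptions.

```python
# Model-level content filter settings (Azure-style severity thresholds) plus a
# grounding instruction for RAG co-pilots; values and wording are illustrative.
CONTENT_FILTER = {"hate": "medium", "sexual": "medium", "violence": "medium", "self_harm": "low"}

GROUNDED_SYSTEM_PROMPT = (
    "Answer strictly from the provided context. "
    "If the context does not contain the answer, say you don't know."
)

def build_messages(context_chunks: list[str], question: str) -> list[dict]:
    context = "\n\n".join(context_chunks)
    return [
        {"role": "system", "content": GROUNDED_SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

def guarded_answer(context_chunks: list[str], question: str, call_llm) -> str:
    """Decline when no context is retrieved; otherwise answer with grounding plus
    the model-level filter applied (call_llm is any completion callable)."""
    if not context_chunks:
        return "I don't have information on that in my knowledge base."
    return call_llm(build_messages(context_chunks, question), content_filter=CONTENT_FILTER)
```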
Telus received the Outstanding Organization 2023 prize from the Responsible AI Institute, and their Telus.com support use case achieved Privacy by Design ISO certification.
## Performance Optimization
For performance, the platform employs asynchronous processing where possible. For example, chat naming (where an LLM generates a title for a conversation) is done asynchronously after the main response is delivered. Streaming is used to provide perceived responsiveness even when underlying LLM calls are slow. The multi-model proxy layer also enables optimization through load balancing and fallback to alternative models when one is slow or unavailable.
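The asynchronous chat-naming pattern can be illustrated with a small `asyncio` sketch; the streaming and naming calls below are stand-ins, not FuelEx code.

```python
import asyncio

async def call_llm(prompt: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a real model call
    return f"response to: {prompt}"

async def stream_answer(question: str):
    """Stream the main answer token by token for perceived responsiveness."""
    answer = await call_llm(question)
    for token in answer.split():
        yield token

async def handle_message(question: str) -> str:
    # Kick off chat naming in the background; it never blocks the user-facing stream.
    naming_task = asyncio.create_task(
        call_llm(f"Give this conversation a short title: {question}")
    )
    async for token in stream_answer(question):
        print(token, end=" ", flush=True)  # deliver tokens to the client as they arrive
    print()
    return await naming_task               # the title resolves after the answer is shown

asyncio.run(handle_message("How do I reset my VPN token?"))
```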
## Monitoring and Evaluation
Monitoring spans multiple cloud platforms (GCP, AWS, Azure) with customizable solutions for different organizational needs. Key metrics include response time, cost per query, and accuracy evaluation. For accuracy assessment, they maintain databases with "source of truth" answers and use LLMs to compare and score responses - a form of LLM-as-judge evaluation.
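An LLM-as-judge comparison against a source-of-truth answer can be sketched as follows; the judge prompt, 1-5 scale, and stand-in judge call are assumptions made for illustration.

```python
# LLM-as-judge sketch: compare a co-pilot answer to a source-of-truth answer and
# ask a judge model for a numeric score.
JUDGE_PROMPT = """You are grading an AI assistant's answer.
Reference answer: {reference}
Assistant answer: {candidate}
Score the assistant from 1 (wrong) to 5 (fully correct). Reply with the number only."""

def call_judge(prompt: str) -> str:
    return "4"  # stand-in for a real chat-completion call to the judge model

def score(reference: str, candidate: str) -> int:
    reply = call_judge(JUDGE_PROMPT.format(reference=reference, candidate=candidate))
    return int(reply.strip())

# The "source of truth" database is represented here by a tiny golden set.
golden_set = [("Reset your VPN token in the IT portal.", "Use the IT portal to reset the token.")]
scores = [score(ref, cand) for ref, cand in golden_set]
print(f"mean score: {sum(scores) / len(scores):.2f}")
```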
## Considerations and Limitations
While the presentation is largely promotional (the team is actively marketing FuelEx), several practical limitations and honest assessments emerge. The presenters acknowledge that guardrails on streaming output remain challenging without model-level support. They note that LLM response times are fundamentally limited - "if it's a slow LLM there's not a lot that you can do." The platform is not open source, though they emphasize extensibility through APIs for organizations to connect their own applications and data sources.
The multi-language support relies on the underlying LLM capabilities (specifically citing GPT-4's multilingual abilities) rather than platform-specific handling, and the embedding-based retrieval should work across languages due to semantic similarity matching.
This case study represents a mature enterprise LLMOps implementation with approximately 18 months of production experience, significant scale (35,000+ users, 30,000+ co-pilots), and a thoughtful multi-model, multi-cloud architecture designed for flexibility and Canadian regulatory compliance.