ZenML

Building an Internal ChatGPT-like Tool for Enterprise-wide AI Access

Grab 2025

Grab's ML Platform team faced overwhelming support channel inquiries that consumed engineering time with repetitive questions. An engineer initially attempted to build a RAG-based chatbot for platform documentation but encountered context window limitations with GPT-3.5-turbo and scalability issues. Pivoting from this failed experiment, the engineer built GrabGPT, an internal ChatGPT-like tool accessible to all employees, deployed over a weekend using existing frameworks and Grab's model-serving platform. The tool rapidly scaled to nearly company-wide adoption, with over 3,000 users within three months and 600 daily active users, providing secure, auditable, and globally accessible LLM capabilities across multiple model providers including OpenAI, Claude, and Gemini.

Industry

Tech

Overview

GrabGPT represents an internal LLM deployment case study from Grab, a Southeast Asian superapp company operating across mobility, deliveries, and digital financial services. The project began in March 2023 as an attempt to reduce support burden on the ML Platform team but evolved into a company-wide internal ChatGPT alternative. This case study is particularly interesting from an LLMOps perspective because it demonstrates the rapid evolution from a failed RAG implementation to a successful general-purpose internal tool, highlighting pragmatic decision-making, deployment infrastructure reuse, and the importance of non-technical factors like security and accessibility in enterprise LLM adoption.

The narrative, authored by engineer Wenbo Wei and published in May 2025, provides a first-person account of the development process, though as with any self-reported case study, claims about adoption rates and impact should be considered within the context of internal promotion of the tool. The case study offers valuable insights into the operational considerations of deploying LLMs at enterprise scale, particularly around authentication, model agnosticism, auditability, and data security requirements.

Initial Problem and First Attempt

The original motivation stemmed from a common enterprise pain point: the ML Platform team at Grab was experiencing high volumes of repetitive user inquiries through Slack channels, consuming significant on-call engineering time. The initial hypothesis was that a specialized chatbot could be built to automatically answer questions based on the platform’s documentation, potentially using retrieval-augmented generation (RAG) approaches.

The engineer explored open-source frameworks and discovered chatbot-ui, which could be integrated with LLMs. The approach involved attempting to feed over 20,000 words of platform Q&A documentation to the system. However, this immediately ran into a fundamental LLMOps challenge: GPT-3.5-turbo's context window limit of 8,000 tokens (roughly 6,000 English words at the usual ~0.75 words per token). To work within this constraint, the engineer spent considerable time summarizing the documentation down to fewer than 800 words, cutting more than 95% of the original content.
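The arithmetic behind that constraint is easy to sketch. The check below uses the common ~1.3 tokens-per-English-word rule of thumb and a hypothetical reserved budget for the question and answer; none of these numbers come from Grab's implementation:

```python
# Rough context-budget arithmetic. The ~1.3 tokens-per-English-word ratio
# is a common rule of thumb, not a figure from the case study.
TOKENS_PER_WORD = 1.3

def fits_in_context(doc_words: int, context_tokens: int,
                    reserved_tokens: int = 1000) -> bool:
    """True if a doc of `doc_words` words, plus a reserved budget for the
    question and the answer, fits inside the model's context window."""
    return doc_words * TOKENS_PER_WORD + reserved_tokens <= context_tokens

# The 20,000-word docs blow far past an 8,000-token window...
print(fits_in_context(20_000, 8_000))  # False
# ...while the ~800-word summary leaves ample room for the Q&A turn.
print(fits_in_context(800, 8_000))     # True
```

Even a generous window would not have held the full documentation plus a conversation, which is why summarization (or retrieval) was unavoidable.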

This first attempt reveals several LLMOps lessons. The engineer tried multiple technical approaches: first, the heavily summarized documentation approach, which worked only for a handful of frequently asked questions; and second, embedding search (presumably a RAG approach), which “didn’t work that well too.” The case study doesn’t provide detailed technical reasons for the RAG failure, which limits our ability to assess whether this was due to embedding quality, retrieval precision issues, document chunking strategies, or prompt engineering challenges. The lack of detail here is somewhat notable—successful RAG implementations for documentation Q&A were certainly possible in early 2023, suggesting either implementation challenges, insufficient experimentation time, or architectural mismatches with the specific use case.
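One plausible failure mode is easy to demonstrate with a deliberately simple retriever. The toy bag-of-words ranker below (a stand-in for dense-embedding search, not Grab's implementation; the chunks are invented) shows the lexical brittleness that hurts retrieval precision: "deploy" in the query never matches "deployment" in the best chunk, and only "model" connects them:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' -- real systems use dense vectors."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "To request GPU quota, open a ticket with the platform team.",
    "Model deployment uses the Catwalk serving platform.",
    "Notebook instances are billed per hour of usage.",
]
# 'deploy' != 'deployment' under exact token matching; the correct chunk
# wins here only because 'model' happens to overlap.
print(retrieve("how do I deploy a model", chunks))
```

Dense embeddings soften this particular problem, but chunking granularity, retrieval precision, and prompt construction each introduce their own failure modes, any of which could explain the reported result.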

Importantly, the engineer made the pragmatic decision to abandon this specialized use case rather than continuing to iterate on the RAG approach. This pivot decision, while potentially premature from a pure technical perspective, proved strategically valuable.

The Pivot: Building GrabGPT

The key insight that led to GrabGPT was recognizing an organizational gap: Grab lacked an internal ChatGPT-like tool, and the engineer already had accumulated the necessary knowledge (LLM framework familiarity) and infrastructure access (Grab’s model-serving platform called Catwalk). This pivot represents a common pattern in successful internal tools—building general-purpose infrastructure rather than overfitting to a specific narrow use case.

From an LLMOps deployment perspective, the implementation timeline was remarkably rapid: the initial version was built and deployed over a single weekend. The core components included the open-source chatbot-ui frontend, Google authentication for employee access, and Grab's existing Catwalk model-serving platform for LLM inference.

This rapid deployment was enabled by reusing existing infrastructure and frameworks rather than building from scratch. The engineer didn’t need to solve model hosting, scaling, or authentication infrastructure—these were already available. This highlights an important LLMOps principle: the availability of mature internal ML infrastructure can dramatically accelerate LLM application deployment.

Adoption and Scale

The adoption metrics presented in the case study are impressive, though as self-reported figures they should be interpreted with appropriate context: roughly 300 users on day one, over 3,000 users and 600 daily active users within three months, and near-universal adoption by the time of writing.

These figures suggest rapid viral adoption within the organization. The 600 daily active users out of 3,000 total registered users at month three is a 20% daily-active ratio, which is reasonably healthy for an internal tool. However, the claim that "almost all Grabbers" were using it by May 2025 (roughly two years after launch) is difficult to assess without knowing Grab's total employee count or more specific engagement metrics.

From an LLMOps perspective, this adoption trajectory would have created significant operational challenges around scaling, cost management, and infrastructure reliability. The case study unfortunately doesn't discuss how the team handled any of these pressures, for example capacity planning, provider rate limits, or cost controls as usage grew.

These omissions are notable because they represent some of the most critical LLMOps challenges when moving from prototype to production at scale.

Key Technical and Operational Features

The case study identifies several features that contributed to GrabGPT’s success, which provide insight into enterprise LLMOps requirements:

Data Security and Privacy: GrabGPT operates on a “private route,” ensuring that company data doesn’t leave Grab’s infrastructure. This is presented as a key differentiator from using public ChatGPT, where data would be sent to OpenAI’s servers and potentially used for training. For an organization handling sensitive transportation, delivery, and financial services data, this is a critical requirement. From an LLMOps perspective, this likely means that all model inference happens either on-premises or within Grab’s cloud infrastructure with strict network isolation.

However, the case study doesn't clarify an important technical detail: if GrabGPT supports models from OpenAI, Claude, and Gemini (as claimed), how is the "private route" maintained? Several architectures are possible: calling the providers' APIs under enterprise agreements with contractual no-training guarantees, routing through cloud-vendor private endpoints, or fully self-hosting open models within Grab's infrastructure.

The lack of clarity here is significant because it affects the interpretation of the security claims. True data isolation would require self-hosted models, which would dramatically increase the operational complexity and cost.

Global Accessibility: The case study emphasizes that unlike ChatGPT, which is banned or restricted in some regions (specifically mentioning China), GrabGPT is accessible to all Grab employees regardless of location. This addresses a real challenge for multinational companies, particularly those operating in Southeast Asia where Grab is headquartered. From an LLMOps perspective, this requires infrastructure deployment across multiple regions, potentially including China, with appropriate networking and compliance considerations.

Model Agnosticism: GrabGPT supports multiple LLM providers including OpenAI, Claude, and Gemini. This is an important architectural decision from an LLMOps perspective, as it provides resilience against single-provider outages, the freedom to route different tasks to whichever model suits them best, and leverage against vendor lock-in.

However, maintaining a multi-model architecture introduces operational complexity: each provider brings its own API conventions, rate limits, pricing, and prompt behavior, and output quality must be kept reasonably consistent across them.

The case study doesn’t discuss how these tradeoffs were managed or whether there’s intelligent routing to automatically select appropriate models for different query types.
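A multi-provider setup is typically hidden behind a thin dispatch layer. Here is a minimal sketch, with placeholder clients standing in for real provider SDKs; nothing in it reflects GrabGPT's actual code:

```python
# Minimal model-agnostic dispatch layer. The provider functions below are
# placeholders -- a real system would wrap each vendor's SDK here.
from typing import Callable

PROVIDERS: dict[str, Callable[[str], str]] = {}

def register(name: str):
    """Decorator that registers a provider client under a model name."""
    def wrap(fn: Callable[[str], str]):
        PROVIDERS[name] = fn
        return fn
    return wrap

@register("openai")
def call_openai(prompt: str) -> str:
    return f"[openai] {prompt}"   # stand-in for a real API call

@register("claude")
def call_claude(prompt: str) -> str:
    return f"[claude] {prompt}"   # stand-in for a real API call

def chat(prompt: str, model: str = "openai") -> str:
    """Route a prompt to the chosen provider behind one uniform interface."""
    if model not in PROVIDERS:
        raise ValueError(f"unknown model: {model}")
    return PROVIDERS[model](prompt)

print(chat("hello", model="claude"))  # [claude] hello
```

The registry pattern keeps the user-facing interface stable while providers are added or swapped, which is the main operational payoff of model agnosticism.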

Auditability: Every interaction on GrabGPT is auditable, which is highlighted as important for data security and governance teams. This is a critical enterprise LLMOps requirement that often distinguishes internal tools from consumer applications. Auditability enables compliance review, investigation of suspected data leakage, and visibility into how the tool is actually used.

From an implementation perspective, this likely requires logging every prompt and response alongside user identity, strict access controls on the logs themselves, and retention policies that satisfy governance requirements.

The case study doesn’t discuss the technical implementation of auditability or how privacy is balanced with logging requirements (for example, whether sensitive information in prompts is redacted or encrypted).
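One plausible shape for such an audit record, with a naive email-redaction pass applied before the prompt is persisted, is sketched below. Everything here (the record schema, the single regex) is illustrative; the case study describes no implementation, and real redaction needs far broader coverage:

```python
import json
import re
import time

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text: str) -> str:
    """Mask obvious email addresses before the prompt hits the audit log.
    (Illustrative only -- production redaction covers many more PII types.)"""
    return EMAIL.sub("<redacted-email>", text)

def audit_record(user: str, model: str, prompt: str) -> str:
    """Serialize one auditable interaction as a JSON log line."""
    return json.dumps({
        "ts": time.time(),        # when the request happened
        "user": user,             # who asked (from SSO identity)
        "model": model,           # which provider served it
        "prompt": redact(prompt), # what was asked, with PII masked
    })

rec = audit_record("u123", "openai", "email me at jane.doe@example.com")
print(rec)
```

This illustrates the tension the text raises: the more aggressively prompts are redacted, the less useful the log is for leak investigations, so the redaction policy is itself a governance decision.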

LLMOps Lessons and Critical Assessment

The case study presents several lessons learned, which are worth examining from an LLMOps perspective:

Failure as a stepping stone: The narrative frames the failed RAG chatbot as essential groundwork for GrabGPT’s success. While this makes for a compelling story, it’s worth noting that the initial failure and pivot happened quite quickly—the engineer spent “days” on summarization before abandoning the approach. From a technical perspective, more persistence on the RAG approach might have yielded success given that documentation Q&A was (and is) a solved problem with proper architecture. However, the organizational impact of building a general-purpose tool was arguably much higher than solving the narrow support channel problem, so the pivot was strategically correct even if technically premature.

Timing matters: The case study emphasizes that GrabGPT succeeded because it “addressed a critical need at the right time” (March 2023, shortly after ChatGPT’s public release in November 2022). This is accurate—there was a brief window where internal ChatGPT alternatives provided significant value before enterprise offerings became widely available. However, this also raises questions about the long-term sustainability and differentiation of GrabGPT as enterprise LLM products mature.

Start small, scale up: The weekend-prototype-to-company-wide-tool trajectory is presented as a success story, but from an LLMOps maturity perspective, this rapid scaling likely created technical debt. Production-grade LLM systems typically require monitoring and alerting, per-user rate limiting, graceful degradation when a provider fails, cost controls, and ongoing evaluation of output quality.

The case study doesn’t discuss whether these production requirements were addressed proactively or reactively as issues emerged. The rapid adoption likely meant that many of these operational aspects were built while the system was already critical to many users’ workflows, which can be stressful and risky.
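As one example of such a production control, per-user rate limiting is commonly implemented as a token bucket. The sketch below is generic and not described in the case study:

```python
import time

class TokenBucket:
    """Per-user request limiter: refills at `rate` requests/second,
    allowing bursts of up to `capacity` requests."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=3)
results = [bucket.allow() for _ in range(5)]
print(results)  # burst of 3 allowed, then throttled until tokens refill
```

In practice each authenticated user would get their own bucket, which also doubles as a crude cost control against runaway scripted usage.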

Missing Technical Details and Open Questions

Several important LLMOps aspects are not addressed in the case study:

Cost Management: With 600 daily active users and potential access to expensive models like GPT-4, Claude, and Gemini, the operational costs could be substantial. The case study doesn't discuss per-user quotas, routing of simple queries to cheaper models, or how spend was tracked and attributed across teams.
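A back-of-envelope model shows why this matters. The per-token prices and the requests-per-user figure below are hypothetical; only the 600 daily-active-users number comes from the case study:

```python
# Illustrative (not actual) USD prices per 1,000 tokens for two model tiers.
PRICE_PER_1K = {"small": 0.001, "large": 0.03}

def request_cost(model_tier: str, prompt_tokens: int,
                 completion_tokens: int) -> float:
    """Cost of one request: total tokens times the tier's unit price."""
    return (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K[model_tier]

# 600 daily active users at an assumed ~10 requests of ~1,500 tokens each:
daily = 600 * 10 * request_cost("large", 1000, 500)
print(f"${daily:.2f}/day on the large tier")  # $270.00/day
```

Even with made-up prices, the exercise shows how quickly routing everything to the largest model compounds, which is why tiered routing and quotas are standard cost controls.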

Performance and Reliability: There is no discussion of latency, availability targets, or how provider outages and rate limits were handled.

Content Safety and Compliance: There is no mention of guardrails against misuse, treatment of sensitive data entered into prompts, or acceptable-use policies.

Model Selection and Configuration: Limited information is given about which model versions were offered, their default parameters, or how users chose among providers.

User Experience and Interface: Minimal details are provided about the interface beyond its chatbot-ui origins, or how it evolved from that open-source base.

Evolution and Improvement: The case study covers the launch but doesn't discuss how the tool changed over its two-year lifetime, whether the original documentation Q&A use case was ever revisited, or what was added in response to user feedback.

Broader Impact and Strategic Considerations

The case study claims that GrabGPT “sparked a broader conversation about how LLMs can be leveraged across Grab” and demonstrates that “a single engineer, provided with the right tools and timing, can create something transformative.” These are significant organizational claims that go beyond the technical implementation.

From an LLMOps strategy perspective, GrabGPT's success likely had several organizational effects: it normalized everyday LLM use across the company, demonstrated the leverage that mature internal ML infrastructure provides, and, per the author, sparked broader conversations about where else LLMs could be applied at Grab.

However, there are also potential strategic concerns: as vendors' enterprise offerings mature, a homegrown wrapper must continually justify its maintenance cost, and a tool built largely by one engineer carries obvious continuity risk.

Conclusion and LLMOps Takeaways

GrabGPT represents a pragmatic and successful internal LLM deployment that addressed real organizational needs around security, accessibility, and multi-model support. The case study is most valuable for demonstrating the importance of non-technical factors in enterprise LLM adoption, the value of reusing existing infrastructure, and the organizational impact of democratizing AI access.

However, the case study is notably light on technical implementation details, operational challenges, and the evolution of the system over its two-year lifetime. Many critical LLMOps concerns—cost management, performance optimization, content safety, and production reliability—are either glossed over or not mentioned at all. This may reflect the blog post’s audience (general technical readers rather than LLMOps practitioners) or the author’s focus on the strategic narrative rather than operational details.

For practitioners, the key takeaways are: the importance of leveraging existing infrastructure for rapid deployment; the value of model agnosticism in enterprise settings; the critical nature of security, auditability, and compliance features for enterprise adoption; and the sometimes-underestimated organizational importance of accessibility (geographic and otherwise). The rapid adoption also demonstrates that when security and access barriers are removed, demand for LLM capabilities within enterprises can be substantial and immediate.

The case study also serves as a reminder to approach vendor claims and success stories with appropriate skepticism. While the adoption metrics are impressive, many operational details that would be critical for assessing the full picture of success (costs, reliability, user satisfaction beyond usage numbers, support burden, etc.) are absent. Nonetheless, GrabGPT appears to represent a genuine success in internal LLM deployment, even if the case study leaves many LLMOps questions unanswered.
