**Company:** Grab

**Title:** Evolution of an Internal AI Platform from No-Code LLM Apps to Agentic Systems

**Industry:** Tech

**Year:** 2025

**Summary:** Grab developed SpellVault, an internal no-code AI platform that evolved from a simple RAG-based LLM app builder into a sophisticated agentic system supporting thousands of apps across the organization. Initially designed to democratize AI access for non-technical users through knowledge integrations and plugins, the platform progressively incorporated advanced capabilities including workflow orchestration, ReAct agent execution, unified tool frameworks, and Model Context Protocol (MCP) compatibility. This evolution enabled SpellVault to transform from supporting static question-answering apps into powering dynamic AI agents capable of reasoning, acting, and interacting with internal and external systems, while maintaining its core mission of accessibility and ease of use.
## Overview

Grab, a Southeast Asian superapp company operating across the mobility, delivery, and financial services sectors, developed SpellVault as an internal AI platform to democratize access to large language models across the organization. The platform represents a comprehensive LLMOps case study spanning multiple years of evolution, from its initial deployment as a simple no-code LLM app builder to a sophisticated agentic platform. The case study demonstrates how an organization can incrementally evolve an internal AI platform to keep pace with rapidly changing AI capabilities while maintaining backward compatibility and user accessibility.

SpellVault's journey illustrates several critical LLMOps challenges: managing platform evolution without disrupting existing users, balancing ease of use with advanced capabilities, integrating with existing enterprise systems, and adapting to emerging industry standards. The platform enabled the creation of thousands of AI applications used for productivity gains, automation, and production use cases across Grab's organization. This represents a significant-scale deployment that moved beyond experimental AI usage into operationalized, business-critical applications.

## Initial Platform Architecture and Core Capabilities

SpellVault was initially conceived as a no-code platform enabling employees without coding expertise to build and deploy LLM-powered applications. The founding vision centered on three pillars that would define the platform's LLMOps approach.

The first pillar was a comprehensive RAG (Retrieval-Augmented Generation) solution with useful integrations. Rather than relying solely on the LLM's parametric knowledge, SpellVault prioritized grounding responses in factual, up-to-date information from various knowledge sources. The platform provided built-in integrations with common enterprise knowledge repositories, including wikis and Google Docs, alongside support for plain-text and PDF uploads. This approach addressed a fundamental LLMOps challenge: ensuring that AI applications provide accurate, verifiable answers based on organizational knowledge rather than potentially hallucinated information from the base model alone. The emphasis on knowledge integration from day one reflects an understanding that production LLM applications require strong grounding in domain-specific information.

The second pillar involved plugins designed to fetch information on demand, moving beyond static knowledge retrieval to enable dynamic interactions. These plugins functioned as modular components allowing apps to interact with both internal systems (such as service dashboards and incident trackers) and external APIs (such as search engines and weather services). From an LLMOps perspective, this plugin architecture was an early implementation of what would later be formalized as "tool calling" in the broader AI community. Users could create custom plugin instances from available plugin types with tailored settings and credentials, enabling specialized functionality. The HTTP plugin exemplifies this approach, allowing users to define custom endpoints and credentials so their AI apps could make tailored API calls at runtime. These custom plugins became foundational to many high-impact applications, demonstrating how extensibility mechanisms are critical for enterprise LLMOps platforms.
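To make the plugin model concrete, here is a minimal sketch of what a user-configured HTTP plugin instance might look like. The class and field names (`HTTPPluginConfig`, `invoke`, the incident-tracker endpoint) are hypothetical illustrations, not SpellVault's actual API.

```python
import requests  # assumes the `requests` package is installed

class HTTPPluginConfig:
    """Hypothetical config for a user-defined HTTP plugin instance."""
    def __init__(self, name: str, description: str, endpoint: str,
                 method: str = "GET", api_key: str | None = None):
        self.name = name                # tool name exposed to the LLM
        self.description = description  # tells the LLM when to call this tool
        self.endpoint = endpoint
        self.method = method
        self.api_key = api_key          # user-supplied credential

    def invoke(self, params: dict) -> str:
        """Call the configured endpoint and return the raw response text."""
        headers = {}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"
        resp = requests.request(self.method, self.endpoint,
                                params=params, headers=headers, timeout=10)
        resp.raise_for_status()
        return resp.text

# Example: a plugin instance an employee might configure for an incident tracker.
incident_lookup = HTTPPluginConfig(
    name="incident_lookup",
    description="Fetch open incidents for a given service name.",
    endpoint="https://incidents.example.internal/api/v1/incidents",
    api_key="<user-supplied token>",
)
```

The key design point this illustrates is that the `description` field, not code, is what lets the LLM decide at runtime whether the tool is relevant to a user's request.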
The third pillar focused on multi-channel accessibility, ensuring SpellVault integrated seamlessly into existing workflows rather than existing as an isolated tool. The platform exposed three primary interfaces: a web interface for app configuration and testing, Slack integration for conversational access within team communications, and APIs enabling other internal services to programmatically interact with SpellVault apps. This multi-channel approach addresses a key LLMOps consideration: AI applications must meet users where they already work rather than requiring them to adopt entirely new tools and processes. The API exposure in particular enabled SpellVault to function as an AI capability layer that other production systems could leverage, transforming individual AI apps into organizational infrastructure components.

## Platform Evolution and Incremental Enhancement

As the AI landscape evolved rapidly over subsequent years, SpellVault adopted a philosophy of continuous adaptation through incremental enhancements rather than disruptive overhauls. This approach enabled the platform to incorporate new capabilities while maintaining backward compatibility and preserving the user experience for thousands of existing applications.

The platform progressively expanded its plugin ecosystem, adding integrations with tools like Slack and Kibana and continuously broadening the range of systems with which SpellVault apps could interact. From an LLMOps perspective, this plugin expansion reflects the challenge of maintaining an integration layer as organizational tooling evolves. The team also implemented auto-updates for Knowledge Vaults, ensuring that the data underlying RAG applications remained current without manual intervention from app creators. This addresses a critical production concern: knowledge freshness and synchronization between source systems and the AI platform's knowledge stores.

As more users built applications and relied on SpellVault for business processes, trustworthiness became increasingly important. The team added citation capabilities to responses, enabling users to verify the sources of information provided by AI apps. This feature represents a fundamental LLMOps best practice for production systems: providing transparency and traceability for AI-generated outputs. Citations let users evaluate response quality and identify potential issues with knowledge retrieval or reasoning.

Recognizing LLMs' limitations in mathematical reasoning, the team developed a feature enabling apps to solve mathematical problems using a Python runtime. This reflects an understanding that production LLM systems often require augmentation with specialized capabilities beyond text generation; the addition of Python execution was an early form of the code interpretation that would later become standard in many LLM platforms.

User demand for automation led to a Task Scheduler feature allowing LLMs to schedule actions based on natural language input. This capability transforms SpellVault apps from interactive tools into autonomous agents capable of performing scheduled tasks, expanding the operational scope from human-in-the-loop applications to partially autonomous workflows.

A significant milestone was the introduction of "Workflow," a drag-and-drop interface enabling users to design deterministic workflows combining various SpellVault ecosystem components, including LLM calls, Python code execution, and Knowledge Vault lookups, in predefined sequences. A sketch of what such a workflow definition might look like follows below.
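The case study does not show Workflow's internal representation; the following is a minimal, hypothetical sketch of how a drag-and-drop workflow might serialize into an ordered sequence of typed steps. All names (`Step`, `run_workflow`, the step kinds, the stub handlers) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    """One node in a deterministic workflow: a named operation plus its config."""
    kind: str    # e.g. "llm_call", "python_exec", "vault_lookup"
    config: dict

# A hypothetical serialized workflow: look up context, then summarize it.
oncall_digest = [
    Step(kind="vault_lookup", config={"vault": "runbooks", "query": "{alert_name}"}),
    Step(kind="llm_call", config={"prompt": "Summarize remediation steps:\n{vault_result}"}),
]

def run_workflow(steps: list[Step], inputs: dict,
                 handlers: dict[str, Callable[[dict, dict], dict]]) -> dict:
    """Execute steps in their predefined order, threading state between them."""
    state = dict(inputs)
    for step in steps:
        state.update(handlers[step.kind](step.config, state))
    return state

# Stub handlers so the sketch runs end to end without real services.
handlers = {
    "vault_lookup": lambda cfg, st: {"vault_result": f"(docs for {cfg['query'].format(**st)})"},
    "llm_call": lambda cfg, st: {"answer": f"(LLM output for: {cfg['prompt'].format(**st)})"},
}
print(run_workflow(oncall_digest, {"alert_name": "db-latency"}, handlers))
```

The point of the deterministic form is that the execution order is fixed by the human designer, not chosen by the model at runtime.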
This Workflow capability represented a hybrid approach between fully agentic (LLM-directed) execution and deterministic (human-designed) orchestration, giving users explicit control over execution flow for use cases requiring predictability and compliance with specific processes.

## Architectural Transformation: From Linear Execution to Graph-Based Agents

Beneath these visible feature additions, SpellVault underwent a fundamental transformation in its execution model. The platform transitioned from a legacy executor facilitating one-off information retrieval to a graph-based executor supporting nodes, edges, and states, with branching, looping, and modularity.

This architectural shift had profound implications for LLMOps. The graph-based executor enabled SpellVault to transform all existing apps into ReAct (Reasoning and Acting) agents without disrupting their existing behavior. ReAct agents represent a significant advancement over simple prompt-response patterns, enabling iterative reasoning in which the LLM observes the results of tool calls and decides on subsequent actions. This "one size fits many" solution significantly enhanced application capabilities, allowing apps to leverage Knowledge Vaults and plugins in a more dynamic, agentic manner while preserving backward compatibility.

Decoupling the executor from the prompt-engineering components created architectural flexibility that enabled multiple execution pathways. This modularity allowed the team to provide generic capabilities like "Deep Research" through simple UI checkboxes, while also supporting sophisticated custom workflows for high-ROI use cases like on-call alert analysis. The Deep Research capability leveraged SpellVault's ability to search across internal information repositories, including Slack messages, wikis, and Jira, as well as external online searches, orchestrating complex multi-step information-gathering processes.

From an LLMOps perspective, this evolution demonstrates the value of building platforms with extensible execution models rather than hardcoding specific interaction patterns. The graph-based executor provided the foundation for increasingly sophisticated agentic behaviors while maintaining a consistent interface for app creators, who need not understand the underlying execution complexity.

## Consolidation into a Unified Tool Framework

As SpellVault accumulated capabilities like Python code execution and internal repository search, the team recognized an architectural inconsistency. Initially, these functionalities were integrated directly into the core PromptBuilder class and exposed to users primarily through simple checkboxes. As the platform evolved toward greater agency, however, the team realized these capabilities should be repositioned as "Tools" that LLMs could autonomously invoke when appropriate, much as ReAct agents already used user-created plugins.

This recognition led to a significant refactoring that consolidated scattered capabilities into a unified framework called "Native Tools." These Native Tools, along with existing user plugins (rebranded as "Community Built Tools"), formed a comprehensive collection of tools that LLMs could dynamically invoke at runtime. A sketch of how a ReAct loop might dispatch over such a unified tool collection follows below.
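SpellVault's executor internals are not published in the case study; the following is a minimal, illustrative sketch of a ReAct-style loop dispatching over a unified tool registry. The `call_llm` function, the tool names, and the message format are assumptions, not the platform's real implementation.

```python
import json

# Unified registry: native tools and community-built tools share one interface.
TOOLS = {
    "internet_search": lambda args: f"(search results for {args['query']!r})",
    "incident_lookup": lambda args: f"(open incidents for {args['service']!r})",
}

def call_llm(messages: list[dict]) -> dict:
    """Placeholder for a real LLM call. Assumed to return either a final
    answer, e.g. {"answer": "..."}, or a tool invocation, e.g.
    {"tool": "internet_search", "args": {"query": "..."}}."""
    raise NotImplementedError

def react_loop(question: str, max_steps: int = 5) -> str:
    """Iterate reason -> act -> observe until the model produces an answer."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = call_llm(messages)
        if "answer" in decision:                 # model is done reasoning
            return decision["answer"]
        observation = TOOLS[decision["tool"]](decision["args"])
        messages.append({"role": "tool",         # feed the result back in
                         "content": json.dumps({"observation": observation})})
    return "Step budget exhausted."
```

This loop also shows why the consolidation mattered: once every capability sits behind the same `TOOLS` interface, the executor needs no special cases for Python execution, repository search, or user plugins.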
The distinction between the two categories reflects important LLMOps considerations: Native Tools required no user-specific configuration (performing an internet search, for example), whereas Community Built Tools were custom, user-configured entities (such as invoking a specific HTTP endpoint), often requiring credentials or personalized settings.

This consolidation represents a maturation of the platform's mental model, shifting from "LLM apps with features" to "AI agents with tools." From a user-experience perspective, this makes the platform more intuitive: users create agents and define which tools those agents can access, rather than toggling miscellaneous feature checkboxes. From a technical perspective, the unified abstraction simplifies the platform architecture, treating all external capabilities through a consistent tool-calling interface rather than special-casing different types of functionality.

The tools framework also positions SpellVault to incorporate future capabilities more easily. Rather than requiring architectural changes to support new functionality types, new capabilities can simply be added as additional tools within the existing framework. This extensibility is crucial for LLMOps platforms operating in a rapidly evolving technological landscape.

## Model Context Protocol Integration and Ecosystem Interoperability

Having streamlined internal capabilities into a unified tools framework, the SpellVault team turned its attention to external interoperability by adopting the Model Context Protocol (MCP), an emerging industry standard for agent and client interaction. The team adapted SpellVault to function as an MCP service, bringing two key advancements from an LLMOps perspective.

First, each app created in SpellVault can now be exposed through the MCP protocol, allowing other agents or MCP-compatible clients (such as IDEs or external orchestration frameworks) to treat a SpellVault app as a callable tool. This transforms SpellVault apps from isolated applications accessible only through the web interface or Slack into interoperable building blocks that other systems can invoke dynamically. From a production LLMOps standpoint, this dramatically expands the utility of SpellVault apps, enabling them to participate in larger multi-agent systems or complex workflows orchestrated by external systems.

Second, the team extended MCP capabilities to Knowledge Vaults, allowing external clients to search, retrieve, and even add information to Vaults through the protocol. This effectively turns SpellVault's RAG pipeline into an MCP-native service, making contextual grounding available to agents beyond SpellVault itself. This is particularly significant because knowledge management and RAG infrastructure typically represent a substantial investment, and making them available as a service multiplies their organizational value.

The SpellVault team also contributed to the broader ecosystem by developing TinyMCP, a lightweight open-source Python library that adds MCP capabilities to an existing FastAPI application as just another router, rather than requiring a separate application. This demonstrates how organizations building internal LLMOps platforms can simultaneously contribute to broader ecosystem tooling, potentially easing their own future integration challenges while supporting community development.

The MCP integration represents a strategic evolution of SpellVault from a self-contained platform into an interoperable service provider within a broader agentic ecosystem.
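TinyMCP's actual API is not shown in the case study. Purely to illustrate the "MCP as just another router" idea, here is a hypothetical FastAPI router that exposes an app as a callable tool; the endpoint paths, schemas, and tool names are invented and do not follow the real MCP wire format or TinyMCP's interface.

```python
from fastapi import FastAPI, APIRouter
from pydantic import BaseModel

app = FastAPI()                         # the existing application
mcp_router = APIRouter(prefix="/mcp")   # hypothetical add-on router

class ToolCall(BaseModel):
    tool: str
    arguments: dict

@mcp_router.get("/tools")
def list_tools() -> list[dict]:
    """Advertise each SpellVault-style app as a callable tool."""
    return [{"name": "oncall_helper",
             "description": "Answers on-call questions from internal runbooks."}]

@mcp_router.post("/tools/call")
def call_tool(call: ToolCall) -> dict:
    """Dispatch an incoming tool call to the underlying app (stubbed here)."""
    if call.tool == "oncall_helper":
        return {"result": f"(app response to {call.arguments})"}
    return {"error": f"unknown tool: {call.tool}"}

app.include_router(mcp_router)  # MCP surface mounted alongside existing routes
```

The design appeal this sketches is that the existing service keeps all of its routes unchanged; interoperability arrives as one additional mounted router rather than a second deployment.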
Users continue benefiting from no-code simplicity within SpellVault, but the outputs of their work, both apps and knowledge, become usable by other agents and tools outside the platform. This addresses a common LLMOps challenge: avoiding siloed AI capabilities that cannot easily compose with other organizational systems.

## LLMOps Considerations and Critical Assessment

While the case study presents SpellVault's evolution positively, several LLMOps considerations warrant balanced assessment. The platform's success in enabling "thousands of apps" is mentioned without specific metrics on adoption quality, production usage rates, or business-impact measurement. For LLMOps practitioners, understanding how many of these apps are actively used, which represent production systems versus experiments, and how business value is measured would provide important context for evaluating the platform's true operational success.

The incremental evolution approach, while preserving backward compatibility, may have introduced technical debt or architectural complexity from supporting multiple execution models and interface patterns simultaneously. The case study doesn't discuss how the team managed versioning, deprecation, or migration of apps built on older paradigms to newer capabilities. In production LLMOps, managing platform evolution while supporting existing applications is a significant ongoing challenge.

The platform's emphasis on no-code accessibility is admirable, but the case study doesn't address how the team handles scenarios requiring sophisticated prompt engineering, complex error handling, or advanced customization beyond what the visual interface supports. Production LLM applications often require nuanced prompt optimization, retry logic, fallback strategies, and context management that may be challenging to expose through simplified interfaces. The balance between accessibility and capability is a fundamental tension in enterprise LLMOps platforms.

The integration of increasingly agentic behaviors raises important questions about reliability, predictability, and control that aren't thoroughly addressed. ReAct agents making autonomous decisions about tool invocation can behave unpredictably, potentially making unnecessary API calls, retrieving irrelevant information, or entering infinite loops. The case study doesn't discuss mechanisms for constraining agent behavior, implementing guardrails, monitoring agent actions, or handling failures in multi-step agentic workflows. These operational concerns are critical for production LLMOps.

Security and access control receive minimal attention in the case study. Community Built Tools requiring credentials and custom configurations raise questions about how SpellVault handles sensitive information, manages authentication to external systems, ensures appropriate access controls, and audits tool usage. The ability for users to create HTTP plugins that call arbitrary endpoints presents potential security risks if not properly governed.

The citation capabilities address trustworthiness from an end-user verification perspective, but the case study doesn't discuss broader evaluation frameworks, quality-assurance processes, or systematic testing approaches. Production LLMOps typically requires robust evaluation pipelines, regression testing for prompt changes, monitoring of response quality, and mechanisms for identifying and addressing degraded performance.
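As an illustration of what such a regression check might look like (this is not a process described in the case study), here is a minimal sketch of a golden-set evaluation for a RAG app's citations; `ask_app`, the golden questions, and the expected-source convention are all hypothetical.

```python
# Hypothetical golden-set regression check for a RAG app's citations.
GOLDEN_SET = [
    {"question": "What is the on-call escalation policy?",
     "expected_source": "wiki/oncall-escalation"},
    {"question": "How do I rotate service credentials?",
     "expected_source": "wiki/credential-rotation"},
]

def ask_app(question: str) -> dict:
    """Placeholder for invoking the app under test; assumed to return the
    answer text plus the citation sources it grounded on."""
    raise NotImplementedError

def run_regression() -> float:
    """Return the fraction of golden questions whose answer cites the
    expected source; run after any prompt or knowledge change."""
    hits = 0
    for case in GOLDEN_SET:
        response = ask_app(case["question"])
        if case["expected_source"] in response.get("sources", []):
            hits += 1
    return hits / len(GOLDEN_SET)
```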
How SpellVault enables or enforces such practices across the thousands of apps built on the platform remains unclear.

Cost management receives no mention in the case study, yet it is a critical LLMOps concern for platforms supporting thousands of applications. Questions about LLM provider selection, cost allocation across apps and users, usage monitoring and throttling, and optimization of retrieval and tool-calling patterns to minimize unnecessary LLM invocations would be valuable for practitioners evaluating similar platforms.

The MCP integration is presented as enabling broader interoperability, but actual adoption and usage patterns aren't described. Whether other systems are actively leveraging SpellVault as an MCP service, what challenges arose in implementing the protocol, and how the team manages versioning and compatibility with evolving MCP standards would provide important context for organizations considering similar integrations.

## Strengths and Valuable Patterns

Despite these considerations, SpellVault demonstrates several valuable patterns for enterprise LLMOps. The platform's consistent focus on democratization while incrementally adding sophisticated capabilities shows how organizations can progressively build internal AI competency without requiring everyone to become an AI expert immediately. The multi-channel deployment approach (web, Slack, API) recognizes that AI capabilities are most valuable when integrated into existing workflows.

The architectural evolution from hardcoded features to a unified tool framework demonstrates refactoring discipline, suggesting the team is managing technical debt and continuously improving platform architecture rather than simply accumulating features. The decision to adopt emerging standards like MCP relatively early shows strategic thinking about interoperability and avoiding proprietary lock-in.

The platform's support for both deterministic workflows (through the Workflow interface) and agentic execution (through ReAct agents) recognizes that different use cases require different levels of control and autonomy. This flexibility matters for enterprise adoption, where some use cases demand predictability and auditability while others benefit from flexible, adaptive agent behavior.

## Conclusion

SpellVault represents a substantial internal investment in LLMOps infrastructure, evolving from a simple no-code LLM app builder into a sophisticated platform supporting diverse AI agent behaviors, extensive tool ecosystems, and interoperability with emerging standards. The platform's journey illustrates how enterprise LLMOps can evolve incrementally to keep pace with rapidly changing AI capabilities while maintaining accessibility for non-technical users.

For organizations building similar internal platforms, SpellVault offers valuable lessons: prioritize knowledge integration and tool extensibility from the start, design execution architectures that can accommodate increasingly sophisticated agent behaviors, maintain backward compatibility during evolution, and engage with emerging standards to ensure long-term interoperability. However, the case study's limited discussion of operational concerns like evaluation, security, cost management, and reliability monitoring is a reminder that successful LLMOps requires attention to these less glamorous but critical aspects of production AI systems.
The true test of SpellVault's success will be sustained adoption of the thousands of apps built on the platform, successful migration to newer capabilities as the platform evolves, and measurable business impact from democratized AI access across Grab's organization. The architectural foundations and strategic decisions described suggest the platform is well-positioned for continued evolution as the agentic AI landscape matures.
