ZenML

Company-Wide GenAI Transformation Through Hackathon-Driven Culture and Centralized Infrastructure

Agoda 2025

Agoda moved from scattered GenAI experiments to company-wide adoption through a strategic approach that began with a 2023 hackathon, grew into a grassroots culture of exploration, and was supported by robust infrastructure, including a centralized GenAI proxy and an internal chat platform. Starting with over 200 developers prototyping 40+ ideas, the initiative evolved into 200+ applications serving both internal productivity (73% employee adoption, 45% of tech support tickets automated) and customer-facing features, demonstrating how systematic enablement and community-driven innovation can scale GenAI across an entire organization.

Industry

E-commerce

Agoda’s GenAI transformation represents one of the most comprehensive enterprise-wide LLMOps implementations documented, spanning from initial experimentation to production-scale deployment across 200+ applications. The journey began in February 2023 with a company-wide GPT hackathon that engaged over 200 developers across three days, resulting in 40+ prototyped ideas ranging from customer support bots to SEO content generators. This hackathon served as a catalyst, demonstrating that LLMs were capable of real production work beyond simple chat demonstrations, highlighting the critical importance of prompt engineering, and identifying internal context and data access as the primary technical challenges.

The most significant aspect of Agoda’s approach was its systematic investment in infrastructure to support scalable GenAI adoption. At the heart of the LLMOps architecture is a centralized GenAI Proxy that serves as the single access point for all GenAI traffic in the organization. The proxy provides several critical production capabilities: centralized compliance and legal review for new use cases, intelligent routing between GenAI providers (initially OpenAI and Azure OpenAI, later expanded to additional providers), usage tracking granular down to individual requests and tokens, automatic cost attribution back to teams, and comprehensive rate limiting and monitoring. This centralized approach enabled rapid experimentation while maintaining governance and cost control, addressing one of the hardest problems in enterprise GenAI deployment.
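The proxy’s core responsibilities (provider routing, per-team usage tracking, rate limiting) can be illustrated with a minimal sketch. Everything below is hypothetical, from the class name to the stubbed provider callable; Agoda’s actual proxy implementation is not public.

```python
import time
from collections import defaultdict


class GenAIProxy:
    """Illustrative sketch of a centralized GenAI proxy: routes requests to a
    provider, attributes usage back to the calling team, and enforces a
    simple per-minute rate limit. Not Agoda's real implementation."""

    def __init__(self, providers, rate_limit_per_minute=60):
        self.providers = providers  # name -> callable(prompt) -> (text, tokens)
        self.usage = defaultdict(lambda: {"requests": 0, "tokens": 0})
        self.rate_limit = rate_limit_per_minute
        self.window = defaultdict(list)  # team -> recent request timestamps

    def complete(self, team, prompt, provider="openai"):
        now = time.time()
        recent = [t for t in self.window[team] if now - t < 60]
        if len(recent) >= self.rate_limit:
            raise RuntimeError(f"rate limit exceeded for team {team}")
        self.window[team] = recent + [now]

        # Route to the requested provider, falling back to any available one.
        handler = self.providers.get(provider) or next(iter(self.providers.values()))
        text, tokens = handler(prompt)

        # Attribute usage back to the calling team for cost reporting.
        self.usage[team]["requests"] += 1
        self.usage[team]["tokens"] += tokens
        return text


# Usage with a stubbed provider (a real one would call an LLM API):
proxy = GenAIProxy({"openai": lambda p: (f"echo: {p}", len(p.split()))})
reply = proxy.complete(team="search", prompt="summarize this hotel review")
print(proxy.usage["search"])  # -> {'requests': 1, 'tokens': 4}
```

The single choke point is what makes granular cost attribution and compliance review tractable: every request carries a team identity, so reporting and governance come for free.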

Complementing the proxy infrastructure, Agoda developed an internal Chat Assistant Platform built as a Progressive Web App that replaced reliance on external GenAI interfaces. This platform was designed with three key principles: GenAI-agnostic assistant behavior definition under internal control, avoidance of costly per-seat pricing models, and provision of a unified interface that could work across any model provider. The platform includes a comprehensive chat UI with integration to Agoda’s internal systems, the ability to create custom assistants with pluggable tool support for specialized tasks like SEO and data queries, a playground environment for prototyping prompts and function calls, and a Chrome extension that provides contextual assistance using full page content.
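A platform along these lines can be approximated with a small provider-agnostic assistant abstraction. The sketch below is purely illustrative: the `Assistant` class, its tool-registration decorator, and the keyword-based dispatch are all assumptions, standing in for real function calling.

```python
class Assistant:
    """Sketch of a provider-agnostic assistant with pluggable tools, loosely
    modeled on the platform described above. All names are hypothetical."""

    def __init__(self, system_prompt, model_call):
        self.system_prompt = system_prompt
        self.model_call = model_call  # any provider behind one interface
        self.tools = {}

    def tool(self, name):
        """Decorator that registers a custom tool under a given name."""
        def register(fn):
            self.tools[name] = fn
            return fn
        return register

    def ask(self, message):
        # Naive dispatch: if the message mentions a registered tool, run it
        # and return the result; a real system would use function calling
        # driven by the model rather than keyword matching.
        for name, fn in self.tools.items():
            if name in message:
                return fn(message)
        return self.model_call(f"{self.system_prompt}\n{message}")


# A custom "SEO" assistant with one pluggable tool, using a stubbed model:
seo = Assistant("You are an SEO assistant.",
                model_call=lambda p: "draft: " + p.splitlines()[-1])

@seo.tool("keyword_count")
def keyword_count(msg):
    return len(msg.split())

print(seo.ask("keyword_count for: cheap bangkok hotels"))  # -> 5
```

The point of the abstraction is the platform’s first design principle: assistant behavior is defined internally and the model provider is swappable behind `model_call`.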

The cultural transformation was as important as the technical infrastructure. The organic growth from the initial hackathon was sustained through community-driven learning spaces, particularly a Slack channel called “mastering-gpt” that became the nerve center for sharing techniques, announcing tools, and collaborative problem-solving. This grassroots momentum was reinforced by practical enablement measures, including the rollout of coding assistants like GitHub Copilot across all developer IDEs and the deployment of Slack bot integrations that let employees interact with GenAI directly within their existing workflows.

Agoda’s “Inside-Out” approach focused first on improving internal workflows to develop the skills, patterns, and infrastructure needed for responsible GenAI deployment before extending to customer-facing applications. This methodology let teams learn how to build GenAI-powered applications effectively, develop prompt mastery by treating prompting as interface design, work out appropriate testing strategies for non-deterministic GenAI features, and build infrastructure supporting rapid iteration and safe deployment, all without the high stakes of customer-facing environments.
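One common way to test non-deterministic GenAI features, in the spirit of the strategy described above, is to assert invariants rather than exact outputs. The checks and thresholds below are illustrative assumptions, not Agoda’s actual test suite:

```python
def check_summary(summary, source, max_words=50, required_terms=()):
    """Property-style checks for a non-deterministic summarizer: instead of
    asserting one exact string, assert invariants that every acceptable
    output must satisfy. Thresholds and checks are illustrative."""
    problems = []
    if len(summary.split()) > max_words:
        problems.append("too long")
    for term in required_terms:
        if term.lower() not in summary.lower():
            problems.append(f"missing term: {term}")
    if summary.strip().lower() == source.strip().lower():
        problems.append("summary is a verbatim copy")
    return problems


# Two different model outputs can both pass the same invariants:
src = "The hotel has a rooftop pool, free breakfast, and is near the BTS station."
assert check_summary("Rooftop pool, free breakfast, near BTS.", src,
                     required_terms=["pool", "breakfast"]) == []
assert check_summary("Close to BTS; pool and breakfast included.", src,
                     required_terms=["pool", "breakfast"]) == []
```

Because any output satisfying the invariants passes, the test stays stable across model upgrades and prompt changes, which is exactly what deterministic string-equality tests fail to do.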

The production applications demonstrate sophisticated LLMOps implementations across multiple domains. Internal productivity tools include Query Assist for natural language SQL generation, Document Processor for applying GenAI to structured data processing in spreadsheets, AskGoda for automated technical support handling 50% of incoming tickets using documentation and historical data, and Meeting Helper for automatic meeting summarization with proactive follow-up capabilities. Engineering acceleration tools include Jira Planning Assistants that enrich user stories with internal knowledge, autonomous coding agents for large-scale migrations with test validation, code review assistants providing consistent automated feedback, auto-documentation tools triggered by code changes, monitoring and log analysis agents integrated with anomaly detection systems, and custom Model Context Protocol (MCP) servers exposing internal systems to coding assistants.
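A Query Assist-style flow typically grounds the model in table schemas and then guards the generated SQL before execution. The sketch below shows that shape; the prompt format, the sample schema, and the read-only guard are all assumptions, not details from Agoda’s tool:

```python
import re


def build_sql_prompt(question, schema):
    """Build a schema-grounded prompt for natural-language-to-SQL generation.
    The prompt format and schema here are illustrative."""
    tables = "\n".join(f"- {t}({', '.join(cols)})" for t, cols in schema.items())
    return (f"Given these tables:\n{tables}\n"
            f"Write one SQL query answering: {question}\nSQL:")


def is_safe_select(sql):
    """Guard model-generated SQL: accept only a single read-only SELECT.
    A real deployment would add parsing, allow-lists, and row limits."""
    s = sql.strip().rstrip(";")
    return bool(re.match(r"(?is)^select\b", s)) and ";" not in s


schema = {"bookings": ["id", "hotel_id", "checkin_date", "amount"]}
prompt = build_sql_prompt("total revenue last month", schema)

# The model's output (stubbed here) is validated before it ever runs:
assert is_safe_select("SELECT SUM(amount) FROM bookings")
assert not is_safe_select("DROP TABLE bookings")
assert not is_safe_select("SELECT 1; DROP TABLE bookings")
```

Validating generated SQL before execution is the same pattern as the testing strategy above: treat the model’s output as untrusted input and enforce invariants at the boundary.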

Customer-facing applications leverage the same robust infrastructure with Q&A assistants embedded across the platform for natural language support, review helpers for summarizing and filtering large volumes of guest reviews, and enhanced content curation through automated tagging and enrichment of unstructured content. The systematic approach to LLMOps enabled Agoda to achieve remarkable adoption metrics with 73% of employees using GenAI productivity tools and consistent growth from 119 applications in 2023 to 204 by mid-2025.
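The review tagging-and-filtering pattern can be sketched with a toy classifier. A production system would use an LLM for the classification step; the keyword matching below merely shows the data flow, and every name and keyword list is hypothetical:

```python
def tag_reviews(reviews, tag_keywords):
    """Toy sketch of automated review tagging: attach topic tags to each
    review so large volumes can be filtered and summarized per topic.
    Keyword matching stands in for an LLM classification call."""
    tagged = []
    for review in reviews:
        tags = [tag for tag, keywords in tag_keywords.items()
                if any(kw in review.lower() for kw in keywords)]
        tagged.append({"text": review, "tags": tags})
    return tagged


reviews = ["Great breakfast buffet!", "Room was noisy at night."]
tagged = tag_reviews(reviews, {"food": ["breakfast", "dinner"],
                               "noise": ["noisy", "loud"]})
print([r["tags"] for r in tagged])  # -> [['food'], ['noise']]
```

Once reviews carry structured tags, downstream features (filtering, per-topic summaries, content enrichment) become ordinary data processing rather than per-request LLM calls.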

The technical architecture demonstrates several important LLMOps best practices including centralized access control and governance, comprehensive monitoring and cost attribution, multi-provider support and intelligent routing, secure internal deployment avoiding external dependencies, and systematic testing and evaluation frameworks. The emphasis on building reusable infrastructure and maintaining a community of practice around GenAI development enabled sustainable scaling that avoided the common pitfalls of isolated experimentation and ungoverned proliferation of GenAI applications.

Agoda’s approach to production GenAI deployment addresses critical challenges including the gap between prototypes and production-ready applications, the need for robust testing methodologies for non-deterministic systems, cost management and usage attribution at enterprise scale, governance and compliance requirements, and the cultural transformation required for organization-wide adoption. Their systematic documentation of lessons learned provides valuable insights for other organizations undertaking similar GenAI transformations, particularly the importance of infrastructure investment, community building, and the iterative development of expertise through internal use cases before expanding to customer-facing applications.

The case study demonstrates that successful enterprise GenAI deployment requires more than access to LLM APIs: it demands a comprehensive approach encompassing technical infrastructure, governance frameworks, cultural transformation, and systematic capability building. Agoda’s journey from hackathon to production deployment at scale provides a detailed template for organizations seeking to implement GenAI across their operations while maintaining quality, security, and cost effectiveness.

More Like This

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.


Reinforcement Learning for Code Generation and Agent-Based Development Tools

Cursor 2025

This case study examines Cursor's implementation of reinforcement learning (RL) for training coding models and agents in production environments. The team discusses the unique challenges of applying RL to code generation compared to other domains like mathematics, including handling larger action spaces, multi-step tool calling processes, and developing reward signals that capture real-world usage patterns. They explore various technical approaches including test-based rewards, process reward models, and infrastructure optimizations for handling long context windows and high-throughput inference during RL training, while working toward more human-centric evaluation metrics beyond traditional test coverage.


Building a Multi-Agent Research System for Complex Information Tasks

Anthropic 2025

Anthropic developed a production multi-agent system for their Claude Research feature that uses multiple specialized AI agents working in parallel to conduct complex research tasks across web and enterprise sources. The system employs an orchestrator-worker architecture where a lead agent coordinates and delegates to specialized subagents that operate simultaneously, achieving 90.2% performance improvement over single-agent systems on internal evaluations. The implementation required sophisticated prompt engineering, robust evaluation frameworks, and careful production engineering to handle the stateful, non-deterministic nature of multi-agent interactions at scale.
