ZenML

AI Agents for Travel Booking and Customer Service Automation

TPConnects 2025

TPConnects, a software solutions provider for airlines and travel sellers, transformed their legacy travel booking APIs and UI into a production-ready AI agent system built on Amazon Bedrock. The company implemented a supervised multi-agent orchestration architecture that handles the complete travel journey from shopping and booking to order management and customer servicing. Key challenges included managing latency with large API responses (2000+ flight offers), orchestrating multiple APIs in a pipeline, handling industry-specific IATA codes, and ensuring JSON formatting consistency. The solution uses Claude 3.5 Sonnet as the primary model, incorporates prompt engineering and knowledge bases for travel domain expertise, and extends beyond traditional chat to WhatsApp Business API integration for proactive disruption management and upselling. The system took 3-4 months to develop with AWS support and represents a shift from manual UI interactions to conversational AI-driven travel experiences.

Industry

Other


Overview and Business Context

TPConnects is a pioneering software product and solutions provider serving the airline and travel seller industry. The company undertook a significant transformation initiative to convert their legacy UI solutions and APIs into a modern AI agent-based system powered by Amazon Bedrock. This case study, presented by Pravin Kumar (CTO and co-founder of TPConnects) alongside AWS representatives, illustrates the journey from proof of concept to production deployment in the travel booking domain.

The presentation frames the broader industry context, noting that the generative AI landscape has evolved from proof-of-concept building (two years ago) through production deployment efforts (12-18 months ago) to the current focus on value generation through AI agents. The travel industry presents particular challenges for LLM deployment, including high-volume data responses, multi-API orchestration, industry-specific terminology (IATA codes), and latency sensitivity, all of which are critical to customer experience.

Architecture and Multi-Agent Orchestration

The core of TPConnects’ solution is what they call the “Trip Captain” orchestration engine, which implements a supervised multi-agent architecture. The system employs a primary supervised agent that controls multiple specialized sub-agents beneath it. The architecture includes distinct agents for different functional domains, including shopping, order management, and customer servicing.

The supervised agent receives user input and intelligently routes requests to the appropriate sub-agents based on the conversational context. For example, a user beginning a travel search would be directed to the shopping agent, which upon completion would hand off to the order agent for booking confirmation. This hierarchical agent structure allows for separation of concerns while maintaining conversational continuity across different phases of the travel booking journey.
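The routing behavior described above can be sketched as a simple dispatcher. In the real system an LLM makes the routing decision from conversational context; the keyword scoring and agent names below are purely illustrative assumptions.

```python
# Minimal sketch of supervised-agent routing. The sub-agent names and
# keyword rules are illustrative, not TPConnects' implementation
# (which routes via the LLM's understanding of context).

SUB_AGENTS = {
    "shopping": ["flight", "search", "fare", "from", "to"],
    "order": ["book", "confirm", "payment", "ticket"],
    "servicing": ["refund", "change", "baggage", "cancel"],
}

def route(user_message: str) -> str:
    """Pick the sub-agent whose keywords best match the message."""
    words = user_message.lower().split()
    scores = {
        agent: sum(w in words for w in keywords)
        for agent, keywords in SUB_AGENTS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the supervisor itself when nothing matches.
    return best if scores[best] > 0 else "supervisor"
```

A request like "search a flight from Dubai" would land on the shopping agent, which on completion hands off to the order agent, mirroring the flow described above.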

Foundation Model Selection and Reasoning

TPConnects selected Claude 3.5 Sonnet as their primary foundation model, specifically noting it as “one of the stable model in giving a reasonable textual response based upon the user input.” This choice was made through Amazon Bedrock, which the presenters emphasized provides access to multiple foundation models including Anthropic’s Claude family, open-source models like Llama and Mistral, DeepSeek, and specialized models like Cohere for RAG applications, as well as AWS’s own Nova model family.

The model selection framework presented emphasizes balancing three key factors: cost, latency/speed, and intelligence. For travel applications where customer experience is paramount, the choice of Claude 3.5 Sonnet suggests prioritization of response quality and reasoning capability over pure cost optimization, though the presenters note that Amazon Nova models are used for AWS’s internal customer support applications as a more cost-effective option.

Prompt Engineering and Storage

A critical component emphasized throughout the presentation is prompt engineering, which TPConnects identifies as “the key thing” that ensures the LLM behaves exactly as required for production use. The system implements a dedicated prompt storage component that maintains specific prompts for each agent in the multi-agent architecture, so each agent’s instructions can be tuned and updated in one place.

The prompt engineering work appears to have been substantial, involving detailed instructions for how agents should interact with customers, when to call specific APIs, how to handle ambiguous requests, and how to maintain context across multi-turn conversations. The presenters emphasize that without proper prompt engineering, achieving production-ready behavior would not be possible, as the system must react “exactly the same what it has been defined for.”
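The case study says prompts are stored centrally per agent but not how. A minimal sketch, assuming an in-memory template store (the agent names and prompt wording are invented for illustration):

```python
# Per-agent prompt store: each agent's instructions live in one place
# and are rendered with live conversational context at call time.
from string import Template

PROMPT_STORE = {
    "shopping": Template(
        "You are a flight-shopping assistant. Collect origin, destination, "
        "dates, and passenger count before calling any API.\n"
        "Conversation so far:\n$history\nCustomer: $message"
    ),
    "order": Template(
        "You are an order-management assistant. Confirm bookings only after "
        "the customer approves the quoted fare.\n"
        "Conversation so far:\n$history\nCustomer: $message"
    ),
}

def build_prompt(agent: str, history: str, message: str) -> str:
    """Render the stored prompt for one agent with live context."""
    return PROMPT_STORE[agent].substitute(history=history, message=message)
```

Centralizing prompts this way is what makes the iterative prompt-engineering loop the presenters describe practical: a behavioral fix for one agent is a single template edit, not a code change.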

Knowledge Base and Domain-Specific RAG

The travel industry presents unique challenges in terminology, with extensive use of acronyms and codes (IATA airport codes, airline codes, aircraft types, fare classes, etc.). TPConnects addressed this through a comprehensive knowledge base integrated into their Bedrock deployment.

The knowledge base plays a “key role” in helping the LLM understand domain-specific language that would not be present in the base model’s training data. The implementation appears to use retrieval-augmented generation (RAG) where relevant knowledge is retrieved and injected into the prompt context when needed. The presenters note that building RAG “at scale” with hundreds or thousands of documents can be “super challenging,” though the specific retrieval mechanisms (embedding models, vector databases, chunking strategies) are not detailed in the transcript.
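Since the retrieval mechanics are not detailed, the following is only a shape-of-the-idea sketch: look up the domain codes mentioned in a query and inject their meanings into the prompt context. The glossary entries and function names are assumptions; the real system uses a Bedrock knowledge base with semantic retrieval.

```python
# Illustrative RAG-style glossary injection for travel codes.
GLOSSARY = {
    "DXB": "Dubai International Airport",
    "LHR": "London Heathrow Airport",
    "EK": "Emirates (airline code)",
    "J": "business-class fare bucket",
}

def retrieve_context(query: str) -> str:
    """Return glossary lines for codes mentioned in the query."""
    tokens = query.upper().split()
    hits = [f"{code}: {meaning}" for code, meaning in GLOSSARY.items()
            if code in tokens]
    return "\n".join(hits)

def augment_prompt(query: str) -> str:
    """Prepend retrieved domain knowledge only when something matched."""
    context = retrieve_context(query)
    if not context:
        return query
    return f"Travel glossary:\n{context}\n\nQuestion: {query}"
```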

Chain of Thought and Parameter Collection

A particularly interesting production design pattern mentioned is the use of “chain of thoughts” reasoning for API parameter collection. In travel booking, APIs require multiple parameters before execution (travel dates, origin, destination, passenger counts, cabin class, etc.). Rather than making premature API calls with incomplete information, the chain-of-thought implementation prompts the customer for each missing parameter and executes the API call only once all required inputs have been collected.

This approach prevents failed API calls, reduces unnecessary latency from repeated calls, and creates a more natural conversational flow. The chain-of-thought mechanism “interact with the customer and get all the required input before it process it with to the API or the LLMs,” representing a deliberate production optimization that balances user experience with system efficiency.
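The parameter-collection loop reduces to a slot-filling check before any API call. A minimal sketch, with field names assumed for illustration:

```python
# Ask-before-call slot filling: fire the search API only when every
# required parameter has been gathered from the conversation.
REQUIRED_SLOTS = ["origin", "destination", "departure_date", "passengers"]

def next_action(collected: dict) -> dict:
    """Decide whether to ask a follow-up question or call the API."""
    missing = [s for s in REQUIRED_SLOTS if not collected.get(s)]
    if missing:
        slot = missing[0]
        return {"action": "ask", "slot": slot,
                "question": f"Could you tell me your {slot.replace('_', ' ')}?"}
    return {"action": "call_api", "params": collected}
```

Each turn, the agent either emits one clarifying question or, once the dict is complete, hands the parameters to the travel API exactly once.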

Function Calling and API Integration

The system uses function calling (action groups in Bedrock terminology) to integrate with TPConnects’ existing travel APIs. Rather than using Bedrock’s built-in agent action invocation, the team implemented a “return of control” pattern using the boto3 SDK, keeping execution of the travel APIs under TPConnects’ direct control rather than inside Bedrock’s managed invocation path.

The function calling implementation receives invocation input including the prompt, knowledge base context, and conversation history to generate appropriate follow-up messages. The system handles multiple orchestrated API calls in a pipeline for complex operations like order creation, which may require sequential calls to availability, pricing, booking, and payment APIs.
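In the return-of-control pattern, the agent run pauses and hands the tool call back to the client, which executes the travel API itself and resumes the agent with the result. The event shape below is a simplified assumption modeled on Bedrock Agents' return-control events, not the exact wire format:

```python
# Parse a (simplified) returnControl event into a local API call.
# In production this event arrives via boto3's bedrock-agent-runtime
# invoke_agent stream; here only the pure parsing step is shown.
from typing import Optional

def extract_api_call(event: dict) -> Optional[dict]:
    """Pull the function name and parameters out of a returnControl event."""
    rc = event.get("returnControl")
    if not rc:
        return None  # not a tool handoff, e.g. a plain text chunk
    inv = rc["invocationInputs"][0]["functionInvocationInput"]
    params = {p["name"]: p["value"] for p in inv.get("parameters", [])}
    return {"function": inv["function"], "parameters": params,
            "invocation_id": rc["invocationId"]}
```

The client would then dispatch `function` to the matching travel API (availability, pricing, booking, payment) and send the response back under the same `invocation_id`.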

Latency Optimization and Response Chunking

Latency emerged as the “main issue” faced during production deployment. Travel searches can return massive datasets—specifically mentioned are searches like Dubai to London that might return “2,000 plus offers from the different airline fly in between.” Passing such large datasets to an LLM would create unacceptable latency and potentially exceed context window limits.

TPConnects’ solution involves several strategies: chunking the result set, surfacing a small set of top offers immediately, and refining the remaining pool through continued conversation.

This approach transforms what could be a 10-20 second wait for processing 2000 offers into a responsive conversational experience where initial results appear quickly and can be refined through natural language. The system maintains a pool of relevant offers behind the scenes based on the ongoing conversation, bringing forward progressively refined options as the customer’s requirements become clearer.
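The core of the latency strategy is to never hand all 2,000 offers to the model: filter and rank server-side, pass only a small top slice, and keep the rest pooled for follow-up refinement. A sketch under assumed offer fields:

```python
# Server-side filter-and-slice before anything reaches the LLM.
from typing import Optional

def top_offers(offers: list, k: int = 5,
               max_price: Optional[float] = None) -> tuple:
    """Return (slice for the LLM, remaining pool for later refinement)."""
    pool = [o for o in offers if max_price is None or o["price"] <= max_price]
    pool.sort(key=lambda o: o["price"])
    return pool[:k], pool[k:]
```

A follow-up like "anything under 800?" re-runs the same function over the retained pool with a tighter `max_price`, so refinement never re-queries the airlines or re-sends thousands of offers through the model.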

Session Management and Conversation History

The architecture includes dedicated components for session management and chat history storage. These components are essential for maintaining context across multi-turn conversations, especially in complex booking flows that may span multiple interactions. The system feeds “the old session plus the conversational history to the engine to give you the required response” without which proper contextual responses would not be possible.

The conversation history allows for natural references like “I want the morning flight” or “add baggage to that option” where “that” refers to previously discussed offers. This stateful conversation management is critical for production usability, as users expect the system to remember what was just discussed rather than requiring complete re-specification of requirements with each turn.
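A minimal sketch of the session component, assuming an in-memory dict as a stand-in for the real persistence layer: append each turn, then feed a bounded window of recent turns back to the engine so references like "that option" resolve.

```python
# Session store: old session + conversation history fed to the engine.
SESSIONS = {}

def append_turn(session_id: str, role: str, text: str) -> None:
    """Record one conversational turn for a session."""
    SESSIONS.setdefault(session_id, []).append({"role": role, "text": text})

def context_window(session_id: str, max_turns: int = 10) -> list:
    """Most recent turns, oldest first, to prepend to the next prompt."""
    return SESSIONS.get(session_id, [])[-max_turns:]
```

Bounding the window (here to ten turns) is one simple way to keep long booking conversations inside the model's context limits; the actual windowing policy is not described in the presentation.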

Text-to-SQL Integration

An interesting production capability mentioned is the integration of text-to-SQL functionality for order retrieval. The order retrieval agent connects to a MySQL backend and can translate natural language queries into SQL to fetch booking history and details. This allows customers to make requests like “show me my upcoming trips” or “what was my booking reference for the London flight” without navigating traditional UI menus.

The text-to-SQL integration represents a challenging LLMOps scenario, as it requires reliably translating free-form requests into valid SQL against the booking schema while keeping generated queries safe to execute.

While the implementation details aren’t fully specified, this capability demonstrates integration of structured data retrieval within the conversational agent framework.
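Since those details are unspecified, the following is only a hedged sketch of the retrieval path: an LLM-produced SQL string is validated as read-only before running. `sqlite3` stands in for the MySQL backend, and the `bookings` schema is an assumption.

```python
# Guarded execution of model-generated SQL: single SELECTs only.
import sqlite3

def run_readonly_query(conn, sql: str) -> list:
    """Execute only single SELECT statements; reject anything else."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select") or ";" in stripped:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()

# Tiny illustrative table and a query an LLM might generate for
# "show me my upcoming trips".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (ref TEXT, dest TEXT, dep_date TEXT)")
conn.execute("INSERT INTO bookings VALUES ('ABC123', 'LHR', '2025-12-01')")
rows = run_readonly_query(
    conn, "SELECT ref, dest FROM bookings WHERE dep_date > '2025-01-01'")
```

A production guard would go further (schema allow-lists, parameter binding, row limits), but the read-only gate illustrates why text-to-SQL inside a conversational agent needs a validation layer between model output and database.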

Rich Media and User Experience Beyond Text

A notable aspect emphasized in the demonstration is that “this is not just an another chat engine”—the system provides a “rich content experience” beyond simple text chat. The UI includes visual elements like flight cards, pricing displays, itinerary details, and interactive selection mechanisms. This hybrid approach combines conversational AI for search and refinement with traditional visual UI elements for information presentation and final selection.

The design philosophy here recognizes that while conversational interfaces excel at filtering, refining, and navigating complex option spaces, visual presentation remains superior for comparing detailed information and making final decisions. The system provides “a full future interaction with the LLM” where customers can “ask for your filter” or “ask for the option” in natural language but see results in visually rich formats.

WhatsApp Business API Integration

Beyond the web interface, TPConnects extended their agent system to WhatsApp Business API, creating a particularly innovative production use case. The integration leverages passenger contact information collected during booking to enable proactive engagement such as disruption notifications and upsell offers.

This multi-channel approach demonstrates mature LLMOps thinking where the same agent backend serves multiple interaction modalities. The WhatsApp integration particularly addresses customer convenience—“it become very easy for the customer to look at the WhatsApp and see okay what’s the real disruption happened rather than calling a travel agent or going to the website.”
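A proactive disruption notification might be constructed as below. The payload follows the public WhatsApp Business (Cloud) API text-message format; actually sending it requires a phone-number ID and access token, so only the payload construction (with an invented phone number and flight) is shown.

```python
# Build a proactive disruption message for the WhatsApp Business API.
def disruption_message(phone: str, flight: str, new_time: str) -> dict:
    body = (f"Your flight {flight} has been rescheduled to {new_time}. "
            "Reply here to review alternatives or add services.")
    return {
        "messaging_product": "whatsapp",
        "to": phone,
        "type": "text",
        "text": {"body": body},
    }
```

The same agent backend that serves the web chat would generate `body`, which is what makes the multi-channel claim credible: only the delivery envelope differs per channel.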

Voice and Kiosk Future Developments

The presentation mentions work in progress on voice-enabled kiosk systems using Amazon Nova Sonic for voice input/output. This would allow travelers to interact with the booking system through speech at physical travel agency locations, transforming “brick and mortar travel agency” experiences. The vision, as described, is that any brick-and-mortar travel agency could turn an off-the-shelf device into a spoken interface to the travel agents.

This represents a forward-looking LLMOps architecture where the core agent orchestration and business logic remain consistent while new interaction modalities (text chat, WhatsApp, voice) are added through different frontends. The multimodal approach—text, rich media, voice—demonstrates thinking about LLM production systems as backend reasoning engines that can serve diverse user interfaces.

Production Challenges and Solutions

The presentation candidly discusses several production challenges encountered:

JSON Formatting: Ensuring consistent, parseable JSON responses from the LLM for downstream processing required careful prompt engineering and validation layers.

IATA Codes and Domain Language: The extensive use of industry codes and abbreviations required building comprehensive knowledge bases and training/fine-tuning approaches to ensure proper understanding.

Multiple API Orchestration: Complex transactions like order creation require sequential API calls with dependency management, error handling, and rollback capabilities that needed to be orchestrated behind the conversational interface.

Latency at Scale: As discussed above, handling high-volume search results required sophisticated chunking and personalization strategies.
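Of these, the JSON-formatting fix lends itself to a small sketch: a validation layer that extracts and checks the model's JSON before any downstream processing. The helper and required-key scheme here are assumptions, not TPConnects' code.

```python
# Validation layer for LLM-produced JSON: strip surrounding prose,
# parse, and verify required keys before downstream use.
import json
import re

def parse_llm_json(raw: str, required_keys: set) -> dict:
    """Extract and validate a JSON object from an LLM response."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in response")
    obj = json.loads(match.group(0))
    missing = required_keys - obj.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return obj
```

On failure, a typical recovery is to re-prompt the model with the error message, which is one reason the presenters pair validation layers with careful prompt engineering.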

These challenges and their solutions represent the practical reality of moving from proof-of-concept to production. The team spent three to four months working closely with AWS support to overcome these hurdles, suggesting that even with managed services like Bedrock, production deployment of complex agent systems requires significant engineering effort and domain expertise.

AWS Bedrock Platform Capabilities

While this is an AWS-sponsored presentation and should be viewed with appropriate skepticism regarding vendor claims, the case study does illustrate several Bedrock capabilities being used in production.

The platform approach allowed TPConnects to focus on business logic, prompt engineering, and user experience rather than infrastructure management for model serving, vector databases, and orchestration frameworks. However, the 3-4 month development timeline and extensive AWS support requirement suggest that even managed platforms require significant expertise to deploy successfully.

Value Proposition and Business Outcomes

The presentation outlines the stated value propositions for the production system in qualitative terms.

Specific quantitative metrics (cost savings, booking conversion rates, customer satisfaction scores, agent call volume reduction, etc.) are not provided in the presentation, which is a common limitation in vendor case studies. The focus remains on capability demonstration rather than measurable business impact.

Critical Assessment and LLMOps Maturity

From an LLMOps perspective, this case study demonstrates several markers of production maturity: deliberate model selection, centralized prompt management, stateful session handling, latency engineering, and multi-channel deployment.

However, several LLMOps aspects receive limited or no coverage in the presentation.

The 3-4 month development timeline with significant vendor support suggests this was a focused deployment effort rather than a fully mature MLOps practice. The emphasis on building features and capabilities rather than operational excellence indicators may reflect the current maturity stage of the implementation.

Industry Context and Applicability

The travel industry presents an interesting test case for production LLM deployment because it combines high-volume data responses, multi-API orchestration, dense industry-specific terminology, and strict latency expectations.

The patterns demonstrated—multi-agent orchestration, knowledge base integration, latency optimization through chunking, multi-channel deployment—are broadly applicable to other industries with similar characteristics (financial services, healthcare, retail, telecommunications). The supervised agent pattern in particular offers a reusable architecture for complex business processes that span multiple functional domains.

The case study reinforces that successful production LLM deployment requires deep domain expertise (travel industry knowledge), platform engineering capabilities (API integration, orchestration), and AI-specific skills (prompt engineering, RAG, agent design). The close collaboration with AWS support suggests that even with managed platforms, organizations need significant guidance to navigate production deployment challenges.
