Manchester Airports Group (MAG) implemented an agentic AI solution to automate unplanned absence reporting and shift management across their three UK airports handling over 1,000 flights daily. The problem involved complex, non-deterministic workflows requiring coordination across multiple systems, with different processes at each airport and high operational costs from overtime payments when staff couldn't make shifts. MAG built a multi-agent system using Amazon Bedrock Agent Core with both text-to-text and speech-to-speech interfaces, allowing employees to report absences conversationally while the system automatically authenticated users, classified absence types, updated HR and rostering systems, and notified relevant managers. The solution achieved 99% consistency in absence reporting (standardizing previously variable processes) and reduced recording time by 90%, with measurable cost reductions in overtime payments and third-party service fees.
Manchester Airports Group (MAG) operates as the UK’s largest airport group, managing three airports with approximately 9,000 direct staff (and 40,000 total employees operating on their campuses) and handling over 1,000 flights daily. The organization presented their agentic AI implementation at AWS re:Invent, marking their third consecutive year presenting AI solutions at the conference. This particular case study focuses on their “digital colleague workplace” vision—using agentic AI to manage complex operational processes across airport operations.
The specific use case centers on unplanned absence reporting for shift workers in critical airport functions, particularly security personnel. The business problem emerged from the daily operational complexity of managing unexpected staffing gaps in a 24/7 operation where passenger safety and security are non-negotiable priorities. When an employee cannot make their shift (due to illness, family emergency, or transportation issues), this triggers a cascade of activities: authenticating the absence, classifying it according to HR policies, updating multiple systems, notifying various managers, and potentially re-rostering replacement staff. Previously, this involved employees calling a third-party helpline, with resourcing teams manually coordinating changes across different systems, and line managers handling both administrative updates and pastoral care responsibilities.
The challenge was compounded by process variation across the three airports and different employee types, creating hundreds of workflow permutations. The business case for automation was compelling: reducing overtime costs from last-minute shift coverage, increasing passenger spending by reducing security queue times, and eliminating third-party service costs. However, MAG emphasized that this represented just the first step in a broader journey toward an intelligent, multi-agent airport management system.
MAG and AWS applied a framework of “think big, start small, scale fast” to their agentic AI implementation. The “think big” vision is the digital colleague workplace—ultimately an autonomous agentic system controlling multiple airport functions with minimal human oversight. However, they recognized the need to start with a focused use case that demonstrated measurable value while laying foundational infrastructure for future expansion.
The unplanned absence reporting use case was selected because it was both topical (particularly during peak travel periods like summer holidays and Christmas) and had clear ROI metrics around overtime costs, operational efficiency, and third-party service reduction. Critically, it also allowed MAG to address real operational complexity while navigating the stringent security, accuracy, and reliability requirements of operating critical national infrastructure subject to significant regulatory oversight.
The presentation included a thoughtful discussion of why agentic AI was chosen over simpler automation tools or basic generative AI assistants. The team outlined a spectrum of generative AI solutions with increasing autonomy:
The unplanned absence use case falls into the middle category, with the ultimate digital colleague workplace representing the fully autonomous vision. Several factors drove the decision toward agentic AI:
Non-deterministic complexity: With three airports, numerous job types, and different absence categories (each invoking different HR policies), the team quickly identified hundreds of workflow permutations. For example, childcare absences invoke different policies than illness absences. A traditional automation tool would require programming all these permutations and still wouldn’t capture edge cases, whereas agentic AI can dynamically reason through scenarios.
Extensibility and modularity: Agentic AI’s inherent modularity allows iterative addition of functionality by deploying new agents. MAG envisions expanding beyond absence reporting to fault reporting, asset management, and complex scenarios like coordinating staffing decisions when equipment is out of order and flights are delayed—requiring multiple agents to interact and optimize across interdependent functions.
Exception handling: The flexibility of agentic AI to handle incomplete or ambiguous input is crucial in real-world scenarios. When an employee reports an emergency using colloquial language or misses key information, agents can use natural language understanding to engage in multi-turn conversations until they have what they need. Traditional automation tools lack this understanding and rarely handle exceptions gracefully.
The team was clear that agentic AI isn’t appropriate for all use cases—predictable, deterministic workflows may be better served by automation tools—but for MAG’s complex, dynamic environment with high exception rates, it was the right choice.
The presentation provided detailed technical architecture insights, building the solution incrementally to illustrate design decisions. The progression moved from a simple automated workflow to a sophisticated multi-agent system with speech-to-speech capabilities.
Initial approach: The team started by mapping the current business workflow (employee contacts manager, manager verifies identity and checks policies, updates HR system, notifies personnel, handles rostering if applicable) to automated components: API calls to HR and rostering systems, knowledge base lookups for HR policies, LLM-based absence classification, and Amazon SNS for notifications.
First agentic implementation: A deterministic workflow quickly proved inadequate when employees provided incomplete information. This motivated implementing a true agentic approach where the system could engage in natural conversation to collect required information. The agent was built using the ReAct (Reason-Act-Observe) pattern, with each reasoning step logged to S3 for observability. The agent receives human input, reasons about what to do, acts using available tools, observes the results, and provides a response, potentially iterating through multiple cycles until the goal is achieved.
Tool design for agents: A critical early learning was that tools needed adaptation specifically for agent use. The team emphasized three principles:
For example, rather than having agents parse complex JSON responses, tools were modified to extract and format exactly the information needed for the next reasoning step.
Scaling with Agent Core Runtime: Once the text-based solution worked on a developer laptop, scaling became the next challenge. Supporting concurrent users required state management, websocket handling, and infrastructure complexity. MAG adopted Amazon Bedrock Agent Core Runtime, which encapsulates this complexity in a serverless solution. Agent Core Runtime provides scalability and security for deploying agents, with direct front-end connection and state persistence. Importantly, it’s compatible with open-source infrastructure and various frameworks—in this case, used with trans agents.
Microservices architecture with Model Context Protocol (MCP): To avoid a monolithic agent design that would become difficult to modify and extend, MAG separated concerns by hosting tools on an MCP server within Agent Core Gateway. This architectural decision provides:
When an employee submits a query, Agent Core Runtime connects to Agent Core Gateway to search for the correct tool, queries that tool, gets results, and answers the user’s question.
Context-aware guardrails: During security review, concerns emerged about potential social engineering attacks or malicious use. While the team implemented standard guardrails initially, they found these insufficient for sophisticated multi-turn attempts to manipulate the agent. This led to a second major learning: think broader than standard guardrails.
The team implemented a multi-layered security approach:
This was particularly important given that MAG operates critical national infrastructure with PII data and systems that directly impact employee pay—requiring compliance with significant regulatory requirements.
Speech-to-speech interface with Amazon Novasonic: Recognizing that many employees report absences while unable to type (e.g., stuck in traffic while driving to work), MAG implemented a speech-to-speech interface using Amazon Novasonic. This introduced real-time streaming complexity, as Novasonic expects continuous audio input and provides continuous audio output, requiring websocket connections rather than request-response patterns. This was managed by ECS, which buffers audio frames and coordinates bidirectional streaming.
The team encountered significant challenges prompting speech models, leading to a third key learning: read speech-to-speech model prompts out loud. Speech-to-speech models are trained on spoken language and have different characteristics than text models:
After rewriting prompts to be more speech-appropriate, the team faced another challenge: Novasonic needed to interact with the same tools as the text agent, but those tools produced system-oriented outputs rather than human-readable speech responses.
Agentic hierarchy: Rather than connecting Novasonic directly to tools and maintaining two separate sources of truth about tool usage, MAG implemented an agentic hierarchy. Amazon Novasonic calls the text-based agent as a tool, which in turn calls the actual system tools. This elegant solution:
Authentication for speech interface: Speech interfaces presented a unique authentication challenge, as users aren’t logging into an app before interacting with the system. MAG built a specialized authentication tool that collects relevant identifying information from the user conversationally, compares it against the HR system, and only allows access to the main functionality after successful authentication.
A critical learning emerged when demoing the solution to business stakeholders: UI is king. Even with an impressive backend capable of complex orchestration, users perceived the system as frozen or unresponsive during the time agents were working through multiple tool calls. The team emphasized that agentic systems inherently have latency due to the reasoning-action-observation loop across multiple tools, so keeping users engaged is essential.
The solution was to provide intuitive progress indicators showing what the agent is doing behind the scenes, without revealing too much technical infrastructure detail. The demo showed a UI that displays:
This approach manages user expectations and provides confidence that the system is actively working, rather than creating anxiety about whether anything is happening.
The final production architecture integrates multiple AWS services:
The architecture emphasizes observability, with all reasoning steps logged to S3, enabling analysis of agent behavior, debugging, and continuous improvement.
While the presentation acknowledged this is an AWS-sponsored case study (presented at re:Invent), specific quantifiable results were provided:
The presenters framed these as “first use case” results, emphasizing that the real value comes from the extensible foundation for the broader digital colleague workplace vision.
From an LLMOps perspective, several aspects of this case study merit balanced assessment:
Strengths of the approach:
Potential concerns and considerations:
LLMOps maturity indicators:
MAG’s vision extends well beyond absence reporting to a comprehensive digital colleague workplace encompassing:
The presentation positions this as a journey toward “the world’s most intelligent airport group,” with the current implementation representing foundational infrastructure that will support “a great number more tools” and “more agents with the right level of humans in the loop.”
The scalability of the architectural approach—particularly the MCP-based tool microservices and Agent Core Runtime—appears well-suited to this vision, though the complexity of coordinating multiple specialized agents for cross-functional optimization will likely surface new challenges around agent communication protocols, goal alignment, and system-level guardrails.
The presentation articulated four primary lessons learned, which represent valuable LLMOps insights:
1. Adapt tools for agents: Don’t assume that existing software tools will work optimally with agents. Verbose error messages, human-readable outputs, and combining sequential tools specifically for agent consumption significantly improve performance and reliability.
2. Write custom guardrails for multi-turn conversations: Standard guardrails that only examine the last message are insufficient for agentic systems. Context-aware guardrails that analyze conversation patterns across multiple turns are necessary to prevent sophisticated manipulation or social engineering.
3. Read speech-to-speech model prompts out loud: Speech models have fundamentally different characteristics than text models and require prompts written in natural spoken language patterns.
4. UI is king: No matter how sophisticated the backend, user experience determines success. Managing expectations during inevitable agentic latency through progress indicators is essential.
To these, we might add implicit lessons evident in the architecture:
Manchester Airports Group’s implementation represents a substantial real-world deployment of agentic AI in a complex, high-stakes operational environment. The technical architecture demonstrates thoughtful design decisions around scalability, security, and user experience. The incremental approach, starting with a focused use case while building extensible infrastructure, offers a practical model for other organizations considering agentic AI for complex operational processes.
However, as with any vendor-presented case study, some healthy skepticism about claimed results is warranted, and several important LLMOps maturity questions remain unanswered around evaluation, monitoring, and continuous improvement processes. The true test will be whether the foundation they’ve built successfully scales to the ambitious multi-agent digital colleague workplace vision, and whether the 99% consistency and 90% time reduction metrics hold up as the system handles increasing complexity and volume in production.
Yahoo! Finance built a production-scale financial question answering system using multi-agent architecture to address the information asymmetry between retail and institutional investors. The system leverages Amazon Bedrock Agent Core and employs a supervisor-subagent pattern where specialized agents handle structured data (stock prices, financials), unstructured data (SEC filings, news), and various APIs. The solution processes heterogeneous financial data from multiple sources, handles temporal complexities of fiscal years, and maintains context across sessions. Through a hybrid evaluation approach combining human and AI judges, the system achieves strong accuracy and coverage metrics while processing queries in 5-50 seconds at costs of 2-5 cents per query, demonstrating production viability at scale with support for 100+ concurrent users.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Trellix, in partnership with AWS, developed an AI-powered Security Operations Center (SOC) using agentic AI to address the challenge of overwhelming security alerts that human analysts cannot effectively process. The solution leverages AWS Bedrock with multiple models (Amazon Nova for classification, Claude Sonnet for analysis) to automatically investigate security alerts, correlate data across multiple sources, and provide detailed threat assessments. The system uses a multi-agent architecture where AI agents autonomously select tools, gather context from various security platforms, and generate comprehensive incident reports, significantly reducing the burden on human analysts while improving threat detection accuracy.