Manchester Airports Group: Agentic AI for Automated Absence Reporting and Shift Management at Airport Operations

Overview and Business Context

Manchester Airports Group (MAG) operates as the UK’s largest airport group, managing three airports with approximately 9,000 direct staff (and 40,000 total employees operating on their campuses) and handling over 1,000 flights daily. The organization presented their agentic AI implementation at AWS re:Invent, marking their third consecutive year presenting AI solutions at the conference. This particular case study focuses on their “digital colleague workplace” vision—using agentic AI to manage complex operational processes across airport operations.

The specific use case centers on unplanned absence reporting for shift workers in critical airport functions, particularly security personnel. The business problem emerged from the daily operational complexity of managing unexpected staffing gaps in a 24/7 operation where passenger safety and security are non-negotiable priorities. When an employee cannot make their shift (due to illness, family emergency, or transportation issues), this triggers a cascade of activities: authenticating the absence, classifying it according to HR policies, updating multiple systems, notifying various managers, and potentially re-rostering replacement staff. Previously, this involved employees calling a third-party helpline, with resourcing teams manually coordinating changes across different systems, and line managers handling both administrative updates and pastoral care responsibilities.

The challenge was compounded by process variation across the three airports and different employee types, creating hundreds of workflow permutations. The business case for automation was compelling: reducing overtime costs from last-minute shift coverage, increasing passenger spending by reducing security queue times, and eliminating third-party service costs. However, MAG emphasized that this represented just the first step in a broader journey toward an intelligent, multi-agent airport management system.

Strategic Approach: Think Big, Start Small, Scale Fast

MAG and AWS applied a framework of “think big, start small, scale fast” to their agentic AI implementation. The “think big” vision is the digital colleague workplace—ultimately an autonomous agentic system controlling multiple airport functions with minimal human oversight. However, they recognized the need to start with a focused use case that demonstrated measurable value while laying foundational infrastructure for future expansion.

The unplanned absence reporting use case was selected because it was both topical (particularly during peak travel periods like summer holidays and Christmas) and had clear ROI metrics around overtime costs, operational efficiency, and third-party service reduction. Critically, it also allowed MAG to address real operational complexity while navigating the stringent security, accuracy, and reliability requirements of operating critical national infrastructure subject to significant regulatory oversight.

Why Agentic AI Rather Than Traditional Automation

The presentation included a thoughtful discussion of why agentic AI was chosen over simpler automation tools or basic generative AI assistants. The team outlined a spectrum of generative AI solutions with increasing autonomy:

Generative AI assistants (chatbots, document processors) help automate workflows but require humans to define and execute tasks
AI agents have greater autonomy, focused on achieving outcomes rather than completing specific tasks, able to adapt dynamically under uncertainty and coordinate across multiple systems
Fully autonomous agentic systems can independently set and execute goals with minimal human oversight

The unplanned absence use case falls into the middle category, with the ultimate digital colleague workplace representing the fully autonomous vision. Several factors drove the decision toward agentic AI:

Non-deterministic complexity: With three airports, numerous job types, and different absence categories (each invoking different HR policies), the team quickly identified hundreds of workflow permutations. For example, childcare absences invoke different policies than illness absences. A traditional automation tool would require programming all these permutations and still wouldn’t capture edge cases, whereas agentic AI can dynamically reason through scenarios.

Extensibility and modularity: Agentic AI’s inherent modularity allows iterative addition of functionality by deploying new agents. MAG envisions expanding beyond absence reporting to fault reporting, asset management, and complex scenarios like coordinating staffing decisions when equipment is out of order and flights are delayed—requiring multiple agents to interact and optimize across interdependent functions.

Exception handling: The flexibility of agentic AI to handle incomplete or ambiguous input is crucial in real-world scenarios. When an employee reports an emergency using colloquial language or misses key information, agents can use natural language understanding to engage in multi-turn conversations until they have what they need. Traditional automation tools lack this understanding and rarely handle exceptions gracefully.

The team was clear that agentic AI isn’t appropriate for all use cases—predictable, deterministic workflows may be better served by automation tools—but for MAG’s complex, dynamic environment with high exception rates, it was the right choice.

Technical Architecture: Building Block by Block

The presentation provided detailed technical architecture insights, building the solution incrementally to illustrate design decisions. The progression moved from a simple automated workflow to a sophisticated multi-agent system with speech-to-speech capabilities.

Initial approach: The team started by mapping the current business workflow (employee contacts manager, manager verifies identity and checks policies, updates HR system, notifies personnel, handles rostering if applicable) to automated components: API calls to HR and rostering systems, knowledge base lookups for HR policies, LLM-based absence classification, and Amazon SNS for notifications.

First agentic implementation: A deterministic workflow quickly proved inadequate when employees provided incomplete information. This motivated implementing a true agentic approach where the system could engage in natural conversation to collect required information. The agent was built using the ReAct (Reason-Act-Observe) pattern, with each reasoning step logged to S3 for observability. The agent receives human input, reasons about what to do, acts using available tools, observes the results, and provides a response, potentially iterating through multiple cycles until the goal is achieved.

Tool design for agents: A critical early learning was that tools needed adaptation specifically for agent use. The team emphasized three principles:

Verbose error catching: Agents, like humans, need clear error messages to find alternative paths when something fails
Human-readable outputs: While LLMs can parse various formats, providing cleaned, relevant information significantly improves speed and accuracy
Combining sequential tools: When tools are always called in sequence, combining them reduces latency significantly

For example, rather than having agents parse complex JSON responses, tools were modified to extract and format exactly the information needed for the next reasoning step.

Scaling with Agent Core Runtime: Once the text-based solution worked on a developer laptop, scaling became the next challenge. Supporting concurrent users required state management, websocket handling, and infrastructure complexity. MAG adopted Amazon Bedrock Agent Core Runtime, which encapsulates this complexity in a serverless solution. Agent Core Runtime provides scalability and security for deploying agents, with direct front-end connection and state persistence. Importantly, it’s compatible with open-source infrastructure and various frameworks—in this case, used with trans agents.

Microservices architecture with Model Context Protocol (MCP): To avoid a monolithic agent design that would become difficult to modify and extend, MAG separated concerns by hosting tools on an MCP server within Agent Core Gateway. This architectural decision provides:

Standard tool exposure: Tools are exposed in a standard way through MCP, making them discoverable
Reusability: The same tools can be used by different agents in different contexts
Extensibility: New tools can be added on the fly without redeploying the entire system
Single API call: Agent Core Gateway provides tool discovery and access through a unified interface

When an employee submits a query, Agent Core Runtime connects to Agent Core Gateway to search for the correct tool, queries that tool, gets results, and answers the user’s question.

Context-aware guardrails: During security review, concerns emerged about potential social engineering attacks or malicious use. While the team implemented standard guardrails initially, they found these insufficient for sophisticated multi-turn attempts to manipulate the agent. This led to a second major learning: think broader than standard guardrails.

The team implemented a multi-layered security approach:

Principle of least privilege: Each tool can only be used by authenticated users with appropriate access
Context-aware guardrails: Beyond standard Bedrock guardrails that only examine the last message, they built custom guardrails implemented via Lambda that analyze the full conversation history, using an LLM to detect unwanted conversation patterns across multiple turns

This was particularly important given that MAG operates critical national infrastructure with PII data and systems that directly impact employee pay—requiring compliance with significant regulatory requirements.

Speech-to-speech interface with Amazon Novasonic: Recognizing that many employees report absences while unable to type (e.g., stuck in traffic while driving to work), MAG implemented a speech-to-speech interface using Amazon Novasonic. This introduced real-time streaming complexity, as Novasonic expects continuous audio input and provides continuous audio output, requiring websocket connections rather than request-response patterns. This was managed by ECS, which buffers audio frames and coordinates bidirectional streaming.

The team encountered significant challenges prompting speech models, leading to a third key learning: read speech-to-speech model prompts out loud. Speech-to-speech models are trained on spoken language and have different characteristics than text models:

No chain of thought: Humans don’t verbalize their thinking process before speaking, so prompts shouldn’t include this
No enumeration patterns: While useful in text prompts to structure information, no one enumerates sentences when speaking naturally
Spelling and name handling: Models may struggle with unusual names or spellings, requiring specific guidance on handling edge cases

After rewriting prompts to be more speech-appropriate, the team faced another challenge: Novasonic needed to interact with the same tools as the text agent, but those tools produced system-oriented outputs rather than human-readable speech responses.

Agentic hierarchy: Rather than connecting Novasonic directly to tools and maintaining two separate sources of truth about tool usage, MAG implemented an agentic hierarchy. Amazon Novasonic calls the text-based agent as a tool, which in turn calls the actual system tools. This elegant solution:

Provides speech-appropriate responses automatically (since the text agent is already trained to produce human-readable outputs)
Maintains a single source of truth about tool usage and business logic in the text agent
Allows Novasonic to focus on conversation flow and knowing when to invoke the text agent

Authentication for speech interface: Speech interfaces presented a unique authentication challenge, as users aren’t logging into an app before interacting with the system. MAG built a specialized authentication tool that collects relevant identifying information from the user conversationally, compares it against the HR system, and only allows access to the main functionality after successful authentication.

User Experience and Latency Management

A critical learning emerged when demoing the solution to business stakeholders: UI is king. Even with an impressive backend capable of complex orchestration, users perceived the system as frozen or unresponsive during the time agents were working through multiple tool calls. The team emphasized that agentic systems inherently have latency due to the reasoning-action-observation loop across multiple tools, so keeping users engaged is essential.

The solution was to provide intuitive progress indicators showing what the agent is doing behind the scenes, without revealing too much technical infrastructure detail. The demo showed a UI that displays:

“Retrieving user information”
“Classifying absence type”
“Need more information” (when data is incomplete, triggering follow-up questions)
“Checking calendar and recording absence”
“Notifying managers and rostering team”
Final confirmation of all actions taken

This approach manages user expectations and provides confidence that the system is actively working, rather than creating anxiety about whether anything is happening.

Production Architecture and Observability

The final production architecture integrates multiple AWS services:

Front-end interfaces: Both web app and speech interfaces (with future phone line integration planned)
Agent Core Runtime: Hosts the agents with serverless scaling and state management
Agent Core Gateway: Provides tool discovery and access via MCP
MCP Server: Hosts individual tools as microservices
Amazon Novasonic: Speech-to-speech model for voice interface
Text-based agent: Built with trans agents, handles core business logic
Amazon SNS: Messaging for manager notifications
HR and rostering system APIs: Integration with existing enterprise systems
Knowledge bases: HR policy documents accessible to agents
S3: Logging and observability for all reasoning steps
Lambda functions: Custom guardrails and authentication
ECS: Audio streaming management for speech interface

The architecture emphasizes observability, with all reasoning steps logged to S3, enabling analysis of agent behavior, debugging, and continuous improvement.

Results and Business Impact

While the presentation acknowledged this is an AWS-sponsored case study (presented at re:Invent), specific quantifiable results were provided:

99% consistency in absence reporting: Standardized processes across three airports that previously had variable workflows, unlocking potential for downstream pattern analysis
90% reduction in recording time: Dramatic efficiency improvement in the time from absence report to full system updates
Cost reductions: Reduced overtime payments (from faster replacement scheduling), elimination of third-party helpline costs, and increased passenger spending (from faster security processing)

The presenters framed these as “first use case” results, emphasizing that the real value comes from the extensible foundation for the broader digital colleague workplace vision.

Critical Assessment and Tradeoffs

From an LLMOps perspective, several aspects of this case study merit balanced assessment:

Strengths of the approach:

The incremental “think big, start small, scale fast” philosophy is sound, avoiding the trap of trying to build the ultimate solution immediately
The architectural decision to use MCP and separate concerns between agents and tools provides genuine extensibility and maintainability
The agentic hierarchy (speech agent calling text agent calling tools) is an elegant solution to the multi-modal challenge
The focus on authentication, guardrails, and security is appropriate for critical infrastructure
The emphasis on UI and user experience addresses a common failure point in agentic systems

Potential concerns and considerations:

The 99% consistency metric is impressive but raises questions about the 1% edge cases—what happens when the system fails, and what human fallback processes exist?
Latency remains a challenge even with UI improvements; the presentation doesn’t provide specific numbers on end-to-end processing time
The cost-benefit analysis focuses on easily quantifiable metrics (overtime costs, third-party fees) but doesn’t address the development and operational costs of the agentic infrastructure itself
The case study is presented at an AWS conference by AWS employees and an AWS customer, so some healthy skepticism about claimed results is warranted—independent verification would strengthen confidence
The production status is somewhat ambiguous; while the solution has been tested and demoed, the presentation doesn’t make entirely clear whether it’s fully deployed in production handling all absence reporting or still in limited rollout
Error handling and graceful degradation strategies aren’t detailed—what happens when the HR API is unavailable or when the LLM hallucinates policy information?
The model selection rationale isn’t discussed; why specific Bedrock models were chosen and whether alternatives were evaluated isn’t covered

LLMOps maturity indicators:

Evaluation: The presentation mentions testing and security review but doesn’t detail systematic evaluation approaches, test coverage, or ongoing monitoring metrics beyond system performance
Observability: Logging reasoning steps to S3 provides a foundation, but the presentation doesn’t cover how this data is analyzed, what dashboards exist, or how issues are detected and resolved
Continuous improvement: No discussion of feedback loops, model fine-tuning, prompt iteration based on production usage, or A/B testing approaches
Deployment: Agent Core Runtime provides the deployment mechanism, but details on CI/CD pipelines, versioning, rollback strategies, or blue-green deployments aren’t covered
Monitoring and alerting: Beyond latency and user engagement, what metrics are tracked in production, and what thresholds trigger human intervention?

Future Direction and Scalability

MAG’s vision extends well beyond absence reporting to a comprehensive digital colleague workplace encompassing:

Automated rostering optimization based on predicted demand
Fault reporting and asset management
Terminal resource allocation based on flight schedules and stand usage
Cross-functional optimization (e.g., coordinating staffing when equipment failures affect operations)

The presentation positions this as a journey toward “the world’s most intelligent airport group,” with the current implementation representing foundational infrastructure that will support “a great number more tools” and “more agents with the right level of humans in the loop.”

The scalability of the architectural approach—particularly the MCP-based tool microservices and Agent Core Runtime—appears well-suited to this vision, though the complexity of coordinating multiple specialized agents for cross-functional optimization will likely surface new challenges around agent communication protocols, goal alignment, and system-level guardrails.

Key LLMOps Lessons

The presentation articulated four primary lessons learned, which represent valuable LLMOps insights:

1. Adapt tools for agents: Don’t assume that existing software tools will work optimally with agents. Verbose error messages, human-readable outputs, and combining sequential tools specifically for agent consumption significantly improve performance and reliability.

2. Write custom guardrails for multi-turn conversations: Standard guardrails that only examine the last message are insufficient for agentic systems. Context-aware guardrails that analyze conversation patterns across multiple turns are necessary to prevent sophisticated manipulation or social engineering.

3. Read speech-to-speech model prompts out loud: Speech models have fundamentally different characteristics than text models and require prompts written in natural spoken language patterns.

4. UI is king: No matter how sophisticated the backend, user experience determines success. Managing expectations during inevitable agentic latency through progress indicators is essential.

To these, we might add implicit lessons evident in the architecture:

Separation of concerns: Decoupling agents from tools via standardized protocols (MCP) provides extensibility and maintainability at scale
Agentic hierarchy: Multi-modal systems may benefit from agents calling other agents rather than duplicating logic across modalities
Security-first design: For critical infrastructure, authentication, authorization, and guardrails must be built in from the start, not added later

Conclusion

Manchester Airports Group’s implementation represents a substantial real-world deployment of agentic AI in a complex, high-stakes operational environment. The technical architecture demonstrates thoughtful design decisions around scalability, security, and user experience. The incremental approach, starting with a focused use case while building extensible infrastructure, offers a practical model for other organizations considering agentic AI for complex operational processes.

However, as with any vendor-presented case study, some healthy skepticism about claimed results is warranted, and several important LLMOps maturity questions remain unanswered around evaluation, monitoring, and continuous improvement processes. The true test will be whether the foundation they’ve built successfully scales to the ambitious multi-agent digital colleague workplace vision, and whether the 99% consistency and 90% time reduction metrics hold up as the system handles increasing complexity and volume in production.

Agentic AI for Automated Absence Reporting and Shift Management at Airport Operations

Industry

Technologies