ZenML

Automated Carrier Claims Management Using AI Agents

FIEGE 2025

FIEGE, a major German logistics provider, implemented an AI agent system to handle carrier claims processing end-to-end, launched in September 2024. The system automatically processes claims from initial email receipt through resolution, handling multiple languages and document types. By implementing a controlled approach with sandboxed generative AI and templated responses, the system successfully processes 70-90% of claims automatically, resulting in eight-digit cost savings while maintaining high accuracy and reliability.

Industry

Tech

Overview

This case study, presented by Big Picture (an IT development company with 15+ years of experience and 5 years focused on AI solutions), details a production AI agent system built for FIEGE, one of Germany’s main logistics solution providers. FIEGE operates with approximately 22,000 employees and around €2 billion in annual turnover. The AI agent was deployed in September 2024 to handle carrier claims management—the process of finding lost parcels and demanding compensation from carriers.

The presentation was given as part of an “Agent Hour” session focused on understanding what constitutes a real AI agent versus simpler implementations like custom GPTs. The presenter explicitly challenged the narrative that “AI agents are coming in 2025,” arguing that production-ready agents were already live in 2024, with the main current challenge being enterprise adoption.

Problem Statement

The logistics industry faces significant operational costs in managing carrier claims, which traditionally requires human workers to read customer emails, communicate with carriers, verify contractual obligations, and process reimbursements. This multi-step process involves analyzing various document types (text, PDFs, images), communicating with multiple stakeholders in various European languages, and making decisions based on complex contractual rules.

Technical Architecture and LLMOps Approach

Sandboxed Generative AI Design

The core architectural principle behind this implementation is what the presenters call “sandboxing” the generative AI. Rather than allowing the LLM to freely generate responses end-to-end, the system constrains generative AI to specific tasks where it excels—classification, extraction, and assessment—while embedding these capabilities within deterministic workflows.

The presenter uses the metaphor of AI being “like a teenager”—it has great potential and can be very smart, but organizations still shouldn’t let it run their business unsupervised. This philosophy directly shaped the technical architecture.

Microsoft Azure Infrastructure

The solution is built on Microsoft Azure. This infrastructure enables the deterministic process flows that contain and direct the LLM's capabilities, while maintaining the flexibility to swap components as needed.

Template-Based Response Generation

A critical design decision was avoiding direct LLM-generated responses. Instead of letting the model “improvise” answers, the system uses templates—exactly as a human team would operate with policy-approved response templates. The LLM’s role is to analyze incoming communications, extract relevant information, and select appropriate templates based on the analysis, but not to compose novel responses.

This approach directly addresses hallucination concerns. As the presenter noted, for “a funny chitchat on ChatGPT” improvisation might be acceptable, but for processing claims and reimbursements, it is “absolutely not” acceptable.
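A minimal sketch of the template approach, with hypothetical template IDs and field names (not FIEGE's actual templates): the model's only job is to select a template ID and supply extracted fields, while the reply text itself is fixed and policy-approved.

```python
# Hypothetical policy-approved templates; the LLM picks an ID and
# supplies extracted fields, it never composes the reply itself.
TEMPLATES = {
    "ack_lost_parcel": (
        "Dear {carrier},\n"
        "We are filing a claim for lost parcel {tracking_id}. "
        "Per our contract, we request compensation of {amount} EUR."
    ),
    "request_more_info": (
        "Dear {sender},\n"
        "We need the tracking number to process your claim."
    ),
}


def render_response(template_id: str, fields: dict[str, str]) -> str:
    """Fill a fixed template with model-extracted fields.

    Raises KeyError if the model selected an unknown template or
    omitted a required field -- either case would trigger human
    escalation upstream rather than an improvised reply.
    """
    template = TEMPLATES[template_id]
    return template.format(**fields)
```

Because the output space is the finite set of templates, a hallucinated sentence simply cannot reach a carrier or customer.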

Reasoning and Chain of Thought

The system employs reasoning capabilities (described as “thinking before answering”) to improve results and make answers comprehensible for human users. Chain of Thought prompting is used both for accuracy and explainability—the latter being noted as important for AI Act compliance in Europe.
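An illustrative prompt skeleton (not the production prompt) showing how chain-of-thought reasoning can be requested alongside a machine-readable verdict, so the reasoning remains inspectable for human review and audit:

```python
# Illustrative only: the numbered steps and JSON schema are assumptions,
# chosen to show the "reason first, then decide" pattern.
CLAIM_PROMPT = """You are assessing a carrier claim.

Think step by step before answering:
1. What does the sender request?
2. Which contractual rule applies?
3. Is the claim within the filing deadline?

Then answer in JSON:
{"reasoning": "<your steps>", "decision": "accept" | "reject" | "unclear"}
"""


def build_prompt(email_text: str) -> str:
    """Attach the incoming email to the fixed reasoning scaffold."""
    return CLAIM_PROMPT + "\n\nEmail:\n" + email_text
```

Returning the reasoning as a structured field, rather than discarding it, is what makes the decision explainable after the fact.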

Multilingual Processing

The model can understand and process emails in any European language, enabling FIEGE to handle claims from carriers across their European operations without language barriers.

Confidence Thresholds and Human Handoff

A sophisticated confidence threshold mechanism determines when the AI can proceed autonomously versus when human intervention is required. The presenter described this as teaching the model to recognize “I can’t really resolve the case right now, I need to ask my boss.”

The implementation uses examples and counter-examples in prompting to train the model on self-assessment. Over time, through iteration and operational experience, these thresholds can be adjusted to optimize the balance between automation and human oversight. The presenter explicitly stated that the model is not expected to process 100% of cases correctly on its own, and this acknowledgment is built into the system design.

When the AI cannot proceed, it fails gracefully and creates a ticket for human workers in their existing ticketing system (Jira was mentioned as an example), adding to their normal queue rather than requiring a separate workflow.
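The threshold-and-handoff logic described above can be sketched as follows. The threshold value and the ticket helper are assumptions for illustration, not details from the talk (Jira was mentioned only as an example of a ticketing system):

```python
# Assumed value; the talk describes tuning thresholds over time from
# operational experience, not a specific number.
CONFIDENCE_THRESHOLD = 0.75


def create_ticket(case_id: str) -> str:
    """Stand-in for a call to the team's existing ticketing system."""
    return f"TICKET-{case_id}"


def route(assessment_confidence: float, case_id: str) -> str:
    """Decide between autonomous handling and human handoff."""
    if assessment_confidence >= CONFIDENCE_THRESHOLD:
        return f"auto:{case_id}"
    # Below threshold: fail gracefully into the team's normal queue
    # instead of forcing a separate review workflow.
    ticket = create_ticket(case_id)
    return f"handoff:{ticket}"
```

Routing low-confidence cases into the existing queue is what lets the agent say, in effect, "I need to ask my boss" without any new process for the human team.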

The Three A’s Framework

The presenter introduced a framework called “The Three A’s” for understanding AI agent adoption challenges:

Accuracy

Concerns about reliability and hallucination are addressed through sandboxing generation to classification, extraction, and assessment tasks; templated rather than free-form responses; and chain-of-thought reasoning that keeps decisions auditable.

Autonomy

The ability to accomplish tasks end-to-end without constant human checking is achieved through deterministic workflows around the model, confidence thresholds for self-assessment, and graceful escalation of uncertain cases into the team's existing ticketing queue.

Acceptance

Both management and team acceptance are facilitated through integration with the systems the human team already uses (such as SAP and the existing ticketing system) and through reasoning that makes the agent's decisions comprehensible.

Integration Philosophy

A key design principle emphasized is avoiding lock-in to any single ecosystem. The presenter noted that language models change from month to month rather than year to year, so the architecture maintains flexibility to swap models without requiring customers to change their processes.

Similarly, the system integrates with existing enterprise systems rather than requiring organizations to abandon tools like SAP. The AI agent uses the same systems the human team uses, making adoption significantly easier.
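One common way to keep models swappable is to have the workflow depend on a narrow interface rather than a vendor SDK. This is a sketch of that pattern using Python's structural typing, with a stub standing in for a real Azure-hosted (or other) client; the interface and names are assumptions, not the project's actual abstraction.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only surface the workflow depends on; any provider adapter
    that implements complete() can be dropped in without touching
    business logic."""

    def complete(self, prompt: str) -> str: ...


class StubModel:
    """Placeholder adapter; a real one would wrap a hosted model client."""

    def complete(self, prompt: str) -> str:
        return "lost_parcel"


def classify(model: ChatModel, email_text: str) -> str:
    """Business logic sees only the ChatModel interface."""
    return model.complete("Classify this claim email:\n" + email_text)
```

Swapping providers then means writing one new adapter, leaving the customer-facing process untouched.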

PII and Data Security

During the Q&A, data privacy was addressed. The presenters stated that the system is designed to prevent PII leakage.

Results and Business Impact

The solution reportedly processes 70-90% of claims automatically, from initial email receipt through resolution, and has delivered eight-digit cost savings.

Critical Assessment

While the results claimed are impressive, it's worth noting this presentation was given by the vendor (Big Picture) who built the solution, so the claims should be viewed with appropriate context. The eight-digit cost-savings claim is notable but lacks specific quantification or independent verification.

That said, the technical approach described—sandboxing LLMs within deterministic workflows, using templates rather than free generation, implementing confidence thresholds for human handoff—represents genuinely sound LLMOps practices that address real concerns about reliability in production systems.

The system’s ability to handle 70-90% of cases autonomously while gracefully escalating the remainder suggests a mature understanding of where current LLM capabilities excel and where human judgment remains necessary.

Broader Applicability

The presenter identified several other industries where similar approaches would be applicable.

The core pattern of constraining LLM capabilities within structured workflows while leveraging their strengths in document understanding, classification, and multi-lingual processing appears broadly transferable to other domains requiring similar document-heavy, multi-party communication processes.
