## Overview
This case study, presented at an AWS conference, describes how Amazon Logistics developed a multi-agent AI system to optimize last-mile delivery planning and help maintain promised delivery dates. The presentation featured speakers from Amazon Logistics product management and the Amazon Bedrock team at AWS, providing both the customer use case perspective and a view of the underlying platform capabilities.
Amazon's core value proposition around delivery relies heavily on the "promised delivery date" — the estimated arrival time shown to customers when ordering. Meeting this promise requires massive optimization across multiple dimensions: driver allocation, sorting center staffing, airplane routing between cities, and dynamic adjustments based on seasonal and socioeconomic factors. The company identified that human planners were struggling to manage the complexity, with over 10 million data points across the year requiring analysis. These planners possessed "tribal knowledge" accumulated over years of experience, understanding nuanced circumstances that affect delivery optimization.
## The Problem at Scale
The presentation detailed the sheer scale of the planning challenge using the EU region as an example. Each constraint plan covers 119 days, with each day containing approximately five cycles (next-day, same-day, etc.). Each cycle has over 100 metrics to track. With 249 stations across 10 EU countries, planners face approximately 15 million numbers to analyze and make sense of. This volume makes it virtually impossible for human planners to identify which parameters are causally related versus merely associated, what changes occurred between weekly planning cycles, and why specific bottlenecks exist.
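As a rough sanity check, the quoted dimensions multiply out to roughly the 15 million figure cited (a sketch; the per-cycle metric count is stated only as "over 100", so 100 is used here):

```python
# Back-of-the-envelope scale check using the figures quoted in the talk
days_per_plan = 119      # horizon of each constraint plan
cycles_per_day = 5       # next-day, same-day, and other delivery cycles
metrics_per_cycle = 100  # stated as "over 100" metrics per cycle
stations = 249           # delivery stations across 10 EU countries

total_numbers = days_per_plan * cycles_per_day * metrics_per_cycle * stations
print(f"{total_numbers:,}")  # 14,815,500 -> roughly the ~15 million cited
```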
Previously, planners relied on offline Excel-based templates, downloading data and building custom scripts to perform deep dives. This approach was slow, error-prone, and heavily dependent on skills that new planners could only acquire after months of experience. The estimated cost of planning failures, resulting in missed promised delivery dates and customer churn, was calculated at up to $150 million.
## Technical Architecture and Approach
The Amazon Logistics team initially explored various AI approaches before landing on agentic AI. They noted that simple LLM implementations could not solve the problem alone — the complexity required multiple agents coordinating to analyze numbers, optimize paths, and understand the interrelationships between planning parameters.
The core technical innovation involves a graph-based causal inference system. The team implemented the LACQUER (Local Causal Query) algorithm, derived from academic research, to construct graphs representing parameter dependencies. The process works as follows:
First, the system generates heat maps showing probabilistic relationships between all planning parameters. From these heat maps, the algorithm constructs dependency graphs. Then, using formulas encoded from planners' tribal knowledge, the system applies weights to different nodes and edges based on situational context (time of year, socioeconomic conditions, etc.). This weighted graphical model enables the system to distinguish genuine causal relationships from mere statistical association, a critical distinction for actionable planning decisions.
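A minimal sketch of this general flow, assuming networkx for the graph and entirely made-up parameter names, dependency strengths, thresholds, and seasonal weighting (the actual LACQUER implementation and tribal knowledge formulas were not shown):

```python
import numpy as np
import networkx as nx

# Toy stand-ins for real planning parameters (names are illustrative only)
params = ["driver_hours", "sort_capacity", "linehaul_slots", "promised_volume"]

# "Heat map" of pairwise dependency strengths between parameters; in the real
# system this comes from the causal-query step, here it is a made-up matrix.
heat = np.array([
    [1.0, 0.7, 0.2, 0.9],
    [0.7, 1.0, 0.4, 0.8],
    [0.2, 0.4, 1.0, 0.3],
    [0.9, 0.8, 0.3, 1.0],
])

# Keep only edges whose strength clears a threshold -> dependency graph
graph = nx.Graph()
graph.add_nodes_from(params)
THRESHOLD = 0.5
for i, a in enumerate(params):
    for j, b in enumerate(params):
        if i < j and heat[i, j] >= THRESHOLD:
            graph.add_edge(a, b, weight=float(heat[i, j]))

# Re-weight edges with encoded "tribal knowledge" for the current context,
# e.g. a peak-season multiplier on capacity-related dependencies.
def contextual_weight(base_weight: float, is_peak_season: bool) -> float:
    return base_weight * (1.5 if is_peak_season else 1.0)

for a, b, data in graph.edges(data=True):
    data["weight"] = contextual_weight(data["weight"], is_peak_season=True)

# Strongest weighted dependencies surface first for the analysis step
print(sorted(graph.edges(data=True), key=lambda e: -e[2]["weight"]))
```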
The architecture employs two primary agents working in collaboration. The first agent constructs the weighted graph structure using the LACQUER algorithm and applies the tribal knowledge formulas; the two agents then coordinate to analyze these graphs and produce optimized plans that account for constraints the initial automated planning system cannot consider.
## Production Implementation
The solution was built using a hybrid approach combining custom agents with managed services. The team uses LangChain for custom AI agents, which provides flexibility for their specific use case while maintaining an open-source foundation. For the underlying LLM, they leverage Amazon Bedrock with Claude 3.7 Sonnet, which provides the strong reasoning capabilities needed for the complex causal analysis. The architecture is designed for model portability: the presenters explicitly noted they can swap in future models as they become available without architectural changes.
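The portability point is easiest to see in code. A minimal sketch using the langchain-aws integration, where the model ID and region are illustrative placeholders and the surrounding agent logic is omitted:

```python
from langchain_aws import ChatBedrock

# Swapping models is a one-line change to the model ID; the agent and prompt
# code built around the LLM stays the same.
MODEL_ID = "anthropic.claude-3-7-sonnet-20250219-v1:0"  # illustrative ID

llm = ChatBedrock(
    model_id=MODEL_ID,
    region_name="eu-west-1",             # illustrative region
    model_kwargs={"temperature": 0.0},   # deterministic output for analysis
)

response = llm.invoke(
    "Given the weighted dependency graph summary below, identify the most "
    "likely bottleneck and explain why.\n\n<graph summary goes here>"
)
print(response.content)
```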
The system integrates directly into existing planning software, providing a conversational interface where planners can interact with the plan evaluation engine. Rather than downloading Excel files and writing scripts, planners can now ask natural language questions like "Why is my plan different?" and receive explanations grounded in the causal graph analysis. The interface also serves as an onboarding tool for new planners, explaining concepts like unconstrained volume planning (UVP) and constrained volume planning (CVP), how they relate to each other, and how the dependency graphs are generated.
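Conceptually, the conversational layer grounds each answer in the pre-computed graph analysis rather than asking the model to re-derive it. A sketch, assuming a hypothetical helper and prompt wording (not the production interface):

```python
def answer_planner_question(question: str, graph_diff_summary: str, llm) -> str:
    """Answer a planner's natural-language question, grounded in the
    pre-computed causal-graph comparison between planning cycles."""
    prompt = (
        "You are a delivery-planning assistant. Use ONLY the analysis below.\n\n"
        "Causal-graph comparison between this week's and last week's plan:\n"
        f"{graph_diff_summary}\n\n"
        f"Planner question: {question}"
    )
    return llm.invoke(prompt).content

# Example usage with the Bedrock-backed LLM from the previous sketch:
# print(answer_planner_question("Why is my plan different?", diff_summary, llm))
```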
## AWS Bedrock Agents Platform
The second portion of the presentation, delivered by an AWS Solutions Architect, covered the broader Bedrock Agents platform that enables deployments like Amazon Logistics'. The platform addresses three key customer concerns: automating complex workflows without taking on the complexity of managing open-source frameworks, accelerating development given the rapid pace of model evolution (major model releases now arrive less than six months apart), and providing robust, scalable solutions suitable for production.
Amazon Bedrock Agents offers a fully managed experience supporting all available foundation models including reasoning models like Claude 3.7 Sonnet and DeepSeek R1. Recent platform updates extended timeouts to accommodate reasoning models that require longer processing times. Key features include long-term memory, integration with knowledge bases and guardrails, customizable prompts and orchestration strategies (for teams who want alternatives to the default ReAct approach), and comprehensive observability and tracing.
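For teams on the managed path, most of these features are configured when the agent is created. A boto3 sketch with placeholder names, ARNs, and IDs (the parameters shown are a simplified subset, not the case study's configuration):

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

response = bedrock_agent.create_agent(
    agentName="plan-evaluation-assistant",                        # illustrative
    foundationModel="anthropic.claude-3-7-sonnet-20250219-v1:0",  # illustrative
    instruction="Explain weekly plan changes using the causal graph analysis.",
    agentResourceRoleArn="arn:aws:iam::123456789012:role/BedrockAgentRole",
    # Long-term memory: session summaries retained across conversations
    memoryConfiguration={
        "enabledMemoryTypes": ["SESSION_SUMMARY"],
        "storageDays": 30,
    },
    # Attach a pre-created guardrail for content and topic controls
    guardrailConfiguration={
        "guardrailIdentifier": "my-guardrail-id",
        "guardrailVersion": "1",
    },
)
print(response["agent"]["agentId"])
```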
The multi-agent collaboration feature, which entered general availability recently, supports the kind of supervisor-agent architecture demonstrated in the Amazon Logistics case. The presenter showed examples of financial services customers running multi-agent applications in production, validating the maturity of the approach.
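Under the multi-agent collaboration feature, specialist agents are registered as collaborators of a supervisor. A sketch of that wiring (agent IDs, the alias ARN, and instructions are placeholders, and the parameter set is an assumption rather than something shown in the talk):

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Register a specialist agent as a collaborator under a supervisor agent
bedrock_agent.associate_agent_collaborator(
    agentId="SUPERVISOR_AGENT_ID",
    agentVersion="DRAFT",
    agentDescriptor={
        "aliasArn": "arn:aws:bedrock:us-east-1:123456789012:agent-alias/ABCDEF/1"
    },
    collaboratorName="graph-builder",
    collaborationInstruction="Build and weight the parameter dependency graph.",
    relayConversationHistory="TO_COLLABORATOR",
)
```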
## Inline Agents for Dynamic Configuration
A particularly relevant feature for production deployments is "inline agents" — the ability to configure agent capabilities at invocation time rather than pre-creating static agent configurations. This enables dynamic tool exposure based on user context. In the demonstrated HR example, different employees receive access to different knowledge bases and tools based on their role, all through the same API endpoint with different parameters.
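The pattern is that the agent's configuration travels with the request. A sketch of role-based tool exposure via the bedrock-agent-runtime invoke_inline_agent API, with placeholder knowledge base IDs, model ID, and instructions, and only a subset of the available parameters:

```python
import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def ask_hr_assistant(question: str, user_role: str, session_id: str) -> str:
    # Expose a different knowledge base depending on who is asking
    kb_id = "KB_MANAGERS" if user_role == "manager" else "KB_EMPLOYEES"  # placeholders
    response = runtime.invoke_inline_agent(
        sessionId=session_id,
        inputText=question,
        foundationModel="anthropic.claude-3-7-sonnet-20250219-v1:0",  # illustrative
        instruction="You are an HR assistant. Answer only from the attached knowledge base.",
        knowledgeBases=[{
            "knowledgeBaseId": kb_id,
            "description": "HR policies visible to this role",
        }],
        enableTrace=True,
    )
    # The completion arrives as an event stream of chunks
    return "".join(
        event["chunk"]["bytes"].decode()
        for event in response["completion"]
        if "chunk" in event
    )
```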
The demonstration showed agents coordinating across multiple specialized sub-agents (mortgage assistant, application processing, etc.), with a supervisor agent planning how to decompose complex queries. The code interpreter feature enables agents to execute Python code for mathematical calculations, addressing LLMs' weakness with numerical computation without requiring explicit tool definitions.
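Turning on the built-in code interpreter does not require writing a tool schema; it is attached as a managed action group. A sketch for a pre-created agent, with placeholder IDs (the demo itself was not shown at the code level):

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Attach the managed code interpreter so the agent can run Python for
# numeric work instead of relying on the LLM's own arithmetic.
bedrock_agent.create_agent_action_group(
    agentId="AGENT_ID",          # placeholder
    agentVersion="DRAFT",
    actionGroupName="code-interpreter",
    parentActionGroupSignature="AMAZON.CodeInterpreter",
    actionGroupState="ENABLED",
)
```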
## Lessons for LLMOps Practitioners
Several production considerations emerged from this case study. The team emphasized that simply applying LLMs to complex optimization problems doesn't work — they needed the agentic architecture with specialized graph construction and causal inference algorithms. The tribal knowledge encoding is handled through explicit formulas and weighting systems rather than trying to embed it purely in prompts, suggesting a hybrid approach where domain expertise is codified outside the LLM reasoning loop.
The architecture's model-agnostic design, built on LangChain with Bedrock as the inference layer, provides future-proofing against rapid model evolution. The presenters noted that 2025 represents the transition from agent prototypes to production deployments, with financial services customers already running multi-agent systems in production.
The choice between LangChain custom agents versus managed Bedrock agents depends on engineering capability and control requirements. Teams with strong engineering skills can maintain custom implementations for maximum flexibility, while others can leverage managed agents for faster time-to-production with automatic scaling and reduced operational burden.
The case study demonstrates that production LLM deployments at scale often require sophisticated data engineering (the graph construction pipeline), domain expertise encoding (tribal knowledge formulas), and careful architecture design (multi-agent coordination) rather than simple prompt engineering or RAG implementations.