Company: Letta
Title: Building Stateful AI Agents with In-Context Learning and Memory Management
Industry: Tech
Year: 2026

Summary (short)
Letta addresses a fundamental limitation of current LLM-based agents: their inability to learn and retain information over time, which leads to degraded performance as context accumulates. The platform enables developers to build stateful agents that learn by updating their context windows rather than their model parameters, making learning interpretable and model-agnostic. The solution includes a developer platform with memory management tools, context window controls, and APIs for creating production agents that improve over time. Real-world deployments include a support agent that has been learning from Discord interactions for a month and recommendation agents for Bilt Rewards, demonstrating that agents with persistent memory can approach the performance of fine-tuned models while remaining flexible and debuggable.
## Overview

Letta is a platform for building and deploying stateful AI agents that can learn and improve over time in production environments. The company works with customers including Bilt Rewards, a rent-payment credit card company building recommendation agents, and 11X, which builds deep research agents. The platform's core thesis is that current LLM-based agents suffer from a critical limitation that prevents them from becoming true AI co-workers: they lack memory and the ability to learn, making them effective only for workflow automation rather than for tasks that require accumulated knowledge and continuous improvement.

The presenter argues that while modern LLMs are highly intelligent and can solve complex problems, they operate as stateless functions that constantly forget, like the protagonists of "50 First Dates" or "Memento." This creates a severe practical limitation: agents not only fail to learn from experience but actually degrade over time as their context windows fill with potentially erroneous information. Chroma's research on context rot and Letta's own RecoveryBench benchmark demonstrate this degradation empirically, showing that Claude Sonnet's performance drops significantly when operating from a polluted state versus a fresh one. Interestingly, GPT-4o shows better recovery than Anthropic models when dealing with corrupted historical context.

## Technical Architecture and Approach

Rather than pursuing traditional learning through parameter updates via fine-tuning or reinforcement learning, which presents significant production challenges, Letta takes a fundamentally different approach: learning into the context window itself. This paradigm, referred to variously as system prompt learning, memory, agentic context management, or context engineering, treats the context window as a mutable knowledge store that the agent actively manages and updates.
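The core idea — storing what the agent learns as plain text in the prompt rather than in model weights — can be sketched in a few lines. This is an illustrative sketch, not Letta's implementation: `build_prompt` and `call_model` are hypothetical stand-ins for real provider SDK calls, and the memory text is invented example data.

```python
# Learned state is plain text, so the same memory travels to any
# chat-completions-style provider unchanged.
learned_memory = (
    "Known user facts:\n"
    "- Deploys on Kubernetes\n"
    "- Prefers concise answers\n"
)

def build_prompt(system: str, memory: str, user_message: str) -> list[dict]:
    # The learned memory is injected into the context window as text,
    # so it is identical regardless of which model consumes it.
    return [
        {"role": "system", "content": f"{system}\n\n{memory}"},
        {"role": "user", "content": user_message},
    ]

def call_model(provider: str, messages: list[dict]) -> str:
    # Stand-in: a real implementation would dispatch to the provider's SDK.
    return f"[{provider} response to {len(messages)} messages]"

messages = build_prompt("You are a support agent.", learned_memory, "How do I deploy?")
for provider in ["openai", "anthropic", "google"]:
    print(call_model(provider, messages))
```

Because the "learned" state lives in the message list rather than the weights, swapping providers changes only the dispatch target, not the memory.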
This approach offers several critical advantages for production deployments:

1. Extreme interpretability: learned information is stored as human-readable text that can be directly inspected, evaluated with LLM-as-judge techniques, or manually reviewed.
2. Model agnosticism: the same learned context can be passed to OpenAI, Anthropic, Google, or open-source models without modification.
3. Online learning: agents learn continuously without the complex evaluation pipelines required to verify parameter updates.
4. Frontier-model compatibility: it works with frontier models accessed via API, where parameter-level fine-tuning is limited or unavailable.

The platform provides a comprehensive agent development environment accessible via both UI and API, built around several key abstractions. Memory blocks form the core structure, representing different segments of in-context memory with configurable descriptions, labels, and character limits. For example, a memory block might allocate 20,000 characters to storing information about users, with separate blocks for persona information about the agent itself. These blocks compile into specific sections of the context window that the agent actively manages.

## Memory Management and Tool Calling

Letta agents learn through tool calls that let them rewrite their own memory. The platform provides memory-related tools such as memory_replace that agents invoke to update their context. When an agent learns new information, it reasons about what should be stored, then calls the appropriate tool with parameters specifying which memory block to update and what content to write. This creates an explicit, trackable learning process in which every memory update is logged and auditable.

The system also includes sophisticated context window management capabilities.
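The memory-block and memory_replace mechanics can be sketched as a small data structure plus a tool function. This is a minimal illustration, not Letta's actual classes: the `MemoryBlock` fields, the character-limit check, and the `memory_replace` signature are assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class MemoryBlock:
    """One labeled segment of in-context memory (illustrative, not Letta's real class)."""
    label: str          # e.g. "human" or "persona"
    description: str    # tells the agent what belongs in this block
    value: str = ""     # the learned, human-readable content
    limit: int = 20000  # character budget for this block

    def compile(self) -> str:
        # Each block compiles into a tagged section of the context window.
        return f"<{self.label}>\n{self.value}\n</{self.label}>"

def memory_replace(block: MemoryBlock, old_text: str, new_text: str) -> MemoryBlock:
    """Tool the agent invokes to rewrite part of its own memory."""
    if old_text not in block.value:
        raise ValueError(f"'{old_text}' not found in block '{block.label}'")
    updated = block.value.replace(old_text, new_text)
    if len(updated) > block.limit:
        raise ValueError(f"block '{block.label}' would exceed its {block.limit}-char limit")
    block.value = updated  # every update is an explicit, loggable call
    return block

human = MemoryBlock(label="human", description="Facts learned about the user",
                    value="Name: Alice. Prefers TypeScript.")
memory_replace(human, "Prefers TypeScript", "Prefers TypeScript; migrating to Rust")
print(human.compile())
```

Because updates only happen through a tool call with explicit arguments, each learning step is auditable text rather than an opaque weight change.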
A context window viewer shows developers exactly what input is being sent to the model, organized into sections:

- base system instructions explaining Letta's abstractions
- tool schemas showing the raw function definitions available to the agent
- external metadata summaries
- core memory blocks containing learned information
- a summary of compacted older context
- the raw message buffer with recent conversation history

This transparency is positioned as essential for debugging, with the claim that 90 percent of agent failures can be traced to context window issues.

Context budget management allows developers to set a maximum token limit for the context window. When this budget is exceeded, Letta's summarization system automatically compacts older information to stay within the limit while preserving essential learned knowledge. This differs from naive truncation approaches that simply drop old messages and can destroy agent state, as reportedly happens with some coding assistants when they compact.

## Production Deployment and Real-World Examples

The platform supports production deployment through multiple interfaces. The hosted service at app.letta.com provides the UI-based agent development environment with account management and API key generation. The SDK and API allow programmatic agent creation, configuration, and messaging for integration into larger systems. A CLI tool enables local connection to cloud-hosted agents, supporting hybrid architectures where agents run in the cloud but access local resources.

The most compelling production example presented is Ezra, a support agent that has been running for approximately one month. Ezra monitors all Discord activity to learn about Letta's features, API patterns, common user issues, documentation gaps, and even individual user profiles and their projects.
The agent's memory blocks have evolved to include detailed knowledge about API integration patterns, communication guidelines that prioritize feedback from team members, extensive notes about Letta features and patterns, and identified gaps in the official documentation. The presenter positions Ezra as feeling like a fine-tuned model despite running entirely on in-context learning, suggesting the approach can achieve results similar to parameter-level training.

Bilt Rewards uses Letta to build recommendation agents that learn from user behavior over time, though specific implementation details were not provided. The use case suggests these agents accumulate knowledge about individual user preferences and behaviors to improve recommendations, which would be difficult to achieve with stateless agents. The 11X deployment focuses on deep research agents, again without detailed specifics, but implies agents that build up knowledge bases over extended research sessions rather than starting fresh each time.

## Development Workflow and Tooling

Letta provides a comprehensive development environment designed to make building stateful agents accessible. The UI allows configuration of the agent's name, model selection with support for multiple providers including the Claude and GPT families, system instructions, parallel tool calling settings, temperature controls, and other standard parameters. Custom tools can be added, and the platform supports Model Context Protocol integration for standardized tool interfaces.

A notable feature is agent portability through downloadable agent files. Developers can export an agent's complete state, including learned memories and configuration, then import it elsewhere. This supports use cases like moving agents between development and production servers, sharing agent states for debugging, or migrating between self-hosted and cloud deployments.

The memory block system is fully customizable.
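The agent-file portability idea can be illustrated as a plain serialization round trip: configuration plus learned memory goes out as a file and comes back intact. The field layout, file name, and model handle below are assumptions for the sketch, not Letta's actual agent-file schema.

```python
import json

# Illustrative agent export/import: the complete state -- configuration plus
# learned memory -- is serialized to a file and restored elsewhere.
agent_state = {
    "name": "support-agent",
    "model": "anthropic/claude-sonnet",  # hypothetical model handle
    "system": "You are a support agent that learns from every conversation.",
    "memory_blocks": [
        {"label": "persona", "value": "Helpful, cites docs when possible.", "limit": 5000},
        {"label": "human", "value": "Works mostly with the Python SDK.", "limit": 20000},
    ],
    "tools": ["memory_replace", "web_search"],
}

def export_agent(state: dict, path: str) -> None:
    # Write the full agent state as human-readable JSON.
    with open(path, "w") as f:
        json.dump(state, f, indent=2)

def import_agent(path: str) -> dict:
    # Restore the agent, learned memories included, on another server.
    with open(path) as f:
        return json.load(f)

export_agent(agent_state, "support-agent.af.json")
restored = import_agent("support-agent.af.json")
assert restored == agent_state  # learned memory survives the round trip
print(restored["memory_blocks"][1]["value"])
```

Because the learned state is text and structured configuration rather than weights, moving an agent between environments is a file copy rather than a model transfer.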
Developers can add new blocks, define their purpose through descriptions, allocate character budgets, and structure memory in domain-specific ways. The pre-configured blocks for human information and persona serve as templates but can be extended for specific applications.

## Tool Execution Modes and Local Environment Access

An experimental feature called Letta Code demonstrates advanced tool execution patterns. Agents running in Letta's cloud can execute tools in different modes. Server-side tools like web page fetching or conversation search run entirely in the cloud infrastructure. Human-in-the-loop tools require client-side execution and permission, enabling cloud agents to access local file systems while maintaining security through approval workflows.

The demo showed an agent taking a Linear ticket for a feature request and working on it by searching local files with grep, reading code, and proposing edits, all while the agent's reasoning and state persist in the cloud. This creates an experience positioned as similar to Anthropic's Claude Code, but with statefulness that allows the agent to form persistent memories about the codebase and project over time rather than starting fresh each session.

The human-in-the-loop approval pattern for local file operations provides a security model in which sensitive operations require explicit user permission: the agent requests to execute a tool such as a file edit, the client shows the proposed operation, and the user approves or rejects it. This allows powerful capabilities while maintaining control.

## Model Agnosticism and Performance Preservation

A significant design principle emphasized throughout is ensuring that Letta adds capabilities without degrading base model performance. The concern is that wrapping a highly capable model like Claude Sonnet in additional harness logic could regress performance compared to using the model directly.
The platform's approach to context management, tool schemas, and instruction design aims to add only memory and statefulness on top of existing model capabilities.

The model-agnostic architecture means switching between providers requires only changing a configuration setting. The same memory blocks, tools, and learned information work across OpenAI, Anthropic, Google, and open-source models. This gives production deployments the flexibility to change providers based on cost, capability, or availability without rebuilding agent infrastructure.

## Critical Assessment and Production Considerations

While Letta presents a compelling vision of stateful agents, several important considerations emerge for production use. The approach fundamentally depends on the agent's ability to correctly decide what to learn and how to structure that information in memory. If the agent makes poor decisions about memory updates, incorrect information can accumulate over time. The interpretability of text-based memory helps here by making audits easy, but scaling to many agents would require robust monitoring.

The context window management and summarization capabilities are critical for long-running agents but introduce complexity around what information gets compressed and potentially lost. The claim that Letta's summarization is superior to simple compaction is reasonable but would benefit from empirical validation showing that agents maintain performance over very long timescales.

The production examples, while suggestive, are limited in detail. Ezra's month of learning from Discord is impressive but represents a relatively controlled environment with clear information sources. The Bilt Rewards and 11X deployments lack specifics about scale, performance metrics, or challenges encountered. Production users would benefit from more concrete data about agent reliability, learning accuracy, and edge cases.
The comparison to fine-tuning is interesting but potentially overstated. While in-context learning provides flexibility and interpretability, it consumes context window space that could otherwise be used for other purposes, and the quality of learned information depends entirely on the agent's reasoning capabilities. Fine-tuning can encode knowledge more efficiently and reliably, though with the significant operational overhead that Letta correctly identifies.

The platform's value proposition centers on making stateful agents accessible and manageable in production. The developer experience, with context window visibility, memory block configuration, and debugging tools, appears well designed for this purpose. The API and SDK approach allows integration into larger systems, while the hosted service reduces operational burden. For organizations building agent applications that genuinely benefit from learning over time, Letta offers a structured approach to managing the state and memory challenges that purely stateless agents cannot address.

The experimental Letta Code feature demonstrates ambition to compete with established tools like Claude Code, but its experimental status suggests it may not yet be production-ready. The hybrid cloud-local architecture is architecturally interesting for security and performance but adds complexity compared to purely cloud or purely local solutions.

Overall, Letta represents a thoughtful approach to a real problem in production LLM applications: the statelessness of agents limits their utility for tasks requiring accumulated knowledge. The in-context learning paradigm offers practical advantages over parameter-level approaches for many use cases, and the platform provides infrastructure to make this approach accessible. However, the success of this approach in production depends heavily on the base model's reasoning capabilities, the quality of memory management prompts and tools, and careful monitoring of what agents actually learn over time.
