ZenML

Model Context Protocol (MCP): A Universal Standard for AI Application Extensions

Anthropic 2024
View original source

Anthropic developed the Model Context Protocol (MCP) to solve the challenge of extending AI applications with plugins and external functionality in a standardized way. Inspired by the Language Server Protocol (LSP), MCP provides a universal connector that enables AI applications to interact with various tools, resources, and prompts through a client-server architecture. The protocol has gained significant community adoption and contributions from companies like Shopify, Microsoft, and JetBrains, demonstrating its potential as an open standard for AI application integration.

Industry

Tech

Technologies

Overview

The Model Context Protocol (MCP) represents Anthropic’s approach to solving a fundamental challenge in production LLM systems: how to enable AI applications to extend themselves and integrate with external tools, data sources, and capabilities in a standardized way. This podcast discussion features David and Justin from Anthropic, the creators of MCP, explaining the protocol’s origins, design decisions, and implications for LLMOps.

MCP is described as being analogous to a “USB-C port for AI applications”—a universal connector that allows any AI application to interface with any compatible server providing tools, data, or prompts. The protocol addresses what the creators call the “M×N problem,” where M different AI applications would otherwise need custom integrations with N different external services, resulting in an explosion of one-off implementations.

Origin Story and Development Timeline

The development of MCP began around July 2024 when David, who had recently joined Anthropic to work on internal developer tooling, became frustrated with the fragmented experience of using AI tools. He describes constantly copying and pasting between Claude Desktop (which had features like Artifacts but limited extensibility) and IDE environments (which could interact with the file system but lacked other Claude Desktop features). This friction led to the question: what would it take for applications to allow users to build their own integrations?

David’s background in developer tooling, combined with concurrent work on LSP-related projects, crystallized into the idea of building a protocol. He pitched the concept to Justin, and the two spent approximately a month and a half building the initial protocol, first integrations, and proof-of-concept implementations. The first public MCP implementation appeared in the Zed editor (an open-source IDE) about a month and a half before the official November 2024 announcement, due to the need to work in the open on that open-source project.

An internal Anthropic hackathon approximately one month before release generated significant excitement, with employees building creative applications including an MCP server controlling a 3D printer, demonstrating the protocol’s potential for connecting Claude to the physical world.

Protocol Design Philosophy

Inspiration from LSP

MCP draws heavy inspiration from the Language Server Protocol (LSP), which solved a similar M×N problem in the programming tools ecosystem. Before LSP, every IDE needed custom implementations for every programming language’s tooling features. LSP created a common language allowing language server developers and IDE developers to focus on their respective domains while benefiting from each other’s work.

The creators explicitly studied criticisms of LSP to avoid repeating mistakes. While they adopted JSON-RPC as the communication mechanism (viewing this as a “boring” choice that allowed them to focus innovation elsewhere), they deliberately deviated from LSP’s specific JSON-RPC implementation where they felt improvements were warranted.

Being “Presentation Focused”

A key design principle borrowed from LSP is being “presentation focused”—thinking about how features manifest in applications rather than just their underlying semantics. This led to distinct primitives that might seem similar at an implementation level but are intended to surface differently in user interfaces.

Conservative Protocol Design

The creators emphasize extreme conservatism in protocol design, noting that adding features is easy but removing them breaks backwards compatibility. They prefer to accept only things they are extremely certain about, allowing the community to develop ad-hoc extensions until consensus emerges on what should be added to the core specification.

Core Primitives

MCP defines three primary primitives, each with distinct intended behaviors:

Tools

Tools are designed to be model-controlled—the LLM decides when to invoke them based on the conversation context. This maps to what many developers know as “function calling.” Examples include executing SQL queries, interacting with APIs, or performing file system operations.

Resources

Resources represent data or context that might be added to the model’s context window. Critically, resources are more flexible than tools and can be either model-controlled (where the LLM automatically searches and retrieves relevant resources) or application/user-controlled (where users explicitly select resources through UI affordances like dropdown menus or attachment interfaces).

Resources are uniquely identified by URIs, enabling use cases like building RAG indices over exposed documents, automatic URI interpretation, and user-driven context enrichment. The creators express disappointment that current ecosystem adoption focuses heavily on tools while underutilizing resources.

Prompts

Prompts are user-initiated text or message templates, analogous to slash commands or autocomplete macros in editors. The first MCP implementation in Zed was actually a prompts implementation, allowing users to pull in context like Sentry backtraces on demand rather than waiting for model-initiated actions.

The creators envision prompts being used to provide examples of how to use tools effectively, but note that adoption of this primitive has been limited compared to tools.

Production Deployment Considerations

Client-Server Architecture

Unlike some tool-calling implementations where everything happens within a single process, MCP uses a client-server architecture where:

Statefulness vs. Statelessness

A contentious design discussion centered on statefulness. The creators believe AI applications and interactions will become “inherently more stateful” over time, particularly as new modalities (video, audio) emerge. However, the original SSE-based transport required long-lived persistent connections, creating operational complexity for production deployments.

The new “streamable HTTP” transport addresses this by allowing gradual enhancement: servers can be plain HTTP (stateless), add streaming responses, and add server-initiated requests as needed. Session resumption enables disconnecting and reconnecting while maintaining state, allowing horizontal scaling and handling spotty network connections.

Authorization

The current specification includes OAuth 2.1-based authorization for user-to-server authentication. This operates at the transport layer and is particularly important as the ecosystem shifts toward remote servers. For local servers using standard I/O, authorization works differently (potentially opening a browser for OAuth flows).

The creators resisted adding API key handling directly to the protocol, noting that API keys can be handled through OAuth flows (e.g., an authorization page with a text box for API key input), avoiding over-specification.

Tool Confusion and Scalability

When asked about the maximum number of MCP servers or tools that can be enabled simultaneously, the creators acknowledge there’s no single answer. It depends on the model being used, how well tools are named and described, and the degree of overlap between tool descriptions. They cite Claude’s ability to handle hundreds of tools reliably, but note that confusion increases when multiple servers have overlapping functionality (e.g., both GitHub and GitLab servers enabled simultaneously).

Strategies for managing scale include:

Reference Implementations and Notable Servers

Anthropic maintains reference server implementations demonstrating different use cases:

File System Server

One of the spiritually important servers, directly addressing the original frustration that inspired MCP. It includes an edit_file tool that implements file editing capabilities similar to what some commercial IDEs offer as proprietary features.

Memory Server

A simple (~300 lines of code) implementation enabling cross-conversation memory for LLMs. While production systems may need more robust solutions, it demonstrates how quickly valuable functionality can be prototyped.

Sequential Thinking Server

Provides the model with structured thinking capabilities including branching and extended reasoning space. The creators clarify this is distinct from Anthropic’s separately announced “think tool”—these arose from different teams exploring similar spaces, demonstrating how MCP enables experimentation with different LLM thinking strategies.

Composability and Recursive Patterns

The protocol explicitly enables composability through several mechanisms:

Sampling

Servers can request LLM completions from the client, enabling model-independent agentic behavior. This allows servers to perform summarization or other LLM-dependent operations without requiring API keys or being tied to specific models—the client application controls which model handles the request.

Recursive Server-Client Architecture

An MCP server can simultaneously be an MCP client, consuming other MCP servers to create richer experiences. This enables building directed acyclic graphs (DAGs) of MCP servers that chain together, with potential for auto-selection and auto-installation of dependencies.

The creators note that SDK improvements are needed to make this pattern more accessible, but the protocol architecture already supports it.

Community and Governance

MCP has attracted significant community involvement, with contributions from companies including Shopify, Block, Microsoft (C# SDK), JetBrains (Kotlin SDK), Pydantic (Python SDK improvements), and others. Non-Anthropic contributors have commit and admin access to various repositories.

The creators emphasize a “meritocracy-driven” approach where contributors demonstrating work through practical implementations and SDK contributions are prioritized over those simply expressing opinions in discussions. They acknowledge the challenge of managing the “sheer volume of conversation and notifications” that MCP has generated.

Regarding formal standardization or foundation governance, they express wariness about processes that could slow development in a fast-moving field, preferring to maintain the current collaborative but nimble approach while remaining open to evolution as needed.

MCP vs. OpenAPI

The creators view MCP and OpenAPI specifications as complementary rather than competing:

Best Practices for Server Development

The creators offer guidance for building MCP servers:

Future Directions

The creators highlight several areas for future development:

More Like This

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.

healthcare fraud_detection customer_support +90

Multi-Agent Financial Research and Question Answering System

Yahoo! Finance 2025

Yahoo! Finance built a production-scale financial question answering system using multi-agent architecture to address the information asymmetry between retail and institutional investors. The system leverages Amazon Bedrock Agent Core and employs a supervisor-subagent pattern where specialized agents handle structured data (stock prices, financials), unstructured data (SEC filings, news), and various APIs. The solution processes heterogeneous financial data from multiple sources, handles temporal complexities of fiscal years, and maintains context across sessions. Through a hybrid evaluation approach combining human and AI judges, the system achieves strong accuracy and coverage metrics while processing queries in 5-50 seconds at costs of 2-5 cents per query, demonstrating production viability at scale with support for 100+ concurrent users.

question_answering data_analysis chatbot +49

Scaling AI-Assisted Developer Tools and Agentic Workflows at Scale

Slack 2025

Slack's Developer Experience team embarked on a multi-year journey to integrate generative AI into their internal development workflows, moving from experimental prototypes to production-grade AI assistants and agentic systems. Starting with Amazon SageMaker for initial experimentation, they transitioned to Amazon Bedrock for simplified infrastructure management, achieving a 98% cost reduction. The team rolled out AI coding assistants using Anthropic's Claude Code and Cursor integrated with Bedrock, resulting in 99% developer adoption and a 25% increase in pull request throughput. They then evolved their internal knowledge bot (Buddybot) into a sophisticated multi-agent system handling over 5,000 escalation requests monthly, using AWS Strands as an orchestration framework with Claude Code sub-agents, Temporal for workflow durability, and MCP servers for standardized tool access. The implementation demonstrates a pragmatic approach to LLMOps, prioritizing incremental deployment, security compliance (FedRAMP), observability through OpenTelemetry, and maintaining model agnosticism while scaling to millions of tokens per minute.

code_generation question_answering summarization +46