The Model Context Protocol (MCP) is an open standard, originally developed by Anthropic, that defines how AI agents discover and invoke external tools. It provides a universal interface -- comparable to USB-C for hardware -- that lets any MCP-compatible AI application connect to any MCP server through a consistent JSON-RPC message format, reducing the need for framework-specific tool integrations.
MCP uses a client-server architecture. An MCP server advertises its tools by responding to a tools/list request with a definition for each tool: its name, a description, and a JSON Schema for its typed input parameters. The MCP client, which lives inside the AI agent or host application, connects to the configured servers, fetches their tool lists, presents the schemas to the language model, and routes each tool invocation (a tools/call request) to the server that owns the tool. Communication happens over stdio for local tools or over HTTP for remote services, where Server-Sent Events (SSE) can carry streamed responses.
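The discovery and routing flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an implementation of any SDK: `FakeServer`, `Client`, and the two example tools are hypothetical names, and the "transport" is a direct function call instead of stdio or HTTP. The message shapes (`tools/list`, `tools/call`, `inputSchema`, and the `content` result) follow the MCP specification.

```python
import json


class FakeServer:
    """Hypothetical in-memory stand-in for an MCP server (no real transport)."""

    def __init__(self, name, tools):
        self.name = name
        self._tools = tools  # tool name -> (tool definition, handler function)

    def handle(self, request):
        """Answer one JSON-RPC 2.0 request dict with a response dict."""
        method = request["method"]
        if method == "tools/list":
            result = {"tools": [definition for definition, _ in self._tools.values()]}
        elif method == "tools/call":
            params = request["params"]
            _, handler = self._tools[params["name"]]
            text = handler(**params.get("arguments", {}))
            result = {"content": [{"type": "text", "text": text}]}
        else:
            error = {"code": -32601, "message": "Method not found"}
            return {"jsonrpc": "2.0", "id": request["id"], "error": error}
        return {"jsonrpc": "2.0", "id": request["id"], "result": result}


class Client:
    """Hypothetical client: discovers tools, then routes calls by tool name."""

    def __init__(self, servers):
        self._servers = servers
        self._routes = {}  # tool name -> server that advertised it
        self._next_id = 0

    def _request(self, server, method, params=None):
        self._next_id += 1
        request = {"jsonrpc": "2.0", "id": self._next_id, "method": method}
        if params is not None:
            request["params"] = params
        # Round-trip through JSON text to mimic what would cross the wire.
        response = server.handle(json.loads(json.dumps(request)))
        if "error" in response:
            raise RuntimeError(response["error"]["message"])
        return response["result"]

    def discover(self):
        """Fetch every server's tool list and remember which server owns which tool."""
        schemas = []
        for server in self._servers:
            for tool in self._request(server, "tools/list")["tools"]:
                self._routes[tool["name"]] = server
                schemas.append(tool)
        return schemas  # these definitions are what the language model would see

    def call(self, name, arguments):
        return self._request(self._routes[name], "tools/call",
                             {"name": name, "arguments": arguments})


def tool(name, description, param):
    """Build a tool definition with a single required string parameter."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": {param: {"type": "string"}},
            "required": [param],
        },
    }


files = FakeServer("files", {
    "read_file": (tool("read_file", "Read a file.", "path"),
                  lambda path: f"contents of {path}"),
})
search = FakeServer("search", {
    "web_search": (tool("web_search", "Search the web.", "query"),
                   lambda query: f"results for {query}"),
})

client = Client([files, search])
schemas = client.discover()
result = client.call("web_search", {"query": "MCP"})
```

The routing table is keyed by tool name only, which is enough for this sketch; a real host would also need a policy for two servers that advertise the same tool name.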
MCP has defined three transport mechanisms across its specification revisions: stdio (lowest latency, for local subprocesses), HTTP+SSE (the original remote transport, deprecated as of the 2025-03-26 revision), and Streamable HTTP (its replacement for remote services and team-shared servers, which uses a single HTTP endpoint and can optionally stream responses over SSE). All of them carry JSON-RPC 2.0 messages. Servers can expose tools (callable functions), resources (readable data), and prompts (reusable templates). As of early 2026, the ecosystem includes over 10,000 public servers and the SDK sees over 90 million monthly downloads.
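To make the message format concrete, here is a minimal sketch of how JSON-RPC 2.0 messages could be framed for the stdio transport, where each message occupies one newline-delimited line. The helper names (`make_request`, `make_notification`, `write_message`, `read_messages`) are hypothetical, and an in-memory `io.StringIO` stands in for the subprocess pipe.

```python
import io
import json


def make_request(request_id, method, params=None):
    """A JSON-RPC 2.0 request: it carries an id, so a response is expected."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def make_notification(method, params=None):
    """A JSON-RPC 2.0 notification: no id, so no response is sent."""
    message = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        message["params"] = params
    return message


def write_message(stream, message):
    # json.dumps escapes newlines inside strings, so each message stays on one line.
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")


def read_messages(stream):
    """Yield one parsed message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


pipe = io.StringIO()  # stand-in for a server subprocess's stdin
write_message(pipe, make_request(1, "tools/list"))
write_message(pipe, make_notification("notifications/initialized"))
write_message(pipe, make_request(2, "tools/call",
                                 {"name": "echo", "arguments": {"text": "line1\nline2"}}))

pipe.seek(0)
received = list(read_messages(pipe))
```

The third message contains a newline inside an argument value, yet it still arrives as a single message because the newline is escaped during serialization.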
Most MCP servers today expose text-based tools: file reads, database queries, web searches. But by commonly cited estimates, 80-90% of enterprise data is unstructured -- video, images, audio, documents. Multimodal MCP servers bridge this gap by exposing tools such as semantic video search, image classification, audio transcription, and brand detection. These tools let AI agents work directly with visual and audio content, enabling use cases like media asset management, brand safety monitoring, compliance review, and visual product search.
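As an illustration of what such a tool might look like to a client, the sketch below defines a hypothetical `search_video` tool in the same name/description/inputSchema shape used for text tools, plus a shallow argument check. The tool name, its parameters, and `check_arguments` are assumptions made for this example; the check covers only required fields and basic types, not full JSON Schema validation.

```python
# Hypothetical tool definition for a multimodal MCP server.
VIDEO_SEARCH_TOOL = {
    "name": "search_video",
    "description": "Find video segments matching a natural-language query.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query"],
    },
}

# Mapping from JSON Schema type names to Python types (partial, for this sketch).
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_arguments(tool, arguments):
    """Return a list of problems; an empty list means the shallow check passed."""
    schema = tool["inputSchema"]
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
            continue
        type_ok = isinstance(value, _JSON_TYPES[spec["type"]])
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if isinstance(value, bool) and spec["type"] != "boolean":
            type_ok = False
        if not type_ok:
            problems.append(f"{name} should be {spec['type']}")
    return problems


good = check_arguments(VIDEO_SEARCH_TOOL, {"query": "red car at night", "limit": 5})
bad = check_arguments(VIDEO_SEARCH_TOOL, {"limit": "5"})
```

From the agent's side nothing changes: the multimodal tool is discovered and called like any other, and the media processing happens inside the server.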