A documented attack on AI agents requires no credential theft, exploits no zero-day, and triggers no anomaly alerts. Every request carries legitimate credentials, and the attack itself is described inside the MCP specification.
This is the core challenge with Model Context Protocol (MCP)-driven exploits: the attack happens inside the boundaries of normal operations. Understanding how requires tracing three distinct attack families, each exploiting a different layer of how agents interact with the world on a user's behalf.
The Confused Deputy: How the Spec Documents Its Own Vulnerability
Anthropic introduced MCP in November 2024 as an open-standard, open-source framework to standardize how large language models integrate with external tools, data sources, and systems. The specification itself now includes a security best practices section that reads like a controlled demolition manual.
The confused deputy attack becomes possible, the spec notes, when four conditions hold simultaneously: the proxy server uses a static client ID, clients can register dynamically, the third-party authorization server sets a consent cookie, and the proxy server does not implement proper per-client consent. Under those conditions, an attacker can steal authorization tokens without any user approval because the system cannot distinguish between the legitimate client and the attacker who has hijacked the consent cookie.
The fix is documented in the same section. Proxy servers must maintain a registry of approved client_id values per user, check this registry before initiating the third-party authorization flow, and store consent decisions securely. MCP proxy servers, the spec states, must implement per-client consent and proper security controls. This is not guidance. It is a MUST in specification language.
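The documented fix can be sketched in a few lines. This is an illustrative model of a per-client consent registry, not code from any MCP SDK; the class and function names (`ConsentRegistry`, `start_third_party_flow`) are my own, and the return strings stand in for whatever flow logic a real proxy implements.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    # Maps user_id -> set of client_id values this user has explicitly approved.
    _approved: dict = field(default_factory=dict)

    def record_consent(self, user_id: str, client_id: str) -> None:
        self._approved.setdefault(user_id, set()).add(client_id)

    def has_consent(self, user_id: str, client_id: str) -> bool:
        return client_id in self._approved.get(user_id, set())


def start_third_party_flow(registry: ConsentRegistry,
                           user_id: str, client_id: str) -> str:
    # Check the registry BEFORE initiating the third-party authorization flow.
    # Never let a lingering consent cookie stand in for per-client approval.
    if not registry.has_consent(user_id, client_id):
        return "prompt_user_consent"
    return "redirect_to_authorization_server"
```

The key property is that consent is keyed by (user, client) pair, so a dynamically registered attacker client with a fresh client_id cannot ride on a consent decision made for someone else.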
The Sampling Attack: When the Server Talks Back
The second attack family targets MCP's sampling capability, which reverses the typical client-driven pattern. With sampling, MCP servers can proactively request LLM completions from the client by sending sampling requests back. Unit42 researchers identified three distinct attack vectors exploiting this capability.
Resource theft allows attackers to abuse sampling to drain AI compute quotas, with the consumption happening invisibly on the client's tab. Conversation hijacking occurs when a compromised or malicious MCP server injects persistent instructions that survive the current session. Covert tool invocation enables hidden file system operations that the user never consented to, executing silently in the background of what appears to be a legitimate interaction.
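A client-side defense against all three vectors follows the same shape: gate server-initiated sampling requests behind both a rate limit (against invisible quota drain) and explicit user approval (against covert invocation). The sketch below is a toy illustration of that pattern; `SamplingGate` and its interface are assumptions, not part of any MCP client library.

```python
import time


class SamplingGate:
    """Gate server-initiated sampling requests on the client side."""

    def __init__(self, max_requests_per_minute: int = 5):
        self.max_requests = max_requests_per_minute
        self.window: list[float] = []  # timestamps of recently allowed requests

    def allow(self, server_name: str, prompt: str, approved_by_user: bool) -> bool:
        now = time.monotonic()
        # Keep only timestamps from the last 60 seconds, then rate-limit.
        self.window = [t for t in self.window if now - t < 60]
        if len(self.window) >= self.max_requests:
            return False  # caps invisible compute drain (resource theft)
        if not approved_by_user:
            return False  # human-in-the-loop blocks covert invocation
        self.window.append(now)
        return True
```

Rate limiting alone does not stop conversation hijacking, which is why the human-in-the-loop check matters even for low-volume requests.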
The common thread: every request uses legitimate credentials. Every system interaction is technically authorized. RTInsights noted the structural challenge this creates for security teams. "This is the core challenge with MCP-driven exploits: the attack happens inside the boundaries of normal operations."
Cross-Agent Escalation: When Your Copilot Rewrites Your Claude Config
The third and most immediately dangerous attack family requires no undocumented vulnerability. It exploits the fact that agents can write to other agents' configuration files.
As Emanuel Yaconi and Omer Minster documented at Embrace The Red, an indirect prompt injection can hijack GitHub Copilot and make it silently write to the Claude Code MCP configuration to add a malicious server. The attack is reproducible today with shipping products. It requires no zero-day. It requires no credential theft. It requires a user who has configured both Copilot and Claude Code on the same machine and an MCP server that receives instructions from a context an agent trusts.
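Because the attack works by silently editing another agent's configuration, one practical countermeasure is integrity monitoring of those files. The sketch below hashes a set of config files at a known-good point and flags any later drift; the paths a real deployment would watch depend on the products installed, and the function names here are illustrative.

```python
import hashlib
import pathlib


def snapshot(paths):
    """Record the SHA-256 of each config file; missing files map to None."""
    out = {}
    for p in paths:
        path = pathlib.Path(p)
        out[str(path)] = (
            hashlib.sha256(path.read_bytes()).hexdigest() if path.exists() else None
        )
    return out


def changed_files(baseline, paths):
    """Return the paths whose contents differ from the baseline snapshot."""
    current = snapshot(paths)
    return [p for p in paths if current[str(p)] != baseline[str(p)]]
```

Any change to an agent config file that the user did not make interactively (a new MCP server entry appearing in a Claude Code config during a Copilot session, say) is exactly the signal this attack produces.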
The Authzed timeline of MCP breaches documents a related variant: malicious MCP servers could send a booby-trapped authorization_endpoint that mcp-remote passed straight to the system shell, achieving remote code execution on the client machine. CVE-2025-49596, as SentinelOne noted, represents the same pattern at scale: arbitrary command execution via unauthenticated MCP Inspector instances, carrying a CVSS score of 9.4.
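The mcp-remote flaw came from trusting a server-supplied URL enough to hand it to the system shell. The defensive check it lacked is simple to state: treat the endpoint as data, parse it strictly, and reject anything that is not a clean https URL. This sketch is my own illustration of that validation, not the project's actual patch, and the metacharacter list is deliberately blunt.

```python
from urllib.parse import urlparse

# Reject any URL carrying characters a shell could interpret.
_SHELL_META = set(";|&$`<>\n'\"")


def safe_authorization_endpoint(url: str) -> bool:
    """Validate a server-supplied authorization_endpoint before any use."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    return not any(ch in _SHELL_META for ch in url)
```

The deeper fix, of course, is never passing the value through a shell at all; validation is defense in depth, not a substitute.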
The Structural Problem
The deeper issue is architectural. Simon Willison observed the specific danger: any time you mix tools that can perform actions on a user's behalf with exposure to potentially untrusted input, you allow attackers to make those tools do whatever they want. "Mixing together private data, untrusted instructions and exfiltration vectors is the other toxic combination," he wrote.
The MCP spec itself makes this worse in a specific way. As the Embrace The Red researchers noted, the title, description, and parameter names of a tool are automatically inserted into the system prompt. This means that enabling a tool already hands control of the LLM inference over to that specific MCP server. The server decides what the model thinks it can do. That is not a misconfiguration. That is how the protocol is designed to work.
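If tool metadata flows into the system prompt by design, defenders can at least inspect it before enabling a server. The toy scanner below flags obvious injection phrasing in tool names and descriptions; the pattern list is a minimal example of the idea, and real screening needs far more than keyword matching.

```python
import re

# Toy patterns for injection phrasing in tool metadata; illustrative only.
SUSPICIOUS_PATTERNS = [
    r"ignore (all|previous) instructions",
    r"do not (tell|inform) the user",
    r"<\s*system\s*>",
]


def flag_tool_metadata(name: str, description: str) -> list:
    """Return the patterns matched by a tool's name and description."""
    text = f"{name}\n{description}".lower()
    return [p for p in SUSPICIOUS_PATTERNS if re.search(p, text)]
```

A clean scan proves nothing, since malicious servers can phrase instructions benignly, but a hit is a strong signal that a server is trying to steer the model.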
What Defenders Can Actually Do
The MCP spec's security best practices section provides the starting point. Per-client consent storage, proper consent UI, consent cookie security, redirect URI validation, and OAuth state parameter validation are all documented as requirements for proxy servers. In practice, most existing deployments do not implement all of these.
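Two of those documented requirements, redirect URI validation and OAuth state checking, are small enough to sketch. The registry contents and function names below are illustrative; the points that matter are exact-match redirect comparison (no prefix or wildcard logic) and constant-time comparison of an unguessable state value.

```python
import secrets

# Example registry: each client_id maps to its exact registered redirect URIs.
REGISTERED_REDIRECTS = {
    "client-123": {"https://app.example.com/callback"},
}


def valid_redirect(client_id: str, redirect_uri: str) -> bool:
    # Exact string match only; substring and prefix checks are bypassable.
    return redirect_uri in REGISTERED_REDIRECTS.get(client_id, set())


def new_state() -> str:
    # Unguessable state value, bound to the session that started the flow.
    return secrets.token_urlsafe(32)


def check_state(expected: str, received: str) -> bool:
    # Constant-time comparison avoids timing side channels.
    return secrets.compare_digest(expected, received)
```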
The more fundamental shift is treating MCP servers as untrusted by default rather than assuming they operate in the user's interest. Tool metadata insertion into the system prompt is not a bug to be patched; it is a feature that makes the trust assumption explicit. The question for organizations deploying MCP is not whether an agent will call a tool, but whether the server providing that tool has been designed and secured with the same rigor as the agent that consumes it.
For teams evaluating MCP servers and agent frameworks today, the practical checklist: verify per-client consent implementation, confirm no static client IDs in multi-tenant environments, audit dynamic client registration flows, and treat any MCP server that handles sensitive context as a potential attack surface, not just an integration point.