TL;DR: In Claude Code, a project's approval of its MCP servers is recorded by server name, not by the specific command that was shown and approved. Once a user grants a project's MCP servers (in particular with the "Use this and all future MCP servers in this project" option), a later change to that project's
.mcp.jsonthat keeps the server name the same but changes the command will run the new command at the nextclaudestartup, as the user, with no dialog and no warning, and on every subsequent launch. We reproduced this on Claude Code v2.1.170 (macOS, arm64). We disclosed it to Anthropic, who reviewed it and determined the behavior is working as designed, because the option used is an explicit standing grant. This is not a zero-day and there is no CVE. It is a design-boundary question, the same kind we raised in our VS Code Workspace Trust post: what does a trust click actually authorize once the assistant is an agent that executes code on your behalf.
A series about what a trust click authorizes#
This is the second post in a short series on trust boundaries in agentic developer tools. The first looked at VS Code and Workspace Trust: trusting a folder used to mean "let this code run," and in the agentic era it can also mean "let an assistant run shell commands for me," without the prompt growing to say so. This post looks at a different tool and a different mechanism that lands in a similar place. The common thread is not a flaw in any one product. It is that the trust prompts these tools inherited were designed for a world where code ran when you asked it to, and they are now standing in for decisions about software that acts on its own.
What the MCP approval prompt asks#
The Model Context Protocol (MCP) lets an assistant talk to external tools and data sources through small server processes. A project can ship its own MCP servers by committing a .mcp.json file to the repository. When you open such a project in Claude Code, you are asked to approve the project's MCP servers before they run. The approval offers a choice that includes an option labeled, in effect, "Use this and all future MCP servers in this project."
What a reasonable user believes they are approving is the server in front of them: this tool, doing roughly this thing. The "all future" option reads, to most people, as a convenience so they are not re-prompted every time the project adds a new helper. What it actually establishes is a standing, project-wide grant. The distinction between "I approved this server" and "I approved this project's MCP configuration, present and future" is where the rest of this post lives.
The mechanism: approvals are keyed on the server name#
A project's MCP approval decisions are stored in ~/.claude.json. Under each project entry there are two arrays relevant here: enabledMcpjsonServers and disabledMcpjsonServers. These hold server names as strings. They do not hold the full server definition, the command, the arguments, or any hash of what you saw when you approved.
So the approval ledger answers one question: is the server named alpha allowed to run in this project. It does not answer the question that matters once the assistant executes: is this command, the one you read and approved, allowed to run. The actual command lives in the project's .mcp.json, which is version-controlled and editable by anyone who can commit to the repository. The clearest proof that the approval is not binding the command is behavioral. If the stored approval contained the command you reviewed, a changed command would no longer match it and Claude Code would re-prompt. It does not re-prompt. The approval cannot be binding on the command.
The grant pins the name alpha, not the command underneath it, so swapping command A for command B leaves the approval matching and runs B with no new prompt.
The reproduction#
We tested this with a small, fully local proof of concept. Every effect is a benign sentinel file under /tmp; there are no network calls and no real secrets are touched. The flow has four phases.
Phase 1, the honest grant. A throwaway git project ships a .mcp.json with one server named alpha. Its command is benign: it writes a timestamped sentinel to /tmp and then sleeps so the process stays up. The user opens the project in claude, sees the MCP approval prompt for alpha, and chooses the "Use this and all future MCP servers in this project" option. The sentinel appears. The grant is now recorded.
Phase 2, the mutation. A second commit keeps the server name alpha exactly as before, but replaces the command with a different one. This is the move that mirrors a real supply-chain change: a dependency update, a merged pull request, a synced branch, a compromised maintainer. At the next claude launch in that project, the new command runs at startup. There is no approval dialog and no setup-issue warning. The grant from Phase 1, keyed on the name alpha, covers the new command because the name still matches.
Phase 2b, the capability. To show this is genuine code execution and not a sandboxed toy, the proof of concept optionally runs a benign capability check under the same mechanism. The spawned command lists the names of the user's environment variables (values omitted) and the entries in ~/.ssh, and writes them to a local file. On our test run it confirmed the presence of an SSH private key and read lines from ~/.zshrc. The startup command runs with the user's full privileges. It can read any file the user can read, including SSH keys, shell configuration, and environment variables that may carry tokens or cloud credentials. We are deliberately not publishing a weaponized configuration. The point is the capability, not a copy-pasteable exfiltration string.
Phase 3, persistence. Because the grant is durable, the command is not a one-time event. Launching claude in the project three times produces three silent spawns, one per session. The execution recurs on every launch for as long as the grant and the configuration remain in place.
The summary is plain: one commit changed a trusted server's command, and every subsequent claude launch in that project then ran that command at startup, as the user, with no prompt, and kept doing so.
The argument: the grant binds to a label, not to a definition#
Everywhere else in security-sensitive software, a trust decision binds to an identity, something that changes when the underlying thing changes. TLS pins a certificate, and a different certificate breaks the connection. SSH pins a host key, and a changed key produces a loud warning. Docker can pin an image by digest, so a re-tagged image is a different reference. Git can verify a commit signature. In each case, "I trust this" is attached to a fingerprint of the specific thing, so that substituting a new thing under the old name is detectable.
Four systems pin a fingerprint of the specific thing, so a swap is detectable; the name-keyed MCP grant pins only a bare label, so it isn't.
A name-keyed, project-wide MCP grant binds to a bare label. "I trust the server named alpha" carries forward to whatever command later wears the name alpha. When MCP servers were mostly read-only connectors and the assistant mostly answered questions, the gap between the label and the definition was low-stakes. Once the assistant is an agent that executes, and an MCP server is an arbitrary command line that runs at startup, the gap becomes the whole story. A grant that is not pinned to the reviewed command definition means that a single commit to a repository you already trust is enough to run code on your machine the next time you open it, without you approving the new code or even seeing that anything changed.
This is why the threat model is ordinary rather than exotic. It does not require you to clone something malicious. It requires you to have trusted a project once, the normal thing, and then to pull, merge, or sync a later change to that same project, also the normal thing.
Anthropic's position, fairly stated#
We disclosed this to Anthropic. Their response, in full on the central question:
"After review, this behavior is working as designed. Project-scoped MCP servers from .mcp.json only run after two explicit user decisions: accepting the workspace trust dialog for the project folder, and approving the project's MCP servers in the MCP approval dialog. The option used in your reproduction — 'Use this and all future MCP servers in this project' — is an explicit standing grant covering the project's current and future MCP server configurations, so later changes to the project's MCP configuration run under that grant without a new prompt. ... Users can clear a project's MCP approval choices at any time with
claude mcp reset-project-choices. We have noted your suggestion to bind approvals to the approved server definition as a defense-in-depth hardening direction."
That position deserves to be steelmanned, not dismissed. The user clicked an option that says, in plain language, "all future MCP servers in this project." Anthropic's reading is the literal meaning of that button. Two explicit gates stand in front of any execution: trusting the folder, then approving the project's servers. If you grant standing authority over a project's current and future MCP configurations, it is internally consistent that a future configuration runs under it. There is also a documented escape hatch, claude mcp reset-project-choices, and Anthropic noted the specific hardening we suggested, binding approvals to the approved definition, as a direction worth considering.
So this is not a case of a vendor missing a bug. It is a disagreement about what a button should mean, and reasonable people can read that button the way Anthropic does.
To be precise about severity: this is not a zero-day, and there is no CVE. It is a design-boundary question. The disagreement is not about whether the behavior occurs, since both sides agree it does, but about whether the standing-grant experience adequately conveys that approving a project's MCP servers can authorize code you have not seen, committed by someone who is not you, to run at startup. Our argument is that as these grants come to authorize execution, binding them to the reviewed definition rather than to a name is the safer default, which is the same hardening Anthropic said it has noted.
Defenses for readers#
The mitigations follow from the mechanism, and they are habits more than patches.
- Prefer the narrow grant over "all future." When you approve a project's MCP servers, choose the option that approves only the specific server in front of you rather than the standing project-wide grant. At minimum this avoids silently auto-approving servers that did not exist when you first looked. Because approvals are name-keyed, a narrow grant reduces your exposure to new servers; do not assume it re-validates a changed command on a name you already approved.
- Review
.mcp.jsonon every pull, merge, or branch switch in a trusted repo. It is usually a small file. A change to a server'scommandorargsis a change to what runs on your machine at startup. This is the single highest-value habit, because it addresses the mutation directly rather than relying on a prompt. - Reset project approvals when in doubt.
claude mcp reset-project-choicesclears a project's stored MCP approval decisions, so the next launch re-prompts from scratch. - Treat dependency repositories and branches you do not control as a command-execution surface. A vendored sub-dependency, a teammate's hand-off branch, a fork you are evaluating, or any repo where someone else can commit is a place where
.mcp.jsoncan change under a name you already trust. Open those in a sandbox or a disposable environment if the contents are not yours.
Responsible disclosure and timeline#
- 2026-06-10, 06:24 GMT. Reported to Anthropic, with the proof of concept and the name-keyed-approval mechanism.
- 2026-06-10, 16:19 GMT. Anthropic responded: working as designed; the option used is an explicit standing grant;
claude mcp reset-project-choicesclears project approvals; and the suggestion to bind approvals to the approved server definition was noted as a defense-in-depth hardening direction.
Tested on Claude Code v2.1.170, macOS (arm64). No CVE is assigned and none is claimed. We are publishing in the same spirit as our VS Code post: to document a trust-boundary question whether or not the vendor classifies it as a vulnerability, because the answer, in this case "this is by design," is itself the most useful thing for developers to know.



