Should model-level instructions be used to restrict MCP tool access?

No. Model-level instructions for tool access control are insufficient as a primary control because they can be overridden through prompt injection. If an attacker injects instructions into the model's context through a tool response or a poisoned retrieval source, those instructions can override the model's policy instructions. Access control must be enforced at the MCP gateway layer, where it operates independently of the model's context window state.

How often should MCP integrations be red teamed?

MCP integrations should be red teamed before initial production deployment and after every update to any connected MCP server, agent framework version, or model version. Each update changes tool definitions, response formats, and the behaviors the agent can invoke. Continuous regression testing is the only approach that keeps pace with the rate of change in MCP-integrated environments, as a new tool definition or model update can open an injection surface that did not exist in the previous version.

The MCP Security Checklist: 12 Controls Before Deploying MCP in Production

TL;DR: The Model Context Protocol gives AI agents structured access to external tools, data sources, and services. That power comes with an attack surface most teams haven't inventoried yet. Indirect prompt injection through tool responses, supply chain risks from third-party MCP servers, and privilege escalation through overpermissioned tool access are the highest-priority risks. These 12 controls cover the security requirements every team should verify before MCP goes into production.

Why MCP changes the security model#

Model Context Protocol (MCP) is Anthropic's open standard for connecting AI models to external tools and data sources via a JSON-RPC 2.0 interface. It is now supported natively by Claude, and third-party MCP servers exist for file systems, databases, APIs, browsers, code execution environments, and dozens of SaaS platforms.

MCP solves a real problem: it gives agents a standardized way to call tools without bespoke integration work for each capability. But it introduces a threat model that most enterprise security teams have not fully assessed. An AI agent using MCP does not simply retrieve data; it executes actions with real-world consequences, including writing files, sending messages, querying databases, and making API calls. The security boundary is not the model; it is every MCP server the model can reach.

Repello's research on MCP tool poisoning to RCE demonstrated a full exploitation chain from a malicious MCP tool definition to remote code execution, establishing that MCP security is not a theoretical concern. Separate Repello research found that a single MCP calendar integration vulnerability exposed 11 distinct AI systems to zero-click exfiltration, illustrating how one misconfigured tool definition propagates risk across every agent that connects to it. The checklist below is organized into three phases: before deployment, at the integration boundary, and in production.

Phase 1: Before deployment#

1. Enumerate every MCP server and tool definition in scope#

You cannot secure what you have not catalogued. Before deployment, produce a complete inventory of every MCP server the agent can reach, every tool definition those servers expose, and every external system each tool can interact with. This includes first-party servers your team built, third-party servers from the MCP marketplace, and any servers introduced through agent frameworks that add MCP connections by default.

The inventory is the foundation for every other control. A tool that is not in the inventory has no access controls, no logging, and no coverage in your red team test plan.

2. Apply least privilege to tool permissions#

Each MCP tool should have only the permissions required for its defined function. A tool that reads from a database should not have write access. A tool that queries a file system path should not have access to paths outside that scope. A tool that calls a read-only API endpoint should not use credentials with write or admin permissions.

Least privilege is the single most effective control for limiting blast radius when an MCP integration is compromised through prompt injection or supply chain attack. An agent that can only read cannot exfiltrate through write operations; an agent scoped to one directory cannot traverse to others.

3. Verify MCP server provenance and integrity#

Third-party MCP servers introduce supply chain risk. Before connecting any third-party server to a production agent, verify the source repository, confirm the maintainer identity, review the tool definitions for unexpected capabilities, and pin the server version. An MCP server that declares a tool called get_weather but also includes an undeclared tool that executes shell commands is not theoretical; Repello's research on malicious MCP skill packages has documented this pattern in the wild.

Apply the same scrutiny to MCP servers that you apply to third-party code dependencies. An unreviewed MCP server added to an agent is an unreviewed code dependency with execution access.

4. Restrict which agents can invoke which tools#

Not every agent in your environment should reach every MCP server. Define a tool access matrix: which agent identities (or agent roles) are permitted to invoke which servers. Enforce this at the MCP gateway layer, not through model-level instructions. Instructions to the model can be overridden through prompt injection; gateway-level access control cannot.

ARGUS, Repello's runtime security layer, enforces tool call policy at the gateway, blocking unauthorized tool invocations regardless of what the model's context window contains at the time of the call.

Phase 2: At the integration boundary#

5. Validate and sanitize all tool inputs before execution#

Every parameter the model passes to an MCP tool should be validated against an expected schema before execution. Type checking is insufficient; validate the semantic range of each parameter. A file path parameter should be validated against an allowlist of permitted directories. A query parameter should be validated against a pattern that excludes injection syntax. An API parameter that accepts free text should be sanitized before it reaches the downstream system.

The OWASP LLM Top 10 (2025) identifies improper output handling (LLM02) as a top risk for agentic deployments precisely because model-generated tool inputs travel through integration layers that were not designed to receive adversarial content. Validation at the MCP layer is the correct interception point.

6. Inspect tool response content for prompt injection payloads#

Tool responses are the primary attack vector for indirect prompt injection in MCP-integrated systems. When an MCP tool returns content from an external source (a web page, a database record, an API response, a file), that content travels into the model's context window. If it contains adversarial instructions, the model processes them alongside legitimate content.

Apply the same policy inspection to tool responses that you apply to user inputs. Classify tool response content for injection patterns before it reaches the model. Flag responses that contain instruction-like structures, role-framing assertions, or system prompt override attempts. Repello's MCP prompt injection analysis covers the full attack chain from malicious tool response to agent action hijack.

7. Sandbox tool execution environments#

Tools that execute code, run shell commands, or interact with the local file system should run in isolated execution environments with no network access to internal infrastructure and no filesystem access beyond a defined scope. Container-based sandboxing with explicit egress controls is the minimum viable isolation for any MCP tool with execution capability.

An MCP tool that can reach internal APIs or internal file systems from within an agent's execution environment is a pivot point. Sandbox boundaries enforce the principle that a compromised tool integration cannot become a credential or network access path.

8. Rate-limit tool calls to prevent resource abuse#

MCP-enabled agents can invoke tools in loops, either through legitimate multi-step reasoning or through adversarially induced behavior. Without rate limits, a single compromised session can exhaust API quotas, trigger billing spikes, or generate thousands of database queries. Apply per-session and per-tool rate limits that reflect normal operational patterns, and alert on sessions that approach the limit.

This control overlaps with Denial-of-Wallet protections: rate limiting is both a cost control and a signal that something abnormal is happening in the agent's tool call behavior.

Phase 3: In production#

9. Log all tool calls with full request and response audit trails#

Every MCP tool call should produce an audit log entry containing: the agent identity, the tool name, the complete input parameters, the complete response, the timestamp, and the session context. Logs should be append-only and tamper-evident.

Complete audit trails serve two functions: forensic reconstruction of exploitation chains after an incident, and the baseline against which anomaly detection runs. A tool call that does not match its logged parameter patterns from the previous 30 days is a high-signal anomaly. Without logs, neither function is possible.

10. Monitor agent outputs for behavioral anomalies indicating injection success#

Indirect prompt injection that bypasses input inspection succeeds silently: the model completes the injected instruction, and the output looks like a normal tool call or response from the agent's perspective. The only reliable detection signal is output behavior that does not match the stated user intent for the session.

Output monitoring should flag: tool calls to servers not in scope for the session's declared purpose, tool call sequences that match known exfiltration patterns (read credentials, then send to external endpoint), and responses that contain content inconsistent with the user's original request. The NIST AI Risk Management Framework (AI RMF 1.0) identifies continuous monitoring as a core Manage function for deployed AI systems with external tool access.

11. Red team MCP integrations specifically for indirect prompt injection#

Standard red team test suites test the model's user-turn interface. They do not test the tool integration layer. Before production deployment and on a continuous basis afterward, run injection probes specifically through tool response channels: craft tool responses that contain adversarial instructions and verify whether the agent acts on them. Test each MCP server as a separate injection surface.

"The tool response is not just data," says the Repello AI Research Team. "It is an instruction surface that most security teams have never probed."

Coverage should include: single-turn tool response injection, multi-turn injection sequences where the adversarial payload builds across several tool calls, and cross-tool injection chains where a compromised response from one tool influences calls to a second tool.

12. Run regression tests after every MCP server update#

MCP server updates change tool definitions, response formats, and the behaviors the agent can invoke. Each update is a potential regression in the security properties you validated at initial deployment. Maintain a test suite that covers all 11 controls above and run it automatically against every server version before the update reaches production.

ARTEMIS runs automated MCP security regression testing, including indirect prompt injection probes across all connected tool surfaces and coverage completeness reporting that maps which tool response paths have been tested and which remain uncovered.

Frequently asked questions#

What is the Model Context Protocol (MCP)?

MCP is Anthropic's open standard for connecting AI models to external tools, data sources, and services using a JSON-RPC 2.0 interface. It gives AI agents a standardized mechanism to call tools such as file system access, database queries, API integrations, and browser automation without requiring custom integration code for each capability. MCP servers expose tool definitions that describe the available functions, their parameters, and their expected outputs.

What is the biggest security risk in MCP deployments?

Indirect prompt injection through tool responses is the highest-priority risk for most deployments. When an MCP tool returns content from an external source (a web page, a file, an API response), that content enters the model's context window. An attacker who can write adversarial instructions into any content the tool retrieves can hijack the agent's behavior without any access to the model itself or the user-facing interface. A single poisoned document in a connected knowledge base, or a malicious web page that an MCP browser tool visits, is sufficient to execute the attack.

What is MCP tool poisoning?

MCP tool poisoning is an attack in which a malicious MCP server exposes tool definitions that misrepresent their actual behavior, or a legitimate server is compromised to add undeclared capabilities. Repello's research demonstrated a full exploitation chain from a poisoned tool definition to remote code execution. The attack exploits the fact that agents often cannot distinguish between a tool that does what its definition claims and one that performs additional undeclared actions alongside the declared function.

How does least privilege apply to MCP?

Least privilege for MCP means each tool receives only the permissions required for its specific, declared function. A file-reading tool gets read access to a defined path, not write access or access to other directories. A database query tool uses credentials scoped to the tables it needs, not admin credentials. Least privilege limits the blast radius of a compromised tool integration: an attacker who controls the tool's execution through prompt injection can only act within the permissions the tool already has.

Should I use model-level instructions to restrict MCP tool access?

No. Model-level instructions for tool access control are insufficient as a primary control because they can be overridden through prompt injection. If an attacker can inject instructions into the model's context (through a tool response, a poisoned RAG document, or any other indirect injection path), those instructions can override the model's policy instructions. Access control should be enforced at the MCP gateway layer, where it operates independently of the model's context window state.

How often should I red team MCP integrations?

MCP integrations should be red teamed before initial production deployment and after every update to any connected MCP server, agent framework version, or model version. The attack surface changes with each update: a new tool definition expands the injection surface, a model update changes how tool responses are processed, and a framework update may add new MCP connections. Continuous regression testing, rather than periodic assessments, is the only approach that keeps pace with the rate of change in MCP-integrated environments.