What is Tool Abuse in AI Agents?
Tool abuse is the attack pattern where an AI agent uses the tools it has access to — file operations, API calls, code execution, email sending, payment processing — for purposes the operator did not authorize. Where prompt injection corrupts what the model says, tool abuse corrupts what the model does. It is the agentic-AI failure mode with the largest blast radius because the consequences extend beyond the conversation into real systems.
How tool abuse happens
Three mechanisms, often chained:
-
Prompt-injection-driven tool calls. The model is hijacked (directly or via retrieved content) into calling tools the user didn't intend. Repello's Zapier exploit chained an indirect prompt injection in an inbound email with the Gmail-write tool to exfiltrate data.
-
Excessive agency. The model has access to more tools, with broader scopes, than the task actually requires. A customer-service agent that can also wire money is a customer-service agent that can also be social-engineered into wiring money.
-
Confused-deputy patterns. The agent acts on behalf of the user with the user's privileges, but takes those actions in response to instructions from a third party (the document the user asked it to summarize, the email it's responding to). The agent has authority but isn't the entity that decided to use it.
Documented tool abuse incidents
- Zapier Gmail auto-reply exfiltration (Repello research) — a poisoned inbound email caused the agent to forward sensitive data via the agent's email-send capability
- Claude for Chrome access token leak (Repello research) — task injection coerced the agent's browser tools into exfiltrating tokens
- MCP tool chains to RCE (Repello research) — tool-poisoning combined with shell-execution tools achieved remote code execution on Docker hosts
- GitHub Copilot prompt-injection-to-pull-request — Copilot agents tricked into making PRs containing attacker code
- ChatGPT MCP connector zero-click exfil (Repello research) — single user confirmation of a malicious document triggered subsequent zero-click exfiltration via connected apps
Why tool abuse is the agentic blast-radius problem
A non-agentic LLM application's worst case is "the model says something bad to the user." An agentic application's worst case is "the model takes a bad action against connected systems." The set of bad actions is the union of every tool's capabilities. As tool counts grow (production agents commonly have 10-50 tools), the worst case grows accordingly.
Defending against tool abuse
The OWASP Agentic AI Top 10 codifies the relevant patterns. Practical defenses:
- Principle of least privilege per tool. Each tool's authorization scope should be the minimum required for legitimate use, not a broad permission.
- Confirmation gates on high-impact actions. Sending money, deleting data, posting to public channels — require explicit user approval, not just the model's say-so.
- Per-tool input/output filtering. Inspect tool arguments and responses at the gateway layer; flag patterns inconsistent with the deployment's intent.
- Audit every tool call. Complete trace of (user input that led to call, tool name, arguments, response) for forensic analysis.
- Threat-model the tool surface before deployment. Repello's Agent Wiz produces threat models for agentic deployments specifically to surface these risks before they ship.