Back to all blogs

|
Mar 1, 2026
|
8 min read


-
Standard LLM pentesting covers the model layer. Agentic systems need 5 additional test tracks: indirect injection, tool call scope testing, memory poisoning, cross-agent propagation, and session persistence.
TL;DR
An agentic AI system is not a chatbot. It has persistent memory, calls external tools, and acts across multi-step pipelines. Standard LLM pentesting methodology covers one layer of that. A complete test covers all of them.
The dominant attack class is indirect prompt injection: an attacker embeds instructions in content the agent reads, and those instructions redirect tool calls to unauthorized endpoints or exfiltrate data through legitimate API channels.
Before testing, enumerate every tool, every permission scope, and every external system the agent can reach. The threat model determines which test tracks apply.
Five test tracks are specific to agentic deployments and have no equivalent in a standard LLM assessment: indirect injection through external content, tool call scope violations, memory poisoning, cross-agent instruction propagation, and session persistence attacks.
OWASP LLM06:2025 (Excessive Agency) and the OWASP Agentic Top 10 are the primary frameworks for scoping an agentic pentest.
Standard LLM pentesting methodology was built for a specific target: a model that takes input, generates output, and stops. That target barely exists in production AI anymore.
Enterprise AI deployments in 2026 are overwhelmingly agentic. The system takes an instruction, queries a knowledge base, calls a calendar API, reads an email thread, generates a response, and logs the result; all of this happens without returning control to a human between steps. The attack surface is not one model and one system prompt. It is every tool the agent can call, every external source it reads, every memory store it writes to, and every downstream system that acts on its outputs.
The general methodology for LLM pentesting includes scoping against the OWASP LLM Top 10, testing for prompt injection and system prompt leakage, and verifying output handling. This applies to agentic deployments too, but it covers only the first layer. Stopping there misses the attack categories that make agentic systems categorically more dangerous than single-model deployments. The CrowdStrike 2026 Global Threat Report documented more than 90 organizations compromised via prompt injection specifically targeting agentic AI systems, with attackers using the agent's own tool access to exfiltrate credentials through channels that appeared as normal application traffic.
This post covers what changes in the methodology when the target is an agentic system, and what five test tracks have no equivalent in a standard LLM pentest.
What changes when an LLM has tool access
A standard LLM pentest evaluates a bounded system: one model, one system prompt, one session. A successful attack means the model produces a harmful or unauthorized output.
An agentic system adds actors, persistence, and external reach. The model orchestrates actions across tools, reads content from external sources, writes to memory stores that persist across sessions, and may delegate subtasks to other agents in a multi-agent pipeline. A successful attack no longer means a bad output. It means unauthorized tool execution.
OWASP LLM06:2025 (Excessive Agency) identifies this as a top-ten risk: agents are frequently granted permissions beyond what their task requires, and a successful prompt injection in that context translates directly into unauthorized actions at whatever scope the agent can reach. The attack surface is not the model's vocabulary. It is the model's permission set.
Three structural changes drive the methodology shift.
The injection surface is anything the agent reads. In a chatbot, the primary attack surface is user input. In an agent, it is every external content source the agent processes: documents in a retrieval pipeline, web pages a browsing agent fetches, emails it reads, API responses it consumes, memory entries it retrieves. Indirect prompt injection, embedding attacker instructions in content the model reads, is the dominant attack class against agentic systems. Repello's security research on agentic AI browsers demonstrates exactly how this works: an attacker-controlled web page instructs the agent to exfiltrate data using the browser's fetch capabilities, and the agent complies because the instruction arrives through a channel the model treats as trusted content.
Tool calls are the impact. When a chatbot is compromised, the impact is what the model says. When an agent is compromised, the impact is what the model does: API calls it makes, files it reads, emails it sends, code it executes, external services it contacts. A single successful injection can trigger a chain of tool calls that exfiltrates data, modifies records, or reaches attacker-controlled infrastructure, all using legitimate application credentials.
Persistence extends the attack window. Agents with long-term memory stores carry state across sessions. An attacker who successfully writes a malicious instruction into an agent's memory has not compromised one session. They have potentially compromised every future session that retrieves that memory context.
The agentic attack surface: what to enumerate before testing
The threat model for an agentic pentest must be more detailed than for a standard LLM assessment. Enumerate these components before writing a single test case.
Tool inventory. Every external capability the agent can invoke: API integrations, database access, code execution environments, file system permissions, web browsing, email and calendar access, form submission, and external service connectors. Map the permission scope of each. An agent with read-only database access has a fundamentally different risk profile from one with write access.
External content sources. Every source of content the agent retrieves and processes: RAG knowledge bases, web pages, email inboxes, documents, API response payloads, and memory retrieval results. Each is a potential indirect injection vector.
Memory architecture. Whether the agent uses short-term context only, session-level memory, or persistent long-term memory storage. Persistent memory is a distinct attack target: a successful write propagates through future sessions.
Agent topology. Whether the deployment is a single agent or a multi-agent pipeline. If multiple agents communicate, the inter-agent communication channel is part of the attack surface. An instruction that enters through a low-trust agent and propagates to a high-trust agent that acts on it represents a privilege escalation path.
Output consumers. Every downstream system that receives agent outputs: databases, APIs, rendered UIs, other agents, human workflows that trust agent-generated content.
Five test tracks specific to agentic LLM pentesting
A complete agentic pentest includes all the standard LLM test tracks plus these five, which have no direct equivalent in a single-model assessment.
1. Indirect prompt injection through external content. Craft adversarial documents, web pages, email bodies, and API response payloads designed to redirect agent behavior. Test whether the agent executes instructions embedded in retrieved content, particularly when those instructions claim elevated authority or impersonate system-level sources. Test across every external content source in scope. Repello's research on MCP tool poisoning to RCE illustrates the concrete impact: a malicious tool definition alone can redirect agent execution to attacker-controlled infrastructure without any direct user interaction.
2. Tool call scope testing. For each tool in scope, test whether the agent can be induced to call it outside its intended parameters: accessing unauthorized data ranges, calling endpoints outside the approved list, escalating from read to write operations, or exfiltrating data through a tool that appears legitimate in the audit log. The OWASP Agentic Top 10 classifies tool scope violation as a top agentic risk alongside prompt injection and memory attacks.
3. Memory poisoning. For agents with persistent memory, test whether an attacker can write malicious instructions into the memory store that persist across sessions and alter future agent behavior. This requires access to a session that reaches the memory write path, either through normal interaction or through an injection that triggers a memory write as a side effect. The goal is to plant an instruction that surfaces reliably in future context retrievals.
4. Cross-agent instruction propagation. In multi-agent architectures, test whether a malicious instruction entering through a low-privilege agent propagates to higher-privilege agents downstream. The mechanism is indirect injection, but the vector is inter-agent communication rather than external content. A subagent that summarizes retrieved content and passes those summaries to an orchestrator is a propagation path if the summarizer does not sanitize injected instructions from its content sources.
5. Session persistence attacks. Test whether conversation history, cached tool outputs, or session context from a compromised interaction carries malicious state into subsequent sessions for the same or different users. This requires explicitly testing the session isolation boundaries: what persists between sessions, what resets, and what is shared across user contexts in multi-tenant deployments.
What complete agentic pentesting looks like in practice
The methodology extends the standard LLM pentesting approach rather than replacing it. Scope the threat model first using the enumeration steps above. Apply all standard OWASP LLM Top 10 test tracks to the model interaction layer. Then add the five agentic-specific tracks as a distinct test phase.
Testing agentic systems at the API layer only captures part of the attack surface. An agent that browses the web, processes email, or executes multi-step tool chains needs to be tested as it actually runs: through the real execution environment, with real tool access, across real multi-step task sequences. API-level testing shows whether the model produces bad outputs given adversarial inputs. It does not show whether the agent executes unauthorized tool calls in response to adversarially crafted content arriving through its normal retrieval paths.
This is where the scope gap between a standard LLM pentest and a complete agentic assessment becomes most consequential. MITRE ATLAS currently documents 15 adversarial tactics and 66 techniques specific to AI systems, the majority of which have no direct equivalent in the ATT&CK framework for traditional IT systems. Agentic-specific techniques, including tool misuse, context manipulation, and indirect prompt injection, represent a growing share of that catalog.
Repello's ARTEMIS runs these tests by operating through the actual application environment: navigating browsing agents through adversarially crafted web content, injecting malicious payloads into retrieval pipelines, and tracing tool call sequences end-to-end to identify unauthorized execution paths. Findings map to specific OWASP LLM Top 10 and OWASP Agentic Top 10 categories and feed directly into runtime protection via ARGUS, which enforces adaptive guardrails at the inference layer in under 100 milliseconds.
The combination matters: a pentest without runtime protection leaves production deployments exposed to novel attack patterns not covered in the test scope. Runtime protection without regular pentesting has no visibility into exploitable weaknesses until an attack succeeds.
Frequently asked questions
What is the difference between LLM pentesting and agentic AI pentesting?
LLM pentesting evaluates a model and its direct interaction layer: system prompt integrity, guardrail effectiveness, output handling, and direct injection via user input. Agentic AI pentesting extends that scope to include every tool the agent can invoke, every external content source it reads, every memory store it writes to, and every downstream system that acts on its outputs. Five test tracks are specific to agentic deployments and have no equivalent in a standard LLM assessment: indirect injection through external content, tool call scope violations, memory poisoning, cross-agent instruction propagation, and session persistence attacks.
How do you scope an agentic pentest?
Start by enumerating every component of the agentic system: the model, system prompt, tool integrations and their permission scopes, external content sources, memory architecture, agent topology (single vs. multi-agent), and output consumers. Map each component to applicable OWASP LLM Top 10 and OWASP Agentic Top 10 categories. That mapping determines which test tracks are in scope and at what priority.
What frameworks apply to agentic AI pentesting?
The primary frameworks are OWASP LLM Top 10 2025 (especially LLM06 Excessive Agency and LLM01 Prompt Injection), the OWASP Agentic Top 10, and MITRE ATLAS. NIST AI 600-1, the NIST Generative AI Profile released in July 2024, frames continuous adversarial testing as a required component of responsible AI deployment and applies directly to agentic systems operating at enterprise scale.
Can automated tools run agentic pentests?
Automated tools handle repeatable coverage well: running known indirect injection payloads across all content ingestion paths, checking tool call scope boundaries systematically, and verifying session isolation across user contexts. They do not handle novel multi-step attack chains or context-dependent manipulation across complex multi-agent topologies well. The practical approach combines automated tools for breadth and consistency with skilled testers for depth on complex, chained attack paths.
What makes agentic AI systems a higher-value target than standard LLM deployments?
An agentic system holds the combined permissions of every tool it integrates with. A successful prompt injection does not produce one unauthorized output; it can trigger a chain of tool calls that reads sensitive data, modifies records, contacts external services, or exfiltrates information through channels that appear as legitimate application traffic in audit logs. The CrowdStrike 2026 Global Threat Report documented more than 90 organizations compromised through exactly this pattern.
Conclusion
Agentic AI systems are not a harder version of the chatbot security problem. They are a structurally different one. The attack surface includes everything the agent reads, everything it can execute, and every system it connects to. Standard LLM pentesting methodology addresses the model interaction layer. A complete agentic pentest adds five test tracks that have no equivalent in that methodology: indirect injection through external content, tool call scope violations, memory poisoning, cross-agent instruction propagation, and session persistence attacks.
Security teams running agentic AI in production that have not explicitly tested these tracks have incomplete visibility into their exposure. As agentic deployments expand and tool integrations multiply, the gap between a standard LLM pentesting assessment and a complete agentic evaluation will only grow.
To learn how Repello tests agentic AI deployments in production, visit repello.ai/product or request a demo.
Share this blog
Subscribe to our newsletter









