Back to all blogs

What Is LLM Pentesting? A Practical Guide for Security Teams

Archisman Pal

Head of GTM

Feb 23, 2026

12 min read

What Is LLM Pentesting? A Practical Guide for Security Teams

Repello tech background with grid pattern symbolizing AI security

Security researchers from the UK AI Safety Institute ran 1.8 million attacks across 22 frontier AI models as part of a structured red team exercise. Every single model broke. Not some of them. Every one. The exercise tested models across a range of harmful output categories, and none of the systems held up under sustained, well-resourced adversarial pressure. That result is not an indictment of any particular vendor. It is a statement about the current state of LLM security as a discipline: the attacks are real, they work, and most organizations deploying LLMs are not running anything close to structured adversarial testing against them.

LLM pentesting is the practice that closes this gap. It applies structured adversarial pressure to LLM-powered applications to identify vulnerabilities before attackers do. This guide explains what LLM pentesting covers, how it differs from traditional application security testing, and what a practical methodology looks like for security teams evaluating their first LLM deployment or their tenth.

Key takeaways

LLM pentesting covers the full application stack: the model, system prompt, retrieval pipeline, tool integrations, and session management.
The OWASP LLM Top 10 2025 is the primary threat catalog. Prompt injection holds the top spot for the second consecutive year.
LLM systems are probabilistic and stateful, which requires multi-turn testing and conversation-history manipulation as explicit test tracks.
API-level testing alone misses a significant category of vulnerabilities that only surface through the application UI, file upload paths, and agentic tool chains.
A structured LLM pentest maps findings to the OWASP and MITRE ATLAS frameworks and feeds results into runtime protection.

What LLM pentesting actually is

LLM pentesting is a category of security testing focused on identifying vulnerabilities in applications built on or around large language models. That scope is broader than testing the model itself. A production LLM application typically includes the model, a system prompt, a retrieval pipeline, one or more tool integrations, user session management, and an application layer that handles input and output. The attack surface spans all of these components.

A traditional penetration test evaluates a system against known vulnerability classes: SQL injection, broken authentication, insecure deserialization, and so on. LLM pentesting evaluates the same system against a different and largely non-overlapping set of vulnerabilities: prompt injection, system prompt leakage, jailbreaking, RAG exfiltration, agentic tool abuse, and unsafe output handling. The OWASP Top 10 for LLM Applications 2025 is the most widely used reference for this threat catalog, and it has grown substantially from the 2023 edition. Five entirely new vulnerability categories appeared in the 2025 update, including excessive agency, system prompt leakage, and vector and embedding weaknesses, each representing failure modes specific to generative AI architectures.

What makes LLM pentesting structurally different from traditional application testing is the non-determinism of the target. Classical applications are deterministic: the same input always produces the same output. LLMs are probabilistic: the same prompt can generate different responses depending on model temperature, context window contents, prior conversation turns, and the state of the retrieval pipeline at query time. That non-determinism has direct implications for how test cases are written, how many iterations constitute a valid test, and how findings are interpreted.

How LLM pentesting differs from traditional application pentesting

The core principles of application security still apply to LLM systems. Input validation, authentication boundary testing, API security assessment, and output handling verification all matter. But several things are structurally different.

The attack surface includes natural language. In a traditional web application, inputs are structured: form fields, API parameters, JSON payloads. In an LLM application, the primary input is free text, and that text can carry instructions as well as data. Prompt injection exploits exactly this conflation. The model cannot reliably distinguish between an instruction from the system operator and an instruction embedded in user-supplied content or a retrieved document. OWASP LLM01:2025 covers both direct and indirect variants. Indirect prompt injection through RAG pipelines is considered the more dangerous of the two because the attack surface is anything the model reads, not just what the user types. Repello's catalog of real-world prompt injection attack examples illustrates how diverse these attack paths are in production deployments.

Multi-turn conversation is a distinct test track. A single-request test misses a category of attacks that only manifest across conversation history. Many jailbreaking techniques work by gradually shifting the model's framing across multiple turns, sometimes called "many-shot" manipulation. Repello's deep-dive on AI jailbreaking techniques covers the full range of these approaches, including how attackers layer role-play scenarios and hypothetical framings across conversation turns to erode safety controls incrementally. Test suites that fire single requests and inspect responses will not catch these.

Tool integrations extend the blast radius. LLM applications with the ability to call external APIs, execute code, browse the web, or interact with file systems have a much larger attack surface than model-only deployments. OWASP LLM06:2025 (Excessive Agency) classifies this as a top-ten risk because a successful prompt injection in an agentic pipeline can trigger unauthorized transactions, exfiltrate data to external endpoints, or escalate permissions through a chain of individually plausible tool calls.

Output handling is part of the attack surface. If LLM outputs are rendered downstream, passed to other systems, or used to trigger actions, the content of those outputs becomes a security concern in its own right. OWASP LLM05:2025 (Improper Output Handling) addresses scenarios where unsanitized model outputs reach SQL interpreters, HTML renderers, or command execution environments, creating secondary injection paths.

MITRE ATLAS, which catalogs adversarial tactics and techniques against AI systems, currently documents 15 tactics and 66 techniques specific to AI attacks, the majority of which have no direct equivalent in the ATT&CK framework for traditional IT systems. A complete LLM pentest draws on both frameworks.

The threat landscape you're testing against

Before scoping any test, security teams need a clear map of what they are testing for. The OWASP LLM Top 10 2025 provides the most actionable organizing framework. Repello's two-part OWASP LLM Top 10 series for CISOs walks through each category in detail with mitigation guidance, and is worth reading before scoping a first engagement. For most enterprise LLM deployments, the highest-priority test categories are:

Prompt injection (LLM01): Can an attacker override the system prompt through user input or through content in a retrieved document? Both direct and indirect variants require test coverage.

Sensitive information disclosure (LLM02): Can the model be induced to reveal training data, system prompt contents, or data from other users' sessions? A 2025 industry report found that 77% of enterprise employees who use AI tools have pasted company data into a chatbot query, with 22% of those instances involving confidential personal or financial data. In many deployments, those employees are operating through enterprise LLM systems with access to internal knowledge bases, making this a high-priority test track.

System prompt leakage (LLM07): Is the system prompt, which may contain business logic, API credentials, or confidential operator instructions, extractable through adversarial prompting? This is a common finding in first-generation LLM deployments and often underestimated in its downstream impact.

Vector and embedding weaknesses (LLM08): For RAG-based applications, can an attacker influence what documents are retrieved, or reconstruct sensitive training data from embedding outputs? This was added as a new category in the 2025 update, reflecting the widespread adoption of RAG architectures in enterprise AI.

Excessive agency (LLM06): For agentic systems, does the scope of what the model can do match what it needs to do? Pentesting agentic deployments requires explicitly enumerating every tool, every permission, and every external system the model can reach, and then testing each of those paths for potential abuse.

NIST AI 600-1, the NIST Generative AI Profile released in July 2024, identifies twelve risks specific to generative AI and provides a structured mitigation posture for each. It is the regulatory anchor for organizations operating under NIST guidance, and it explicitly frames red teaming and adversarial testing as required components of responsible AI deployment.

A practical LLM pentesting methodology

A structured LLM pentest follows a sequence similar in principle to a traditional penetration test, adapted for the specific characteristics of LLM systems. For a foundational overview of how AI red teaming differs from conventional security exercises, Repello's guide to AI red teaming is a useful starting point before diving into the steps below.

Scope and threat model first. Before writing a single test case, document the full application architecture: the model in use, the system prompt, every retrieval source, every tool the model can call, user roles and access levels, and the sensitivity tier of every data source the model can reach. The threat model determines which OWASP and MITRE ATLAS categories apply to this specific deployment. A read-only customer support chatbot has a very different threat model from an agentic assistant with access to CRM data, email, and calendar.

Build test cases from the threat model. For each applicable risk category, develop concrete, executable test cases. Prompt injection tests should cover both malicious user inputs and adversarially crafted documents that could enter the retrieval pipeline. Jailbreaking tests should cover role-play framings, hypothetical framings, and multi-turn conversation manipulation. Excessive agency tests should enumerate every tool call path and test each for scope violations. Repello's LLM pentesting checklist provides a structured starting point for each major risk category and maps test cases to their OWASP classification.

Test at multiple layers. API-level testing covers the model interaction layer: query the endpoint directly, inspect responses, and verify that guardrails fire on known attack patterns. Then test through the actual application UI, because the behavior of the full system often differs from the behavior of the underlying API in security-relevant ways. File upload paths, multi-turn conversation state, and UI-specific input handling all create test scenarios that API-only testing misses entirely.

Test multi-turn scenarios as a distinct track. Many-shot jailbreaking, gradual instruction override, and conversation history manipulation require a dedicated test methodology. They cannot be covered by single-request testing. Plan for multi-step attack sequences with explicit conversation state management.

Document and prioritize findings. Map findings to risk tiers: critical (direct data exfiltration or unauthorized action execution), high (guardrail bypass enabling harmful outputs), medium (information leakage of non-critical data), low (policy deviation without direct harm). Each finding should include the exact test case, the model response, and a clear remediation recommendation.

Why API-level testing is not a complete LLM pentest

Most LLM security testing, whether manual or automated, operates at the API layer. The test fires a prompt at an endpoint and inspects the response. This is the right starting point, but it is not a complete test.

The application layer introduces security behaviors the API layer does not expose. The way a web application renders LLM outputs, handles file uploads, manages session state, or enforces user role restrictions can materially change what attacks are possible. A prompt that fails against the raw API may succeed when it arrives through the application UI with an active user session attached. A file upload that appears harmless at the API level may trigger a prompt injection when the model processes the file's contents.

Agentic pipelines compound this further. An agent that can browse the web, execute code, or submit forms presents attack paths that require simulating the full execution environment, not just the model endpoint. Repello's research into zero-click exfiltration via MCP tool access in 11.ai shows exactly how this plays out in practice: a single malicious calendar invite can trigger unauthorized data exfiltration across connected tools without any direct user interaction. Testing an agentic system at the API level only is like running a web application pentest by querying the database directly: you are testing one component, not the system.

This is the gap that platforms like Repello address through their ARTEMIS testing framework. ARTEMIS browser mode navigates LLM applications exactly as a human attacker would: through the UI, across multi-turn sessions, through file upload paths, and through agentic tool chains that are structurally invisible to endpoint-only testing. Findings from those tests feed directly into ARGUS, Repello's runtime protection layer, which enforces adaptive guardrails in under 100 milliseconds without noticeable latency. The loop between testing and production protection closes: the pentest identifies what to guard against, and the guardrails enforce it at runtime.

Frequently asked questions about LLM pentesting

What is LLM pentesting?

LLM pentesting is structured adversarial security testing of applications built on large language models. It evaluates the model, system prompt, retrieval pipeline, tool integrations, and session management layer against the threat categories defined in the OWASP LLM Top 10 2025 and MITRE ATLAS. The goal is to identify exploitable vulnerabilities, including prompt injection, system prompt leakage, RAG exfiltration, and agentic tool abuse, before attackers do.

How is LLM pentesting different from a traditional application pentest?

Traditional application pentesting tests deterministic systems against a known set of vulnerability classes (SQL injection, broken authentication, etc.). LLM pentesting targets a probabilistic system with a natural-language attack surface. The same prompt can produce different results on different runs, multi-turn conversation history is part of the attack surface, and tool integrations create chained exploitation paths that have no equivalent in traditional web application testing. Testing methodology, tooling, and result interpretation all differ significantly.

How often should an LLM application be pentested?

At minimum, LLM applications should be pentested at initial deployment and after any significant change to the model, system prompt, retrieval sources, or tool integrations. Given how rapidly the threat landscape evolves, security teams at higher-risk organizations (financial services, healthcare, government) typically conduct quarterly testing and integrate automated adversarial test suites into their CI/CD pipeline so that every model update triggers a baseline test run. NIST AI 600-1 frames continuous testing as a core component of responsible generative AI deployment.

What does an LLM pentest report include?

A well-structured LLM pentest report maps every finding to a risk tier (critical, high, medium, low), includes the exact test case that triggered the vulnerability, shows the model's actual response, and provides a concrete remediation recommendation. Findings should be mapped to the relevant OWASP LLM Top 10 category and, where applicable, the corresponding MITRE ATLAS technique. The report should also distinguish between vulnerabilities in the model interaction layer (testable via API) and vulnerabilities that only surface through the application UI or multi-turn session state.

Can automated tools replace manual LLM pentesting?

Automated tools handle high-volume, repeatable tests well: running thousands of known prompt injection patterns, checking for system prompt leakage across input variations, and verifying that guardrails fire on expected attack signatures. They do not handle novel attack chains, context-dependent multi-turn manipulation, or agentic pipeline abuse well. The current consensus among practitioners is that automated tools and manual testing are complementary. Automation provides breadth and consistency; skilled human testers provide depth on complex, chained attack paths that require judgment and context.

Conclusion

LLM pentesting is not a specialized niche for AI researchers. It is standard security practice for any organization running LLM-powered applications in production. The threat landscape is well-documented across OWASP, MITRE ATLAS, and NIST AI 600-1. The attack techniques are real and actively used. What has lagged is operationalization: building the methodology, scoping it correctly, testing at all the right layers, and translating findings into runtime protection. Security teams that approach LLM pentesting as a structured, repeatable discipline rather than a one-off exercise will be significantly better positioned as agentic AI systems expand the attack surface further.

Learn more about how Repello approaches LLM security testing at repello.ai.

Share this blog

Subscribe to our newsletter

Introducing ARGUS: Runtime Security Layer for your GenAI systems

Jun 19, 2025

6 min read

BIG NEWS: Repello AI Raises $1.2M to Secure the future of AI 🚀

Jun 16, 2025

9 min read

Introducing ARTEMIS: Automated Red Teaming to Secure your AI applications

Mar 18, 2025

5 min read

Sign up for Repello updates
Subscribe to our newsletter to receive the latest insights on AI security, red teaming research, and product updates in your inbox.

Subscribe to our newsletter

8 The Green, Ste A
Dover, DE 19901, United States of America

contact@repello.ai

Sign up for Repello updates
Subscribe to our newsletter to receive the latest insights on AI security, red teaming research, and product updates in your inbox.