What does AI red teaming actually test?

AI red teaming tests prompt injection (direct and indirect), jailbreaking and persona attacks, RAG poisoning, agentic attacks including MCP tool poisoning and cross-agent injection, denial of wallet attacks, and data exfiltration via tool calls. Effective AI red teaming is context-specific: attacks are calibrated to the target application's function and risk profile, not applied uniformly from a taxonomy.

Red teaming vs. penetration testing vs. vulnerability scanning: what AI security teams actually need

Q: Is red teaming the same as penetration testing?

No. Penetration testing asks 'can we gain unauthorized access?' within a bounded scope. Red teaming asks 'can we achieve this objective?' with fewer constraints on method. For AI systems, the distinction matters further: AI red teaming requires an entirely different attack taxonomy than both traditional red teaming and traditional pentesting, covering prompt injection, jailbreaking, RAG poisoning, and agentic attacks.

Q: What is breach and attack simulation (BAS)?

Breach and attack simulation (BAS) runs automated, continuous simulations of attacker TTPs from MITRE ATT&CK against production security controls. It validates that infrastructure defenses work continuously rather than at a point in time. Current BAS platforms are built for infrastructure security, not AI application security. Testing AI systems requires coverage of MITRE ATLAS and OWASP LLM Top 10 attack classes, which BAS tools do not currently provide at scale.

TL;DR

Vulnerability scanning finds known CVEs automatically. It produces no signal against prompt injection, jailbreaks, or model-specific attacks.
Penetration testing is manual and point-in-time. It was not designed for probabilistic AI systems.
Red teaming is objectives-based adversarial simulation. Applied to AI, it requires a completely different attack taxonomy than infrastructure red teaming.
Breach and attack simulation (BAS) automates continuous TTP simulation. It is built for infrastructure, not AI.
AI red teaming is the only methodology purpose-built for the LLM and agentic attack surface.

Security teams tasked with securing AI often reach for familiar tools: vulnerability scanners, pentest engagements, red team exercises. These are mature disciplines with decades of methodology behind them. They are also not designed for the AI attack surface.

Prompt injection does not generate a CVE. A jailbreak cannot be found by a port scan. And a probabilistic model that refuses a harmful request 80% of the time will pass a point-in-time pentest if the tester happens to run it during that 80%. This guide defines each methodology clearly, identifies where each breaks down for AI systems, and explains what AI red teaming actually requires.

What is vulnerability scanning?#

Vulnerability scanning is automated, passive enumeration of known security weaknesses. A scanner checks a system against a database of known CVEs and misconfigurations, reports what it finds, and produces a prioritized list of issues to remediate.

Scanners are fast and scalable. They are useful for maintaining hygiene across infrastructure: unpatched software, open ports, misconfigured services, exposed credentials. They run continuously without human involvement.

Their limitation is definitional: they only find what is in the database. A scanner cannot identify a novel attack path. Prompt injection, RLHF exploitation, persona attacks, and RAG poisoning do not have CVE entries. Vulnerability scanning produces no useful signal against the AI-specific attack surface.

What is penetration testing?#

Penetration testing is a manual, adversarial, point-in-time assessment. A pentester attempts to gain unauthorized access to a defined scope of systems, documents the exploitation path, and delivers a findings report with evidence. The engagement runs within a bounded scope: specific systems, a defined timeframe, and agreed rules of engagement.

Pentesting finds logic flaws, authentication bypasses, injection vulnerabilities, and access control failures that automated scanning misses. A skilled pentester brings creative adversarial judgment that no scanner can replicate.

The structural constraint is its point-in-time nature. A pentest is a snapshot. For AI penetration testing, this creates a specific problem: LLMs are probabilistic. A jailbreak that works 30% of the time across 1,000 attempts will frequently pass a manual single-pass test. Characterizing the attack surface of a probabilistic system requires statistical sampling, not single-pass assessment.

Traditional pentesting also assumes the testing team knows what "unauthorized access" means. For AI systems, defining a successful exploit requires a different threat model: a harmful response, a policy bypass, a data disclosure through tool calls. Most pentest teams do not have that framework.

What is red teaming?#

Red teaming originated in military and intelligence contexts as objectives-based adversarial simulation. Where penetration testing asks "can we get in?", red teaming asks "can we achieve this specific objective?" The objective might be: exfiltrate a target file, cause the system to take a harmful action, or demonstrate that a specific control fails.

In cybersecurity, red teaming typically involves a persistent authorized adversarial simulation that attacks people, processes, and technology simultaneously. The red team is less constrained than a pentester: it defines its own approach to achieve the stated objective. Engagements run longer, with higher operational complexity.

Applied to AI systems, red teaming requires a different attack taxonomy than infrastructure red teaming. Attacks against LLMs do not involve lateral movement through network segments; they involve multi-turn prompt manipulation, context injection, and exploitation of the model's instruction-following behavior. A generic red team skilled in network exploitation does not have the attack vocabulary for prompt injection, RLHF bypass, or agentic system compromise. The further wrinkle is that "AI red teaming" is itself often used as a synonym for the narrower activity of adversarial probe batteries against a single model — the bounded-vs-goal-directed-vs-contractual taxonomy separates the three so a CISO asking for one of them gets the right scope back.

What is breach and attack simulation?#

Breach and attack simulation (BAS) automates continuous simulation of attacker TTPs from the MITRE ATT&CK framework. BAS platforms run simulated attacks against production security controls around the clock, validating whether defenses actually work rather than assuming they do. Where a pentest validates a snapshot, BAS validates continuously.

BAS is effective for infrastructure security: testing whether endpoint detection catches a specific technique, whether a SIEM rule fires correctly, whether egress filtering blocks a simulated exfiltration.

The limitation for AI security is structural: MITRE ATT&CK covers infrastructure attack techniques. MITRE ATLAS was created separately to cover adversarial ML and AI-specific attack techniques. BAS platforms have not incorporated ATLAS-based attack playbooks at scale. BAS cannot test whether a production LLM is susceptible to prompt injection or whether a deployed agent can be hijacked through its tool-call layer.

Why none of these were built for AI systems#

Each methodology breaks down against the AI attack surface for a predictable reason.

Vulnerability scanning requires a CVE database. AI vulnerabilities — prompt injection susceptibility, RLHF exploitation, model-specific jailbreaks — do not generate CVEs. There is no signature for "this model will comply with a persona attack."

Penetration testing assumes a deterministic target. LLMs are probabilistic: the same prompt produces different outputs across runs. Research from the University of Illinois Urbana-Champaign showed AI agents successfully exploited 87% of real-world CVEs when given access to tool-call capabilities. A point-in-time test run before that capability was added would have missed the entire attack surface it opened.

"Traditional security testing was designed for deterministic systems," according to the Repello AI Research Team. "LLMs are probabilistic. A jailbreak might succeed on 3 out of 10 attempts. A single-pass pentest misses it 70% of the time. Statistical sampling across hundreds of attack variations is the only way to measure actual risk."

Red teaming works conceptually but requires AI-specific expertise. The OWASP LLM Top 10 defines a different attack taxonomy than traditional red team training covers: prompt injection, indirect injection through external data sources, jailbreaking, model denial of service, sensitive information disclosure. Generic red teams apply infrastructure techniques that do not transfer.

BAS is the closest to a continuous testing approach in principle, but the tooling has not caught up. Current platforms test against MITRE ATT&CK. Testing against MITRE ATLAS at scale requires purpose-built AI testing infrastructure that most BAS vendors do not offer.

The deeper issue is that all four methodologies were designed for systems that produce predictable outputs given deterministic inputs. AI systems do not. An AI application's attack surface changes with every model update, system prompt change, and new tool integration. Continuous AI red teaming, not point-in-time assessment, is the only methodology that matches the pace of change.

What AI red teaming actually involves#

AI red teaming covers the attack surface that traditional methodologies leave untested:

Prompt injection: direct injection through user inputs; indirect injection through documents, emails, web pages, and tool responses
Jailbreaking and persona attacks: RLHF exploitation, DAN-style persona reframing, multi-turn safety training erosion
RAG poisoning: adversarial content injected into the retrieval corpus to steer model outputs
Agentic attacks: tool-call hijacking, cross-agent instruction injection, MCP tool poisoning, agent identity abuse
Denial of wallet: input crafting to maximize token consumption and inference cost
Data exfiltration via tool calls: agent manipulation to retrieve and transmit sensitive data through legitimate integrations

Effective AI red teaming is context-specific. A fraud detection model, a customer service chatbot, and a coding assistant face different adversarial conditions. Applying a uniform taxonomy without calibrating to the target application's function, risk profile, and deployment configuration produces incomplete coverage.

Output should include exploitability evidence (not just "vulnerable in theory"), coverage mapped to OWASP LLM Top 10 and MITRE ATLAS, and prioritized remediation steps a security team can act on.

Comparison at a glance#

	Vuln scanning	Pentesting	Red teaming	BAS	AI red teaming
Automated	✅	❌	❌ / hybrid	✅	✅
Continuous	✅	❌	❌	✅	✅
AI-specific coverage	❌	❌	Requires AI expertise	❌	✅
Handles probabilistic systems	❌	❌	❌	❌	✅
Novel attack discovery	❌	✅	✅	❌	✅
Framework	NVD / CVE	Varies	Varies	MITRE ATT&CK	OWASP LLM Top 10 + MITRE ATLAS
Output	CVE list	Findings report	Objectives assessment	Control gaps	AI risk report + remediation

Which approach does your AI security program need?#

Vulnerability scanning is infrastructure hygiene, not an AI security program. Run it as a baseline. It is a prerequisite for everything else, not a substitute for it.

Penetration testing is useful for a one-time adversarial assessment before a new AI application goes live. Verify that the team has LLM-specific attack knowledge before engaging. A generic pentest against an LLM application will miss the majority of the actual attack surface.

Ongoing red teaming is necessary for any AI application that handles sensitive data, makes consequential decisions, or is exposed to adversarial users. The quantified case for this is in Repello's analysis of continuous red teaming: mean time to exploit has collapsed from 771 days to under 4 hours for newly published vulnerabilities. Point-in-time testing cannot keep pace with that.

BAS belongs in the infrastructure security stack. It validates that your controls work as configured. It does not replace AI-specific testing.

AI red teaming is necessary for any deployment of LLMs, agents, or AI systems that process untrusted inputs. This is the methodology built for the threat model your AI systems actually face.

Frequently asked questions#

Is red teaming the same as penetration testing?

No. Penetration testing asks "can we gain unauthorized access?" within a bounded scope. Red teaming asks "can we achieve this objective?" with fewer constraints on method. Red teaming typically runs longer, involves broader scope, and tests people and processes alongside technology. For AI systems, the distinction matters further: AI red teaming requires an entirely different attack taxonomy than both traditional red teaming and traditional pentesting.

What is breach and attack simulation (BAS)?

BAS runs automated, continuous simulations of attacker TTPs from MITRE ATT&CK against production security controls. It validates that infrastructure defenses work continuously. Current BAS platforms are built for infrastructure, not AI application security. The AI-equivalent would be continuous automated red teaming against MITRE ATLAS and OWASP LLM Top 10 attack classes.

Do AI systems need a different type of penetration testing?

Yes. Traditional pentesters test deterministic application logic. LLMs are probabilistic and require statistical sampling across many attack variations, not single-pass testing. They also face an attack taxonomy — prompt injection, jailbreaking, RAG poisoning, agentic attacks — that is not covered by traditional pentest methodology. Teams running an AI pentest should verify the testing team has specific LLM adversarial expertise before engaging.

How often should AI systems be red teamed?

After every significant model update, system prompt change, or new tool integration — not on a fixed schedule. The attack surface of an AI application changes every time the underlying model or configuration changes. Continuous automated red teaming provides assurance between point-in-time assessments and catches regressions that quarterly engagements miss.

What is the difference between AI red teaming and AI safety testing?

AI safety testing evaluates whether a model produces harmful outputs under normal use. AI red teaming actively attempts to manipulate the model into harmful outputs through adversarial inputs. Safety testing finds natural failure modes; red teaming finds exploitable ones. Both are necessary, and the findings from red teaming should feed back into safety testing coverage.

Test your AI application with ARTEMIS#

ARTEMIS is AI red teaming built for the AI attack surface: 15 million+ evolving attack patterns, context-specific to your application, covering OWASP LLM Top 10, NIST AI RMF, and MITRE ATLAS. Automated and continuous so coverage keeps pace with model updates. Get a demo.