Glossary/AI Red Teaming

What is AI Red Teaming?

AI red teaming is the systematic adversarial testing of AI systems — large language models, AI agents, RAG pipelines, and the applications built on top of them — to identify exploitable vulnerabilities before attackers do. It is the structured practice of attacking your own AI deployment, with the goal of producing empirical findings about what an attacker could actually accomplish.

How AI red teaming differs from traditional red teaming

Traditional red teaming probes networks, applications, and human processes for deterministic vulnerabilities mapped to CVEs and patched with code changes. AI red teaming probes:

The attack surface is probabilistic, the failure modes are statistical, and the patches are typically system-prompt changes, guardrail deployments, or fine-tuning runs rather than discrete code patches.

The five-step methodology

A structured AI red team engagement typically follows:

  1. Scope — define the system under test, the threat model, and the attack categories to cover (commonly the OWASP LLM Top 10 + agentic threats)
  2. Plan — translate threat categories into specific attack scenarios with measurable success criteria
  3. Execute — run automated probes plus manual creative attacks against the deployment
  4. Score — measure attack success rate (ASR) per category, identify which controls held and which failed
  5. Remediate + retest — apply fixes, re-run the same probe set, confirm the attacks no longer succeed

What gets tested

Coverage typically maps to standard taxonomies:

Manual vs. automated

A mature program runs automated coverage continuously and supplements with periodic manual exercises when the deployment architecture changes meaningfully.