What are the 4 phases of an AI security testing program?

The four phases are: (1) Scoping and attack surface mapping, which documents every AI asset and enumerates external interfaces mapped to MITRE ATLAS techniques; (2) Adversarial attack execution, which runs statistical adversarial sampling across each attack surface; (3) Findings triage and prioritization, which scores findings on exploitability and impact and maps them to OWASP LLM Top 10 and ATLAS; (4) Remediation and regression testing, which validates fixes and runs a regression suite to confirm the attack vector no longer succeeds.

What metrics should an AI security testing program track?

Four metrics give a complete picture: attack success rate (ASR) per ATLAS technique category, regression delta (ASR change between consecutive test runs), coverage completeness (fraction of in-scope techniques tested), and mean time to remediation (MTTR). ASR tracks current exposure; regression delta catches new regressions; coverage completeness reveals testing gaps; MTTR tracks remediation responsiveness for compliance SLAs.

AI Security Testing: A Complete Framework for Pre-Deployment and Continuous Testing

TL;DR

AI security testing covers four phases: scoping, adversarial attack execution, findings triage, and remediation with regression testing
Pre-deployment testing establishes a baseline; continuous testing catches regressions after fine-tuning, feature changes, and new attack techniques
Integrating adversarial testing into CI/CD pipelines gates deployments on security pass rates and prevents silent safety regressions
ARTEMIS runs automated adversarial testing in CI/CD and maps every finding to MITRE ATLAS and OWASP LLM Top 10

What is AI security testing?#

AI security testing is the discipline of deliberately probing AI systems to find vulnerabilities before attackers do. It covers prompt injection, jailbreaks, guardrail bypass, training data extraction, adversarial input generation, supply chain verification, and agentic tool abuse, across the full AI attack surface rather than just the model interface.

The scope is broader than traditional software security testing. An AI system has a training pipeline, inference API, model artifacts, retrieval corpus, agent tool integrations, and ML-specific software dependencies. Each of these is an attack surface. A security program that tests only the prompt interface misses the majority of exploitable attack vectors in a production AI deployment.

The field draws on AI red teaming methodology, AI penetration testing techniques, and the OWASP LLM Top 10 as a risk classification framework. What distinguishes AI security testing as a discipline is the requirement for statistical sampling, continuous coverage, and CI/CD integration: none of which are standard in traditional application security.

Pre-deployment vs continuous AI security testing#

Pre-deployment testing runs before a model or AI feature goes live. It establishes the baseline security posture: what attack vectors are in scope, which are currently exploitable, and what controls are in place. Pre-deployment testing is typically comprehensive in scope, covering all five attack surfaces, and produces a findings report that gates the release decision.

Continuous testing runs after deployment, on an ongoing schedule or triggered by system changes. Its purpose is different from pre-deployment testing. The goal is not to establish a baseline but to detect drift: new exploitable behaviors introduced by fine-tuning, feature additions, or newly documented attack techniques.

The case for continuous AI red teaming rests on a documented shift in attack timelines. Mean time-to-exploit for known vulnerabilities has collapsed from 771 days in 2019 to hours in current conditions. For AI systems, where a fine-tuning run or a new tool integration can reintroduce a closed vulnerability, a point-in-time security assessment provides coverage for the moment it was conducted, not for the weeks and months after.

Both modes are necessary. Pre-deployment testing provides the documented baseline that compliance frameworks require. Continuous testing provides the operational coverage that the current threat environment demands.

The 4 phases of an AI security testing program#

Phase 1: Scoping and attack surface mapping#

Define what is being tested and what is out of scope. Document every AI asset: the base model, any fine-tuned versions, the training and fine-tuning pipeline, the inference API, the retrieval corpus and vector database for RAG deployments, agent tool integrations, and the ML library dependency graph.

For each asset, enumerate the external interfaces: who can supply input, what they can supply, and what the system can do in response. This produces the attack surface map. Each interface maps to a set of MITRE ATLAS techniques. A RAG ingestion pipeline that accepts third-party documents has document injection (AML.T0019) in scope. An agentic system with outbound HTTP access has task injection and exfiltration via tool calls in scope. An API with no authentication has reconnaissance via ML model inference API access in scope.

AI security testing is expensive relative to traditional testing: generating adversarial inputs at the scale needed for statistical confidence requires compute. A well-scoped engagement focuses that compute on the attack surfaces with the highest exploitability.

Phase 2: Adversarial attack execution#

Execute attacks against each in-scope attack surface. For the input/output layer, this covers prompt injection, jailbreak attempts across the documented technique categories (encoding tricks, multi-turn erosion, persona-based attacks, indirect injection via retrieved content), and adversarial inputs designed to bypass safety classifiers.

For RAG deployments, Phase 2 includes document injection tests (inject adversarial documents and verify retrieval success rates), context window overflow tests (measure how much adversarial content can displace legitimate retrieved material), and data exfiltration tests (embed exfiltration instructions and verify whether the model executes them).

For agentic systems, Phase 2 covers tool-call manipulation: supply adversarially crafted tool responses and measure whether the model follows injected instructions, takes unauthorized actions, or forwards data to unintended destinations.

Statistical sampling is required for output that is meaningful. A single successful jailbreak attempt is not a security finding. An attack success rate across 100 adversarial inputs, under conditions that reflect real attacker effort, is a finding. AI security testing produces probabilistic results, not binary pass/fail outcomes.

Phase 3: Findings triage and prioritization#

Score each finding on exploitability and impact. Exploitability scores should reflect the actual prerequisites: does this require query access only, or does it require write access to the training pipeline or vector store? Does it require a sophisticated adversarial payload or a basic prompt?

Impact scores for AI systems need to account for downstream effects. A guardrail bypass in a standalone chatbot has different impact than the same bypass in an agentic system that can execute code or send emails. A finding that produces incorrect output has different impact than one that exfiltrates system prompts or retrieval context.

Map every finding to the corresponding OWASP LLM Top 10 risk and MITRE ATLAS technique. This produces a findings report that security leadership can read alongside existing risk frameworks, and that compliance teams can reference in audit documentation.

Phase 4: Remediation and regression testing#

Remediation for AI security findings is not always a code change. A successful jailbreak may require a guardrail configuration update, a system prompt hardening step, a retrieval access control change, or an ingestion pipeline sanitization fix. Each remediation approach has a different verification requirement.

After remediation, run a regression test suite. The regression suite covers every finding from the Phase 2 execution, verifying that each specific attack vector no longer succeeds. It also runs a broader adversarial sampling pass to catch cases where the remediation introduced new vulnerabilities or shifted the bypass path to an adjacent technique.

Regression testing is the step most commonly skipped in resource-constrained programs. Without it, there is no documented evidence that the remediation worked, and no detection mechanism if a future change reintroduces the vulnerability.

Integrating AI security testing into CI/CD pipelines#

The goal of CI/CD integration is to make adversarial testing automatic: every deployment triggers a test run, the results determine whether the deployment proceeds, and failures block the release until addressed.

This requires three things. First, a test suite scoped to the deployment context: a full-scope adversarial run for a major model update, and a targeted regression suite for a minor configuration change. Second, defined pass/fail thresholds: what attack success rate is acceptable for each test category, and at what score does a result block deployment. Third, automated reporting that maps failures to the specific ATLAS techniques triggered, so the remediation path is immediate.

The most common implementation gap is scope creep in the CI/CD suite: teams build comprehensive pre-deployment tests and run them on every minor change, which slows the deployment pipeline to the point where teams bypass the gate. The solution is tiered testing: a fast regression suite (under five minutes) runs on every commit, a full adversarial suite runs on releases and model updates, and a targeted deep-dive runs when a new attack technique is added to the scope.

ARTEMIS integrates into CI/CD via API. The test suite configuration, pass thresholds, and reporting format are all configurable per pipeline stage. Findings from every run are accumulated in a continuous coverage history, which gives security teams a time-series view of safety posture across the full deployment history.

What continuous adversarial testing looks like in practice#

Continuous testing is not just repeated pre-deployment testing on a schedule. It requires additional test components that are specific to the operational context.

The first is behavioral drift detection. Model behavior can shift between deployments through fine-tuning, prompt caching side effects, or retrieval corpus updates. Behavioral drift tests run a fixed adversarial sample set against each deployment and compare the results to the baseline. A statistically significant change in attack success rate on any test category triggers a review.

The second is new-technique coverage. MITRE ATLAS adds new techniques as attack research progresses. A continuous testing program must incorporate new techniques into the active test suite as they are documented. A system that passes all tests from six months ago but has never been tested against techniques documented since then has unknown coverage against the current attack surface.

The third is red team-driven test generation. Scheduled automated testing covers the known attack surface. Human red teamers, brought in periodically, find the edges of the automated test suite and generate new test cases. For measuring the results of both automated and human-driven testing, the post on AI red teaming metrics covers the specific metrics that distinguish an active program from a compliance exercise.

Metrics for AI security testing programs#

Four metrics give a complete picture of an AI security testing program's effectiveness.

Attack success rate (ASR) per test category measures what percentage of adversarial attempts in each category succeed. This is the primary coverage metric. It should be measured separately for each ATLAS technique category: ASR for prompt injection, ASR for guardrail bypass, ASR for retrieval manipulation, and so on.

Regression delta measures the change in ASR between consecutive test runs. A positive regression delta (ASR increases between runs) is a signal that a deployment introduced new vulnerabilities or degraded existing controls.

Coverage completeness measures what fraction of the in-scope ATLAS techniques have been tested in the current testing period. A program with 80% ASR coverage but 40% technique coverage has tested the techniques it knows how to test, not the full in-scope surface.

Mean time to remediation (MTTR) measures the elapsed time between a finding being confirmed and the corresponding regression test passing after remediation. This tracks the operational responsiveness of the remediation process and is the metric most directly relevant to compliance frameworks that require findings to be addressed within defined SLAs.

For a complete treatment of these metrics and how to report them to security leadership, see AI Red Teaming Metrics.

Frequently asked questions#

What is AI security testing?

AI security testing is the systematic practice of probing AI systems for exploitable vulnerabilities across the full attack surface: the prompt interface, retrieval pipeline, agentic tool integrations, model training pipeline, and ML software supply chain. It uses adversarial attack techniques drawn from MITRE ATLAS and OWASP LLM Top 10, and produces probabilistic findings rather than binary pass/fail results.

How is AI security testing different from traditional application security testing?

Traditional application security testing finds deterministic vulnerabilities: a SQL injection either works or it does not. AI security testing produces statistical results: an attack technique succeeds on some percentage of attempts under conditions that approximate real attacker effort. This requires different tooling, different metrics, and a different scoping approach. AI systems also have attack surfaces (training pipelines, retrieval corpora, embedding models) that have no equivalent in traditional applications.

How often should AI security testing run?

Pre-deployment testing should run before every major model update, fine-tuning run, or material change to the retrieval corpus or agent tool integrations. Regression testing should run on every deployment. Full continuous adversarial testing should run at minimum monthly, and should incorporate new MITRE ATLAS techniques as they are documented. The frequency should increase with the sensitivity of the data the AI system handles.

What does a CI/CD-integrated AI security test gate look like?

A CI/CD gate for AI security runs a tiered test suite: a fast regression suite on every commit, a full adversarial suite on model releases. The gate passes if the attack success rate on each test category is below the defined threshold and no critical-severity findings are present. Failures block the deployment and route to the security team with findings mapped to ATLAS techniques and remediation guidance.

Does ARTEMIS support CI/CD integration?

Yes. ARTEMIS integrates into CI/CD pipelines via API, with configurable test suites, pass thresholds, and ATLAS-mapped reporting per pipeline stage. It accumulates a continuous coverage history across all runs, giving security teams a time-series view of safety posture. Book a demo to see the CI/CD integration in your deployment environment.

ARTEMIS integrates into CI/CD for automated adversarial testing on every deployment. Book a demo to see it in your pipeline.