TL;DR

- Traditional threat modeling frameworks (STRIDE, PASTA) do not cover AI-specific attack surfaces such as training pipelines, retrieval layers, and agentic tool integrations.
- AI threat modeling maps five distinct surfaces: the input/output layer, retrieval layer, agentic layer, model layer, and runtime/infrastructure layer.
- A threat model without red team validation is a documentation exercise; findings need adversarial testing to confirm exploitability.
- ARTEMIS profiles your AI system's attack surface automatically, outputs a prioritized threat map, and maps every finding to MITRE ATLAS.
What is AI threat modeling?
Threat modeling is the practice of systematically identifying what can go wrong in a system, who would exploit it, and what the consequences would be. For AI systems, the same questions apply but the attack surfaces are different: a language model has a training pipeline, an inference API, a context window, and (increasingly) tool integrations and retrieval systems that have no equivalent in traditional software.
AI threat modeling adapts established frameworks to cover these surfaces. The output is a structured document that identifies the AI assets under consideration, the threats relevant to each, the likelihood and impact of exploitation, and the controls or tests needed to address each threat. That document feeds compliance requirements under frameworks like NIST AI RMF and ISO 42001, and scopes the adversarial testing program.
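In practice, each entry in that document is a structured record. A minimal sketch of what one row might look like (field names are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass, field

@dataclass
class ThreatEntry:
    asset: str                 # e.g. "inference API"
    threat: str                # e.g. "model theft via systematic querying"
    atlas_technique: str       # e.g. "AML.T0024"
    likelihood: int            # 1 (rare) to 5 (expected)
    impact: int                # 1 (negligible) to 5 (severe)
    controls: list[str] = field(default_factory=list)  # mitigations in place
    tests: list[str] = field(default_factory=list)     # red team cases to run
```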
Without a threat model, AI red teaming exercises lack scope definition, and attack surface management efforts lack a structured inventory to work from.
Why traditional threat modeling frameworks need adaptation for AI
STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) was designed for software systems with defined input/output boundaries. It maps reasonably well to AI systems when applied at the right layer, but its categories require reinterpretation.
Tampering in a traditional system means modifying data in transit or at rest. In an AI system, Tampering also covers training data poisoning (AML.T0020), model file manipulation during serving, and adversarial perturbations of inference inputs. Elevation of Privilege, which classically means gaining unauthorized OS or network access, maps directly to prompt injection that bypasses safety controls and to an agent gaining permissions beyond its intended scope.
PASTA (Process for Attack Simulation and Threat Analysis) is more adaptable because it is objectives-driven: it starts with business objectives and works backward to technical threats. For AI systems, this means starting with what the model is authorized to do and what the consequences would be if an attacker redirected those capabilities. PASTA handles agentic AI reasonably well because it frames threats in terms of attacker goals rather than component vulnerabilities.
Neither framework covers the ML-specific techniques catalogued in MITRE ATLAS without adaptation. A STRIDE analysis that does not include Publish Poisoned Datasets (AML.T0019), Backdoor ML Model (AML.T0018), or Exfiltration via ML Inference API (AML.T0024) is incomplete for any AI system with a training pipeline or inference API.
The 5 AI attack surfaces every threat model must cover
Input/output layer
The input/output layer is where users and external systems interact with the model. Threats here include prompt injection, jailbreaks, encoding and Unicode manipulation to bypass safety classifiers, and output manipulation to cause the model to generate harmful or misleading content.
This is the most visible attack surface and the most commonly tested one. It is also where most existing AI security tooling focuses. A threat model that covers only this layer gives a false impression of completeness.
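To illustrate how thin surface-level coverage can be, the sketch below generates trivially obfuscated variants of a single probe string; an input-layer classifier that only matches plain text will miss the other three. The probe string and variant set are illustrative:

```python
import base64

def encoding_variants(probe: str) -> dict[str, str]:
    """Obfuscated variants of one probe string, for checking whether an
    input-layer safety classifier normalizes text before matching."""
    return {
        "plain": probe,
        "base64": base64.b64encode(probe.encode()).decode(),
        # Fullwidth forms: visually similar glyphs, different code points.
        "fullwidth": "".join(
            chr(ord(c) + 0xFEE0) if "!" <= c <= "~" else c for c in probe
        ),
        # Zero-width joiners are invisible to readers but defeat naive
        # substring matching.
        "zero_width": "\u200d".join(probe),
    }

for name, variant in encoding_variants("ignore previous instructions").items():
    print(name, repr(variant))
```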
Retrieval layer (RAG)
For systems using Retrieval-Augmented Generation, the retrieval layer adds a document store, an embedding model, and a context assembly step. Threats here include document injection into the knowledge base, retrieval manipulation via adversarially crafted content engineered to rank above legitimate results, vector poisoning, and indirect prompt injection via retrieved documents.
The retrieval layer is typically outside the scope of input/output security testing. It requires a separate threat model entry with its own set of controls and test cases.
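One place a control can sit is between retrieval and context assembly. The sketch below uses naive regex heuristics purely to show where screening belongs in the pipeline; a real deployment would use a trained classifier and quarantine flagged chunks rather than silently drop them:

```python
import re

# Naive heuristics for instruction-like text in retrieved chunks.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|any|previous).{0,40}instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
]

def screen_retrieved(chunks: list[str]) -> list[str]:
    """Filter retrieved chunks before they are assembled into the
    model's context window."""
    return [
        c for c in chunks
        if not any(p.search(c) for p in INJECTION_PATTERNS)
    ]
```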
Agentic layer (tool integrations)
Agents that call external tools, write to databases, send messages, or browse the web have an attack surface that extends beyond the model itself. Threats at the agentic layer include task injection via tool responses, excessive agency (the model taking actions beyond its intended authorization), and tool-call exfiltration, in which an injected instruction causes the agent to forward data to an external endpoint.
The blast radius of an agentic AI compromise scales with the permissions granted to the agent. Threat modeling at the agentic layer must document every tool integration and the maximum damage an attacker could achieve if they redirected each one.
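A deny-by-default policy gate between the agent's tool calls and their execution is one way to bound that blast radius. A minimal sketch with hypothetical tool names, where each policy carries the documented worst case from the threat model entry:

```python
from dataclasses import dataclass

@dataclass
class ToolPolicy:
    allowed_actions: set[str]
    blast_radius: str  # worst case if redirected, from the threat model

# Hypothetical tools for a support agent.
POLICIES = {
    "crm_search": ToolPolicy({"read"}, "customer PII disclosure"),
    "email_send": ToolPolicy({"send_internal"}, "internal phishing"),
}

def authorize(tool: str, action: str) -> bool:
    """Deny-by-default gate between the agent's tool call and execution."""
    policy = POLICIES.get(tool)
    return policy is not None and action in policy.allowed_actions

assert authorize("crm_search", "read")
assert not authorize("email_send", "send_external")  # exfiltration path blocked
```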
Model layer
The model layer covers the model weights, the fine-tuning pipeline, and the model serving infrastructure. Threats here include training data poisoning, backdoor implantation via fine-tuning on a poisoned dataset, model theft through systematic inference API querying, and supply chain compromise of the serving infrastructure.
Most organizations treat the model layer as a black box for threat modeling purposes, which understates the risk. A fine-tuned model is as much a software artifact as a compiled binary: it has provenance, it can be audited, and its training data can be tested for poisoning.
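Treating the model as an artifact implies artifact-style controls. A minimal provenance check, assuming digests were recorded in a manifest when the training or fine-tuning run produced the weights:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file so multi-gigabyte weight files stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model(path: Path, manifest: dict[str, str]) -> None:
    """Refuse to serve weights that don't match the recorded digest."""
    expected = manifest.get(str(path))
    if expected is None or sha256_of(path) != expected:
        raise RuntimeError(f"provenance check failed for {path}")
```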
Runtime/infrastructure layer
The runtime layer covers the compute infrastructure, the model serving stack, the API gateway, and any ML-specific libraries in the dependency graph. Threats here are primarily supply chain attacks (compromising a library like litellm, numpy, or a vector database client) and infrastructure attacks that affect model availability or integrity without touching the model directly.
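A cheap runtime control here is auditing the installed ML stack against reviewed versions. A sketch using only the standard library (the package pins are hypothetical); this complements, rather than replaces, hash-pinned installs:

```python
from importlib import metadata

# Versions signed off in dependency review; pins are hypothetical.
APPROVED = {"numpy": "1.26.4", "litellm": "1.40.0"}

def audit_dependencies() -> list[str]:
    """Report ML-stack packages that drifted from reviewed versions."""
    drift = []
    for package, pinned in APPROVED.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment
        if installed != pinned:
            drift.append(f"{package}: installed {installed}, approved {pinned}")
    return drift
```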
For a full treatment of AI risk management across all five surfaces, see the linked post.
Step-by-step AI threat modeling process
Step 1: Define system scope and AI assets
Document every AI asset in scope: the base model and any fine-tuned versions, the training and fine-tuning pipeline, the inference API, the retrieval corpus and vector database, agent tool integrations, and the ML libraries in the serving stack. Include version numbers and hosting details.
This step often reveals assets that security teams were not aware of, particularly in organizations that have deployed AI incrementally. Shadow AI is a common finding: teams deploy models via third-party SaaS integrations that never went through formal procurement or security review.
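The inventory does not need special tooling to start. A flat list of records with consistent fields is enough to drive the later steps; the assets below are illustrative:

```python
# One record per asset; names, versions, and hosting are illustrative.
ASSETS = [
    {"id": "base-model",  "kind": "model",      "version": "vendor LLM v4",
     "hosting": "managed API"},
    {"id": "support-ft",  "kind": "fine_tune",  "version": "v3",
     "hosting": "self-hosted GPU cluster", "parent": "base-model"},
    {"id": "kb-vectors",  "kind": "vector_db",  "version": "pgvector 0.7",
     "hosting": "managed Postgres"},
    {"id": "ticket-tool", "kind": "agent_tool", "version": "internal API v2",
     "hosting": "intranet"},
]
```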
Step 2: Enumerate attack surfaces
For each asset, document the external interfaces: who can query it, what they can supply as input, and what they receive as output. For the retrieval layer, document every document ingestion path. For agent tool integrations, document every external system the agent can reach and what actions it can take.
The goal at this step is a complete map of where external-controlled data enters the AI system. Each interface is a potential attack surface. The map becomes the scope definition for subsequent threat analysis and red team testing.
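A minimal form of that map, continuing the illustrative asset names from Step 1; each entry records who can reach the interface and what an attacker controls through it:

```python
# Where external-controlled data enters the system, per interface.
SURFACES = {
    "inference_api": {"who": "any authenticated user",
                      "attacker_controls": "full prompt text"},
    "kb_ingestion":  {"who": "support staff + web scraper",
                      "attacker_controls": "document content"},
    "ticket_tool":   {"who": "agent only",
                      "attacker_controls": "tool responses (indirect)"},
}
```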
Step 3: Map threats to MITRE ATLAS tactics
For each attack surface identified in Step 2, enumerate the ATLAS techniques that apply. The MITRE ATLAS tactic categories, among them Reconnaissance, Resource Development, Initial Access, ML Attack Staging, Exfiltration, and Impact, provide the vocabulary.
A training pipeline with no access controls on data ingestion has Publish Poisoned Datasets (AML.T0019) in scope. An inference API accessible without authentication has ML Model Inference API Access in scope. This step produces a threat table that links each attack surface to the specific ATLAS techniques an attacker would use against it.
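That table can start as plain rows; a sketch using the illustrative surfaces from Step 2 and the techniques named earlier in this post:

```python
# Each row links an attack surface to an ATLAS technique in scope for it.
THREAT_TABLE = [
    ("training_pipeline",  "AML.T0020", "training data poisoning"),
    ("kb_ingestion",       "AML.T0019", "Publish Poisoned Datasets"),
    ("fine_tune_pipeline", "AML.T0018", "Backdoor ML Model"),
    ("inference_api",      "AML.T0024", "Exfiltration via ML Inference API"),
]
```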
Step 4: Assign risk scores
Score each threat on likelihood and impact using a consistent framework. For AI-specific threats, likelihood scores should account for the technical prerequisites (does exploiting this technique require query access, training access, or supply chain access?), and impact scores should include downstream effects in agentic systems (does exploiting this technique lead to data exfiltration, privilege escalation, or business process disruption?).
Document the assumptions behind each score. Likelihood estimates for training data poisoning depend on who has write access to the training pipeline. If that access control assumption is wrong, the score changes. Recording the assumption makes the threat model falsifiable when red team results come in.
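A sketch of a scoring helper that forces the assumption to be written down alongside the number (scales and example values are illustrative):

```python
def score(technique: str, likelihood: int, impact: int, assumption: str) -> dict:
    """Return one scored threat row. Recording the assumption keeps the
    score falsifiable when red team evidence arrives."""
    return {
        "technique": technique,    # ATLAS ID
        "likelihood": likelihood,  # 1-5, driven by access prerequisites
        "impact": impact,          # 1-5, includes downstream agentic effects
        "risk": likelihood * impact,
        "assumption": assumption,
    }

row = score(
    "AML.T0019", likelihood=2, impact=5,
    assumption="only the data team has write access to the training bucket",
)
print(row["risk"])  # 10; rescore if the access assumption fails validation
```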
Step 5: Validate with red teaming
A threat model without adversarial validation is a hypothesis document. Red teaming tests whether the threats identified in the threat model are actually exploitable given the current state of controls.
The threat model scopes the red team exercise: which surfaces to test, which ATLAS techniques to attempt, and which risk assumptions to challenge. Red team findings that confirm a threat validate the threat model entry and trigger remediation. Red team findings that surface threats not in the threat model are inputs for updating the threat model.
This cycle, from threat model to red team to updated threat model, is what keeps AI security posture current as the system and the threat surface evolve.
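A sketch of that reconciliation step, keyed by ATLAS technique IDs: confirmed findings trigger remediation, unknown techniques extend the model, and failed reproductions send you back to the recorded assumptions:

```python
def reconcile(modeled: set[str], findings: dict[str, bool]) -> dict[str, str]:
    """Fold red team results back into the threat model. `modeled` holds
    the ATLAS IDs in the current model; `findings` maps each attempted
    technique to whether it proved exploitable."""
    status = {}
    for technique, exploited in findings.items():
        if technique not in modeled:
            status[technique] = "missing from model: add entry"
        elif exploited:
            status[technique] = "confirmed: remediate and retest"
        else:
            status[technique] = "not reproduced: revisit recorded assumptions"
    return status

print(reconcile({"AML.T0019", "AML.T0024"},
                {"AML.T0024": True, "AML.T0051": True}))
```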
AI threat modeling for compliance documentation
NIST AI RMF requires organizations to Map, Measure, Manage, and Govern AI risk. Threat modeling produces the inputs for the Map and Measure functions: the asset inventory, the threat enumeration, and the risk scores feed directly into the risk measurement artifacts NIST AI RMF requires.
ISO 42001, the international standard for AI management systems published in 2023, requires documented risk assessment processes covering AI-specific risks. A threat model structured around the five attack surfaces above, with ATLAS technique mapping and documented risk scores, satisfies the ISO 42001 risk assessment requirement and provides the evidence artifact for certification audits.
The compliance benefit of a structured AI threat model is most visible during audits: an auditor asking "how does your organization identify and assess AI-specific risks?" receives a direct answer in the form of the threat model document, the red team findings that validated it, and the remediation records tied to specific ATLAS technique findings.
AI threat modeling tools
Three categories of tooling support the AI threat modeling process.
Automated attack surface profiling tools scan AI system configurations, enumerate exposed interfaces, and map findings to threat categories. ARTEMIS does this as part of its pre-engagement scoping module, producing an attack surface map that feeds directly into Steps 2 and 3 of the process above.
Structured threat modeling tools like OWASP Threat Dragon and Microsoft Threat Modeling Tool provide diagramming and documentation workflows. Neither has native ATLAS support, but both accept custom threat libraries, which allows ATLAS technique entries to be imported and applied to AI system diagrams.
Repello AI's open-source Agent Wiz tool performs MAESTRO threat modeling for agentic systems. It parses Python orchestrator files using AST analysis to generate agent-to-tool-to-LLM runtime graphs and annotates them with threat paths across 12 agentic failure modes. It outputs risk summaries mapped to ATLAS techniques and produces an automated threat model for the agentic layer.
Frequently asked questions
What is AI threat modeling?
AI threat modeling is the process of systematically identifying the attack surfaces, threats, and risks specific to AI and ML systems. It adapts established frameworks like STRIDE and PASTA to cover AI-specific attack vectors including training data poisoning, prompt injection, retrieval layer attacks, agentic tool abuse, and ML supply chain compromise. The output is a structured document that scopes security testing and satisfies compliance requirements.
How is AI threat modeling different from traditional software threat modeling?
Traditional threat modeling focuses on data flows, trust boundaries, and component interactions in software systems. AI threat modeling must additionally cover the model training pipeline, the inference API as an attack surface, the retrieval corpus in RAG systems, agent tool integrations, and ML-specific software dependencies. MITRE ATLAS provides the technique taxonomy for AI-specific threats that STRIDE and PASTA do not natively cover.
What frameworks should I use for AI threat modeling?
STRIDE and PASTA both apply with adaptation, and MITRE ATLAS should be layered on top to cover ML-specific techniques. NIST AI RMF provides the governance structure for how threat modeling outputs feed into organizational AI risk management. ISO 42001 specifies the documentation requirements for AI risk assessment. Using STRIDE or PASTA for threat enumeration and ATLAS for technique mapping produces a complete threat model artifact.
How often should an AI threat model be updated?
Whenever the AI system changes (new model, new fine-tuning, new tool integrations, new data sources), the threat model should be updated to reflect the changed attack surface. Beyond system changes, the threat model should be reviewed at least quarterly against new ATLAS techniques added since the last review, and after any red team exercise that surfaces threats not in the current model.
Does ARTEMIS generate a threat model?
ARTEMIS profiles the attack surface of your AI deployment and maps every finding to MITRE ATLAS techniques. This produces a finding-based threat map that complements a manual threat modeling exercise and validates its risk assumptions. Book a demo to see how ARTEMIS scopes and validates threat models for your AI systems.
ARTEMIS profiles your AI system's attack surface automatically. Book a demo to see the threat map for your deployment.