What does the OWASP LLM Top 10 cover?

The OWASP Top 10 for Large Language Model Applications enumerates the ten most critical security risks in LLM-based applications: prompt injection (LLM01), sensitive information disclosure (LLM02), supply chain vulnerabilities (LLM03), data and model poisoning (LLM04), improper output handling (LLM05), excessive agency (LLM06), system prompt leakage (LLM07), vector and embedding weaknesses (LLM08), misinformation (LLM09), and unbounded consumption (LLM10). It is the primary industry reference for AI application security risk catalogs and is updated to reflect emerging attack techniques.

AI Security Glossary: 35 Key Terms Every Security Team Needs to Know

TL;DR

This glossary defines 35 essential terms across AI security: attack techniques, defensive capabilities, governance frameworks, and agentic AI-specific concepts.
Each definition is written for precision: security engineers, AI/ML engineers, and technical security leads who need operational language rather than marketing copy.
Terms are listed alphabetically. Where a term has a deeper treatment on the Repello blog, a link is included.

AI security has its own vocabulary, and it matters. "Prompt injection" and "jailbreaking" are not synonyms. "Model scanning" and "red teaming" address different layers of the stack. "RAG poisoning" and "training data poisoning" operate at different stages of the AI pipeline. Imprecise language produces imprecise controls.

This glossary covers the core terms every security practitioner working with AI systems needs to know: what each concept means, how it works, and how it connects to real attack and defense scenarios. Terms are defined in plain English with enough technical specificity to be operationally useful.

A#

Adversarial input#

An adversarial input is a deliberately crafted prompt, query, or data sample designed to cause an AI model to produce an incorrect, harmful, or unintended output. Adversarial inputs exploit the probabilistic nature of neural networks; small, often imperceptible perturbations to input data can produce large, predictable changes in model output. In LLM contexts, adversarial inputs range from carefully phrased natural language prompts to Unicode-encoded obfuscation and embedding-space manipulation.

Agentic AI#

Agentic AI refers to AI systems that autonomously plan and execute multi-step tasks using tools, APIs, and external services, rather than simply generating text in response to a single prompt. An agentic system can browse the web, write and execute code, send emails, query databases, and invoke other agents, often operating over extended time horizons without human intervention at each step. The autonomy that makes agentic AI productive also makes it a higher-risk attack surface; a single successful injection can propagate across every tool in the agent's permission set. See Security Threats in Agentic AI Browsers for documented real-world attack chains, and The Agentic AI Security Threat Landscape in 2026 for the current attacker methodology targeting these systems.

AI attack surface#

The AI attack surface is the full set of components, interfaces, and data flows in an AI system that an attacker can attempt to exploit. It includes the model interaction layer (system prompt, user input, conversation history), retrieval pipelines and knowledge bases, tool integrations and API connections, MCP server connections, memory stores, output consumers, and the training pipeline. Unlike traditional application attack surfaces, the AI attack surface includes natural language interfaces where every source of content the model reads is a potential injection vector. For a detailed breakdown of how to map and reduce this surface, see AI Attack Surface Management.

AI Bill of Materials (AI-BOM)#

An AI Bill of Materials is a structured, continuously updated inventory of every component in an AI system; the trained models and versions in use, datasets used in training and fine-tuning, inference API connections, agent orchestration chains and their dependency relationships, MCP server connections, and external tool integrations. It applies the SBOM concept from Executive Order 14028 to AI systems, where the dependency chain is more complex and regulatory requirements (EU AI Act Article 11, NIST AI RMF) require documented traceability. Without an AI-BOM, organizations cannot accurately assess supply chain risk in their AI stack or respond to an AI security incident with confidence. See AI Bill of Materials: What It Is and How to Build One for a practical implementation guide.

AI jailbreak#

An AI jailbreak is an attack technique that attempts to override or bypass an LLM's safety training, content policies, or operator instructions to cause the model to produce outputs it was designed to refuse. Jailbreaks exploit the tension between a model's instruction-following capability and its safety constraints; by framing requests as fiction, hypotheticals, roleplay, or encoded instructions, attackers can sometimes cause the model to comply with prohibited requests. Unlike prompt injection (which inserts instructions the model was not designed to receive), jailbreaking attempts to override instructions the model was explicitly given. See Understanding AI Jailbreaking Techniques for a taxonomy of current techniques.

AI red teaming#

AI red teaming is the practice of systematically testing AI systems for security vulnerabilities by simulating realistic attack scenarios against the system's own interfaces. An AI red team attempts prompt injection, jailbreaking, RAG exfiltration, tool abuse, and system prompt extraction to identify exploitable weaknesses before adversaries do. AI red teaming differs from traditional software penetration testing in that many vulnerabilities are probabilistic and behavioral; testing across a range of inputs and states is required rather than binary pass/fail checks against defined endpoints. See The Essential Guide to AI Red Teaming for a full methodology breakdown, and LLM Red Teaming Platforms for a comparison of tooling options.

AI security posture management (AI-SPM)#

AI security posture management is the continuous practice of discovering, assessing, testing, and remediating security risks across an organization's AI systems, models, pipelines, and integrations. It adapts the posture management discipline to the AI-specific threat catalog defined by the OWASP LLM Top 10, MITRE ATLAS, and NIST AI 600-1. A mature AI-SPM program requires five capabilities operating in sequence: complete AI asset discovery, AI-specific risk classification, continuous adversarial testing, runtime behavioral monitoring, and governance reporting. See AI Security Posture Management: A Practical Guide for an enterprise implementation framework, and the VANTAGE framework for a structured approach to AI asset inventorisation underpinning AI-SPM.

B#

Backdoor attack#

A backdoor attack on an AI model is a supply chain attack in which an adversary poisons the training data or modifies model weights to embed a hidden capability that activates only when a specific trigger is present in the input. The model behaves normally on standard inputs but produces attacker-controlled outputs when it encounters the trigger pattern. Backdoor attacks are particularly dangerous because the compromised behavior is invisible to standard performance benchmarks; they can persist through subsequent fine-tuning cycles if the trigger is not included in fine-tuning data. Repello's analysis of safety in models derived from DeepSeek R1 documents how backdoor properties and safety gaps carry forward through model distillation.

Blast radius#

In AI security, blast radius refers to the scope of what a successful attack against an AI component can affect. It is determined by four factors: the data the model can access, the actions it can execute via tool integrations, the downstream systems and processes that consume its outputs, and the persistence mechanisms (memory stores, cached outputs) that can carry compromised state across sessions. In agentic deployments, blast radius is typically bounded by the agent's permission set rather than the entry point exploited; least-privilege access control is the most direct blast radius reduction lever.

C#

Context window poisoning#

Context window poisoning is an attack in which adversarial instructions are injected into the content that populates an LLM's context window; retrieved documents, tool call outputs, conversation history, or injected file contents. Because LLMs process all content in the context window as potentially instructional, adversarially crafted content that appears in context can redirect model behavior even when not submitted directly by the user. Context window poisoning is the underlying mechanism of indirect prompt injection and RAG poisoning attacks. See Prompt Injection: A Comprehensive Technical Guide for documented exploitation patterns across all context window injection vectors.

D#

Data poisoning#

Data poisoning is an attack on the training or retrieval data used by an AI system, in which adversarially crafted samples are introduced to alter model behavior. In training-time poisoning, corrupted data shifts the model's learned associations, introduces backdoors, or degrades performance on targeted inputs. In retrieval-time poisoning (RAG poisoning), malicious documents are introduced into the knowledge base the model retrieves from; instructions embedded in these documents execute when the document is retrieved. Data poisoning differs from model theft in that it modifies what the model knows or does, rather than what the attacker learns about it. See Data Security and Privacy for AI Systems for controls that address both training-time and retrieval-time poisoning vectors.

Denial of wallet (DoW)#

Denial of wallet is a resource exhaustion attack against AI systems that charge per token or per API call. By sending requests designed to maximize token consumption (extremely long inputs, prompts that induce verbose outputs, recursive loops), an attacker can drive up an organization's API costs without triggering traditional availability-based DoS defenses. Unlike denial of service, denial of wallet does not necessarily degrade availability for legitimate users; it generates financial harm while the service continues functioning. The attack is especially effective against agentic systems where a single malicious prompt can trigger many downstream API calls. See Denial of Wallet for a technical breakdown.

Direct prompt injection#

Direct prompt injection is an attack in which the user directly submits adversarial instructions to an LLM as part of their input, attempting to override the system prompt, operator configuration, or safety constraints. The attack exploits the model's tendency to follow instructions regardless of their source; a user who instructs the model to "ignore all previous instructions" may cause compliance if the injection is well-crafted. Direct prompt injection is one of the two primary prompt injection categories, the other being indirect prompt injection. OWASP LLM01:2025 documents this as the top LLM application risk.

E#

Embedding inversion#

Embedding inversion is an attack technique that attempts to reconstruct original text from a text embedding vector. Because embedding models produce dense numerical representations that preserve semantic meaning, those representations contain recoverable information about the original input. Research has demonstrated that embedding vectors can be partially or fully inverted to recover source text; this has significant privacy implications for any system that stores or transmits embeddings of sensitive content without encrypting the underlying vectors. As RAG architectures proliferate, embedding stores have become a dedicated attack surface requiring access controls equivalent to the data they represent.

Excessive agency#

Excessive agency is a vulnerability class in agentic AI deployments where an AI agent has been granted more permissions, tool access, or autonomy than its intended function requires. OWASP LLM06:2025 identifies excessive agency as a top LLM risk because over-permissioned agents have a larger blast radius when compromised; a prompt injection that redirects a read-only agent can exfiltrate data, while the same injection against an agent with write and delete permissions can destroy or corrupt data. Least-privilege principles apply directly to agentic AI systems, scoping tool access to the minimum required for each agent's intended function. The OWASP Agentic AI Top 10 provides an enterprise security roadmap for addressing excessive agency alongside the nine other critical agentic AI risks.

F#

Foundation model#

A foundation model is a large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks through fine-tuning, prompting, or instruction following. Foundation models (including large language models such as GPT-4 and Claude, multimodal models, and code models) introduce a supply chain dependency for every application built on top of them: behavioral properties of the underlying model (including safety training gaps, training data biases, and undisclosed capabilities) propagate downstream to every deployment. Organizations using foundation models via API should document the model version and provider's data handling terms as part of their AI Bill of Materials.

G#

Guardrails#

In AI security, guardrails are input and output filtering mechanisms applied to an AI system to prevent the processing or generation of content that violates safety policies, compliance requirements, or operator-defined restrictions. Guardrails can operate at multiple layers: as system-prompt instructions (soft guardrails that the model may not follow under adversarial pressure), as classifiers that evaluate inputs before passing them to the model, or as classifiers that evaluate model outputs before delivery to the user. Guardrails are necessary but not sufficient; adversarial inputs designed to evade detection can bypass classifier-based guardrails through obfuscation, encoding, or semantic equivalence. See Breaking Meta's Prompt Guard for a documented example of guardrail bypass.

H#

Homogeneity risk#

Homogeneity risk is a systemic vulnerability that arises when a large portion of an industry or critical infrastructure relies on the same foundation model or AI system. If the shared underlying model has a security vulnerability, behavioral bias, or training data flaw, all organizations using that model are simultaneously affected by the same weakness. NIST AI 600-1 identifies homogeneity risk as a distinct category of generative AI risk, separate from vulnerabilities affecting only individual deployments; it recommends diversity in AI system sourcing for high-stakes applications.

I#

Indirect prompt injection#

Indirect prompt injection is an attack in which adversarial instructions are embedded in external content that an LLM is expected to process; a web page the agent browses, a document retrieved from a knowledge base, an email being summarized, or a tool call response. The instructions arrive through the model's context via a trusted-seeming channel rather than directly from the user. When the model processes the content, it may interpret the embedded instructions and execute them, effectively being hijacked by a third-party attacker who has no direct access to the model interface. Repello's research on indirect prompt injection in voice AI documents real-world exploitation of this class. For a full taxonomy of injection vectors including indirect injection, see Prompt Injection: A Comprehensive Technical Guide.

L#

LLM pentesting#

LLM pentesting (large language model penetration testing) is the structured security assessment of an LLM-based application using an adversarial methodology. A comprehensive LLM pentest covers prompt injection susceptibility, system prompt leakage, jailbreaking resilience, RAG exfiltration paths, tool abuse vectors, data exfiltration through model outputs, and input validation controls. Unlike traditional pentesting, LLM pentesting requires creative prompt construction and behavioral observation across probabilistic outputs rather than automated exploit execution against deterministic endpoints. See How to Pentest an LLM for a full methodology, and LLM Pentesting: Checklist and Tools for a practitioner checklist. For agentic systems specifically, Pentesting Agentic AI covers the additional attack surface introduced by tool integrations and multi-agent chains.

M#

Many-shot jailbreaking#

Many-shot jailbreaking is a jailbreak technique that exploits the extended context windows of modern LLMs by prepopulating the conversation history with a large number of fabricated exchanges that normalize the target behavior. By providing dozens or hundreds of fictional prior turns in which the model appears to have complied with harmful requests, an attacker shifts the model's behavioral baseline for the current session. Research documented in Anthropic's many-shot jailbreaking findings demonstrated that effectiveness scales with context window length; models with larger context windows are potentially more susceptible to this technique.

MCP (Model Context Protocol) security#

Model Context Protocol (MCP) is an open protocol that standardizes how AI agents connect to external tools, data sources, and services. An MCP server exposes capabilities (file access, database queries, API calls, code execution) to connected agents via a defined interface. From a security perspective, MCP connections are a high-risk attack surface; a malicious or compromised MCP server can redefine the tools available to a connected agent, injecting capabilities that redirect agent execution to attacker-controlled infrastructure. Repello's research on MCP tool poisoning to RCE documents the full exploitation chain from a compromised tool definition to remote code execution through normal agent operation. For real-world MCP exploitation across production AI systems, see Zero-Click Calendar Exfiltration via MCP.

Membership inference attack#

A membership inference attack attempts to determine whether a specific data sample was included in the training dataset of an AI model. By querying the model with samples and analyzing its outputs (confidence scores, perplexity, loss values), an attacker can infer with above-chance accuracy whether specific individuals, records, or documents were present in the training data. Successful membership inference attacks against models trained on sensitive data (medical records, financial data, private communications) constitute a privacy violation; they reveal information about training data without direct access to the dataset itself. See Data Security and Privacy for AI Systems for controls addressing membership inference and related data privacy risks.

MITRE ATLAS#

MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a knowledge base of adversarial tactics, techniques, and case studies targeting AI and machine learning systems, maintained by MITRE in the same framework structure as MITRE ATT&CK. ATLAS documents techniques including model inversion, data poisoning, backdoor insertion, prompt injection, and adversarial evasion; these are mapped to a tactic taxonomy covering reconnaissance, resource development, initial access, persistence, defense evasion, discovery, collection, exfiltration, and impact. Security teams use MITRE ATLAS to structure AI threat modeling exercises and identify coverage gaps in their AI security controls.

Model extraction#

Model extraction (also called model theft) is an attack that reconstructs a proprietary AI model by querying it through its API and using the inputs and outputs to train a surrogate model that approximates the original. A successful extraction attack allows an attacker to replicate a model's functionality without authorization; it defeats access controls and potentially exposes training data and intellectual property embedded in the model's weights. Model extraction attacks are typically detected through query volume anomalies and unusual input distribution patterns in API traffic logs.

Model scanning#

Model scanning is the practice of inspecting AI model files (weights, serialized model artifacts) for embedded malware, backdoors, serialization exploits, and supply chain tampering before deployment. Malicious code can be embedded in model files through compromised hosting platforms, tampered downloads, or supply chain attacks against model repositories such as Hugging Face. Model scanning tools analyze the serialization format of model files (PyTorch pickle files, Safetensors, ONNX) for dangerous deserialization patterns and verify cryptographic integrity against known-good hashes. See Securing Machine Learning Models for a scanning methodology guide, and The Complete Guide to ML Model Security in 2026 for coverage across all three security phases: pre-deployment, continuous testing, and runtime monitoring.

Multi-turn attack#

A multi-turn attack is a prompt injection or jailbreaking technique that spreads the adversarial payload across multiple conversational turns rather than embedding it in a single message. By establishing context, building rapport, or gradually shifting the conversation over several exchanges, an attacker can maneuver a model into a behavioral state where it complies with a request it would have refused in isolation. Multi-turn attacks are harder to detect than single-turn attacks because no individual message contains a clearly adversarial payload; they exploit conversational context that accumulates across the session.

N#

NIST AI RMF#

The NIST AI Risk Management Framework is a voluntary framework published by the National Institute of Standards and Technology in January 2023; NIST AI 600-1 extended it in July 2024 for generative AI. The framework organizes AI risk management around four core functions: Govern (establishing accountability and policies), Map (identifying risks in context), Measure (analyzing and assessing risk), and Manage (prioritizing and remediating risk). NIST AI 600-1 extends these functions to generative AI-specific risks including prompt injection, data poisoning, model provenance, homogeneity risk, and training transparency. The NIST AI RMF has become a primary compliance reference for US organizations and underpins requirements in sectors including financial services and healthcare.

O#

OWASP LLM Top 10#

The OWASP Top 10 for Large Language Model Applications is a security awareness document published by the Open Worldwide Application Security Project that enumerates the ten most critical security risks in LLM-based applications. The 2025 version covers: LLM01 (prompt injection), LLM02 (sensitive information disclosure), LLM03 (supply chain vulnerabilities), LLM04 (data and model poisoning), LLM05 (improper output handling), LLM06 (excessive agency), LLM07 (system prompt leakage), LLM08 (vector and embedding weaknesses), LLM09 (misinformation), and LLM10 (unbounded consumption). The OWASP LLM Top 10 is the de facto standard reference for AI application security risk catalogs. For a CISO-oriented breakdown, see Repello's OWASP LLM Top 10 explainer.

P#

Prompt injection#

Prompt injection is a class of attack against LLM-based applications in which adversarial instructions are embedded in content the model processes, causing the model to execute unintended actions or produce unintended outputs. Prompt injection is the AI equivalent of SQL injection; both attacks exploit insufficient separation between instructions and data. The OWASP LLM Top 10 has ranked prompt injection as the number-one LLM application risk for two consecutive years. Attack variants include direct injection (from the user interface), indirect injection (from external content the model retrieves), and multi-modal injection (from images, audio, or file attachments). For a comprehensive technical breakdown, see Prompt Injection: A Comprehensive Technical Guide. For a catalog of documented real-world examples, see 10 Prompt Injection Attack Examples.

Prompt leaking#

Prompt leaking (also called system prompt extraction) is an attack in which an adversary causes an LLM to reveal the contents of its system prompt through carefully crafted queries. System prompts often contain confidential business logic, API credentials, proprietary instructions, persona definitions, and operational constraints that the operator intended to keep hidden from end users. Successful prompt leaking exposes this intellectual property and may reveal exploitable constraints: a system prompt that says "never discuss competitor X" reveals competitive intelligence; one that defines an authorization threshold reveals an exploitable boundary. Prompt leaking is classified as LLM07:2025 in the OWASP LLM Top 10. See 10 Prompt Injection Attack Examples for documented prompt leaking patterns alongside other injection variants.

R#

RAG poisoning#

RAG poisoning (retrieval-augmented generation poisoning) is an attack in which adversarially crafted documents are introduced into the knowledge base or document store that an LLM retrieves from during generation. When the model retrieves a poisoned document as context, it may interpret embedded instructions and execute them; being hijacked through its retrieval pipeline rather than its user interface is the distinctive pattern. RAG poisoning is a form of indirect prompt injection that is particularly difficult to detect because the malicious content appears in a trusted internal data source rather than in a user-submitted message. Repello's research on RAG poisoning demonstrated persistent behavioral manipulation of a production LLM through a single poisoned document.

Runtime security#

Runtime security for AI systems refers to the continuous monitoring and enforcement of security controls at the inference layer in production; evaluating model inputs before processing, evaluating model outputs before delivery, detecting anomalous tool call patterns, and blocking behavioral deviations from the intended operating envelope. Runtime security catches attacks that evasion techniques have bypassed during pre-deployment testing and detects novel attack patterns that were not present in the test suite at deployment time. Effective AI runtime security must operate within inference latency constraints (under 100 milliseconds) to avoid degrading user experience. See Introducing ARGUS for a technical overview of how runtime security is implemented at the inference layer for production AI deployments.

S#

Shadow AI#

Shadow AI refers to AI tools, models, APIs, and integrations used within an enterprise without the knowledge or approval of the security or IT organization. It includes consumer LLMs used for work tasks, unauthorized AI browser extensions, Slack and Teams bots added without IT review, AI coding assistants installed by developers, MCP server connections running on developer machines, and direct API integrations calling external model providers from application code. Shadow AI is the category where the enterprise AI attack surface grows fastest and security visibility is lowest; a 2024 report from Cyberhaven found that 4.2% of workers had pasted company data into ChatGPT. See Shadow AI: What It Is and How to Find It for a detection and governance framework.

System prompt#

A system prompt (also called a system message or operator instructions) is a configuration input provided to an LLM by the operator or developer that establishes the model's behavior, persona, scope restrictions, and task context before any user interaction occurs. System prompts define what the model should and should not do, what role it plays, what tools it has access to, and what constraints it must enforce. System prompts are the primary control mechanism for LLM-based applications and the primary target of prompt injection attacks attempting to override operator-defined behavior. Repello AI Research Team analysis shows that system prompt extraction is the most common exploitation pattern in production AI incidents.

T#

Training data poisoning#

Training data poisoning is an attack in which an adversary introduces corrupted, manipulated, or adversarially crafted samples into the dataset used to train an AI model, with the goal of altering the model's learned behavior in ways that serve the attacker's objectives. Training data poisoning can introduce backdoors (behavior that activates only on specific trigger inputs), shift model outputs in targeted directions across a range of inputs, or degrade performance on specific input types. The attack is most feasible when an organization trains or fine-tunes models on data collected from external or user-generated sources; these are most difficult to validate exhaustively before use. See Data Security and Privacy for AI Systems for controls addressing training data integrity and validation workflows.

V#

Vector embedding attack#

A vector embedding attack is any attack that targets the dense numerical representations generated by AI models to encode the semantic meaning of text, images, or other data. Attack classes include embedding inversion (reconstructing original text from embedding vectors), embedding space poisoning (inserting crafted documents positioned to be retrieved in response to targeted queries), and embedding leakage (extracting embedding representations that reveal information about training data or other users' queries). As RAG architectures proliferate and embedding stores become a primary AI data layer, the embedding surface requires dedicated access controls and monitoring equivalent to the underlying data it represents. See Data Security and Privacy for AI Systems for embedding store access controls and RAG pipeline security practices.

How Repello covers this attack surface#

The terms in this glossary describe the threat landscape that AI security programs must address. Repello's platform covers three critical layers of that landscape.

ARTEMIS is Repello's automated red teaming engine: it runs continuous adversarial testing across the attack surface categories above (prompt injection, jailbreaking, RAG exfiltration, tool abuse, system prompt leakage, and agentic attack chains) to identify exploitable vulnerabilities before adversaries do.

ARGUS is Repello's runtime security layer: it enforces behavioral controls at the inference layer in production, monitoring inputs and outputs against the intended operating envelope and blocking attacks that evasion techniques have bypassed during pre-deployment testing.

Repello's AI Asset Inventory addresses the shadow AI and AI-BOM problem: continuous discovery of every model, API, and AI integration in the enterprise environment, including those introduced outside formal procurement, providing the inventory foundation that every other security function requires.

Frequently asked questions#

What is the difference between prompt injection and jailbreaking?

Prompt injection inserts instructions into content the model processes, exploiting insufficient separation between data and instructions. Jailbreaking attempts to override instructions the model was explicitly given (its safety training and operator constraints) through persuasion, framing, or indirect techniques. A prompt injection attack does not require the model to violate its safety training; it requires the model to follow instructions from an unauthorized source. A jailbreak requires the model to violate its own constraints. Both exploit instruction-following behavior, but at different layers of the model's control hierarchy.

What is AI red teaming and how does it differ from traditional pentesting?

AI red teaming is the structured adversarial testing of AI systems to identify security vulnerabilities through simulated attacks against the system's own interfaces. It differs from traditional penetration testing in that AI vulnerabilities are behavioral and probabilistic rather than deterministic: the same input may succeed on one run and fail on another. AI red teams must test across a range of inputs, states, and conversation histories, and must develop adversarial prompts manually rather than running automated exploit frameworks. AI red teaming also covers attack classes (prompt injection, RAG poisoning, system prompt extraction) that have no equivalent in traditional application security testing.

What does OWASP LLM Top 10 cover?

The OWASP Top 10 for Large Language Model Applications enumerates the ten most critical security risks in LLM-based applications: prompt injection, sensitive information disclosure, supply chain vulnerabilities, data and model poisoning, improper output handling, excessive agency, system prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption. It is the primary industry reference for AI application security risk catalogs and is updated to reflect emerging attack techniques.

What is model scanning and why is it important?

Model scanning is the inspection of AI model files for embedded malware, backdoors, serialization exploits, and supply chain tampering before deployment. AI models are distributed as serialized files (PyTorch pickle format, Safetensors, ONNX) that can contain executable code embedded by malicious actors during the supply chain. A model downloaded from a public repository and deployed without scanning may execute attacker-controlled code at load time or contain behavioral backdoors that activate only under specific conditions. Model scanning is a supply chain security control analogous to antivirus scanning for software executables.

What is the difference between direct and indirect prompt injection?

Direct prompt injection occurs when the user submits adversarial instructions directly through the model's user interface. Indirect prompt injection occurs when adversarial instructions are embedded in external content the model processes (retrieved documents, web pages, emails, tool outputs) and arrive through a trusted-seeming channel rather than from the user. Indirect injection is generally harder to detect and defend against because the malicious content appears in data rather than in user input, and may originate from sources the operator has no control over.

What is shadow AI and why is it a security risk?

Shadow AI is AI tools, models, and integrations operating in an enterprise without security or IT team awareness. The security risks are categorically different from traditional shadow IT: data submitted to external LLMs may be used for model training (making exfiltration permanent), AI tools are active rather than passive and can execute tasks with user credentials in agentic configurations, and shadow AI tools are software dependencies that have not been evaluated for supply chain risk. The discovery problem is also harder: standard SaaS discovery tools are not designed to detect locally installed AI tools, code-level API integrations, or MCP server connections in developer environments.

Conclusion#

AI security is a distinct discipline with its own threat taxonomy, attack techniques, and defensive controls. The terms in this glossary describe the actual attack surface that enterprises running AI systems in production need to manage: from prompt injection and jailbreaking at the model interaction layer, to RAG poisoning and embedding attacks at the retrieval layer, to tool abuse and MCP exploitation at the agentic layer, to training data poisoning and backdoors at the supply chain layer.

Precise language produces precise controls. Security teams that understand the distinction between direct and indirect prompt injection design better guardrails. Teams that understand blast radius implement better least-privilege policies. Teams that understand what model scanning does (and does not) cover make better decisions about AI supply chain risk.

This glossary will be updated as the threat landscape evolves. For deeper treatment of individual topics, follow the links throughout to Repello's research.

To learn how Repello tests, monitors, and manages the full AI attack surface for enterprise deployments, request a demo.