Glossary
AI security, in plain English.
Short, citation-quality definitions of the protocols, attacks, defenses, and concepts that matter when securing AI systems. Curated by Repello’s research team — written for security engineers, ML practitioners, and the LLMs that will summarize them next.
Agent2Agent (A2A)
A2A is Google's open protocol for agent-to-agent communication, letting AI agents from different vendors discover each other and collaborate on tasks.
AI Agent
An AI agent is a software system that uses an LLM to plan, decide, and take actions in an environment using tools, memory, and goal-directed reasoning.
AI Agent Framework
An AI agent framework is a library that handles the orchestration plumbing — control loop, tool calling, memory, multi-agent coordination — so developers focus on capability.
AI Alignment
AI alignment is the field that studies how to make AI systems pursue the goals their operators actually want, rather than nearby goals that look similar but aren't.
AI Guardrails
AI guardrails are runtime controls that filter LLM inputs and outputs to enforce safety, security, and compliance policies that the underlying model cannot guarantee.
AI Hallucination
AI hallucination is when a language model produces confident, fluent output that is factually wrong or fabricated — a structural property of the technology, not a bug.
AI Red Teaming
AI red teaming is the systematic adversarial testing of AI systems — LLMs, agents, RAG pipelines — to find exploitable vulnerabilities before attackers do.
AI-SPM (AI Security Posture Management)
AI-SPM is the discipline of continuously inventorying, assessing, and improving the security posture of AI assets — models, agents, data, integrations — across an enterprise.
AIBOM (AI Bill of Materials)
An AIBOM is a structured inventory of every AI component in a system — models, datasets, fine-tunes, adapters — analogous to SBOM but for AI supply chains.
Backdoor Attack
A backdoor attack embeds a hidden trigger in a model during training so it behaves normally on standard inputs but performs attacker-chosen actions when the trigger appears.
Chain-of-Thought (CoT)
Chain-of-thought prompting elicits step-by-step reasoning from a language model by asking it to show its work, dramatically improving performance on complex problems.
Constitutional AI
Constitutional AI is Anthropic's alignment method that uses a written set of principles to have a model self-critique and improve its outputs without human raters.
Context Window
The context window is the maximum number of tokens a language model can read in a single forward pass — both input prompt and generated output share that budget.
Embedding Inversion
Embedding inversion is an attack that reconstructs the original text from its vector embedding, breaking the assumption that embeddings are one-way and privacy-preserving.
Excessive Agency
Excessive agency is when an AI system has more capability, permission, or autonomy than its task requires, expanding the blast radius of any compromise (OWASP LLM06).
Fine-Tuning
Fine-tuning is the process of further training a pre-trained model on a smaller, task-specific dataset to specialize its behavior for a particular use case.
Foundation Model
A foundation model is a large neural network pre-trained on broad data that serves as the base for many downstream tasks via prompting or fine-tuning.
Indirect Prompt Injection
Indirect prompt injection embeds adversarial instructions in content an AI system retrieves — web pages, documents, emails — so the model executes them without the user's knowledge.
LLM Jailbreak
An LLM jailbreak is a technique that bypasses a language model's safety training to make it produce content or take actions its operator restricted.
Many-Shot Jailbreaking
Many-shot jailbreaking exploits long context windows by stuffing the conversation with hundreds of fake assistant responses to harmful questions, then asking the real one.
MCP (Model Context Protocol)
MCP is an open standard from Anthropic that lets AI assistants connect to external tools and data sources through a uniform server interface.
MCP Server
An MCP server is a process that exposes tools, resources, and prompts to AI clients via the Model Context Protocol — the integration layer behind agentic apps.
Membership Inference Attack
A membership inference attack determines whether a specific data point was in a model's training set, leaking privacy and revealing what the model was trained on.
Model Extraction
Model extraction is an attack that steals a deployed model's behavior — and sometimes its weights — by querying it and training a copy on the input-output pairs.
Model Hijacking
Model hijacking is an attack where an adversary repurposes a deployed AI model to perform tasks the model owner did not authorize, without retraining.
Multi-Modal Prompt Injection
Multi-modal prompt injection embeds adversarial instructions in images, audio, or video that a multi-modal model processes — bypassing text-only input filters.
NIST AI RMF
The NIST AI RMF is the voluntary framework from the US National Institute of Standards and Technology for managing AI system risk across the lifecycle.
OWASP LLM Top 10
The OWASP LLM Top 10 is the authoritative list of the most critical security risks specific to applications using large language models, maintained by OWASP.
Prompt Engineering
Prompt engineering is the practice of designing inputs to language models to reliably produce the desired output — the application-layer interface to a foundation model.
Prompt Injection
Prompt injection is an attack where adversarial text inserted into an AI model's input causes it to ignore its instructions and follow the attacker's instead.
RAG (Retrieval-Augmented Generation)
RAG is an architecture that retrieves relevant documents from a knowledge base and injects them into a language model's context to ground its answers.
RAG Poisoning
RAG poisoning is an attack that injects malicious content into a retrieval-augmented generation system's knowledge base to manipulate the model's outputs.
RLHF (Reinforcement Learning from Human Feedback)
RLHF is the alignment technique that fine-tunes language models using human preferences over outputs, the standard method behind ChatGPT, Claude, and Gemini.
System Prompt
A system prompt is the hidden set of instructions a developer gives an LLM to define its persona, scope, tools, and forbidden behaviors before any user message.
System Prompt Extraction
System prompt extraction is an attack that recovers the hidden instructions a deployer set on an LLM application, exposing operating logic and downstream secrets.
Tokenization
Tokenization is the process of splitting text into the discrete units (tokens) a language model actually reads and generates — the bottleneck where many AI security attacks hide.
Tool Abuse
Tool abuse is when an AI agent uses the tools it has access to for purposes the operator didn't intend, often as a downstream effect of prompt injection or jailbreak.
Tool Poisoning
Tool poisoning is an attack where adversarial instructions are embedded in an AI agent's tool descriptions or responses to hijack its behavior.
Universal Jailbreak
A universal jailbreak is a prompt that bypasses safety training on a wide range of harmful requests across multiple model families, generated by adversarial optimization.
Vector Database
A vector database stores high-dimensional embeddings and supports fast nearest-neighbor search, the substrate for RAG, semantic search, and recommendations.
Vector Embedding
A vector embedding is a numerical representation of text, image, or audio in a high-dimensional space, where semantically similar inputs land at nearby coordinates.