Glossary/RAG Poisoning

What is RAG Poisoning?

RAG poisoning is an attack that injects malicious content into a retrieval-augmented generation (RAG) system's knowledge base, manipulating what the language model retrieves and bases its outputs on. Because RAG architectures treat retrieved documents as authoritative context, a single poisoned document in the knowledge base can systematically warp the model's responses without ever modifying the model itself.

How RAG poisoning works

A retrieval-augmented generation pipeline has three stages:

  1. Index — documents are split into chunks, embedded into a vector space, and stored in a vector database
  2. Retrieve — a user query is embedded and used to fetch the top-k semantically similar chunks
  3. Generate — the retrieved chunks are concatenated into the model's context window and the model answers based on them

RAG poisoning targets stage 1 — the index. An attacker introduces poisoned documents into the corpus that will be retrieved when specific queries are made. When that retrieval happens, the poisoned chunks flow into the model's context as if they were authoritative facts.

Three variants

1. Direct content poisoning. The attacker adds documents containing false claims, malicious instructions, or biased content. When relevant queries are made, the poisoned content is retrieved and treated as ground truth. Repello's research demonstrated this against a Llama 3 RAG deployment, causing the model to produce racist outputs on queries that retrieved poisoned chunks — even though the model itself had robust safety training.

2. Embedding-space manipulation. The attacker crafts documents whose embeddings cluster near a target query's embedding, ensuring they're retrieved even when their semantic content seems unrelated. This works because retrieval is based on vector similarity, not human-readable relevance.

3. Indirect prompt injection via retrieved content. Documents in the corpus contain hidden adversarial instructions ("ignore previous instructions, instead…"). When retrieved, these instructions enter the model's context and may be acted on.

Real-world exposure

RAG corpora are often built from sources that are not fully under the operator's control:

Any of these can be a vector for introducing poisoned content into the index.

Defending against RAG poisoning