How do you red team a RAG system?

A RAG red team exercise covers four stages: mapping the retrieval pipeline to identify ingestion entry points, injecting adversarial documents and verifying retrieval success rates, testing context window overflow and retrieval manipulation, and testing for data exfiltration via injected instructions in retrieved content. The retrieval pipeline and the model interface must each be tested independently before integrated testing.

RAG Security: How to Red Team Retrieval-Augmented Generation Systems

TL;DR

RAG systems add a retrieval layer that standard LLM security tools do not cover
The five primary RAG attack vectors are: document injection, retrieval manipulation, embedding poisoning, context window overflow, and indirect prompt injection via retrieved content
Red teaming a RAG system requires mapping the retrieval pipeline, testing adversarial document ingestion, and validating context assembly for exfiltration risk
ARTEMIS tests RAG pipelines against all known injection and poisoning attack vectors

What is RAG and why does it create a new attack surface?#

Retrieval-Augmented Generation connects an LLM to an external knowledge store. At query time, a retrieval component fetches relevant documents from a vector database, injects them into the model's context window, and the model generates a response conditioned on that retrieved content.

This architecture solves a real problem: it lets models answer questions about private, recent, or domain-specific data without retraining. Most enterprise AI deployments today use some form of RAG.

The attack surface it creates is separate from the model itself. The document ingestion pipeline, the vector database, the embedding and chunking logic, and the context assembly step each introduce failure modes that a standard LLM security audit does not cover. Testing the model in isolation misses the most exploitable layer in most production RAG deployments.

RAG shifts the trust boundary. In a standard LLM deployment, the system prompt and user input are the only external-facing inputs. In a RAG deployment, every document in the knowledge store is also a potential input to the model, often without any of the access controls applied to the prompt layer.

The 5 RAG-specific attack vectors#

Document injection attacks#

An attacker with write access to the knowledge base, or who can influence which documents get ingested, can insert adversarial content that the retrieval system surfaces in response to specific queries.

This does not require exploiting the model. It requires exploiting the ingestion pipeline, which typically has weaker access controls than the model API. Documents ingested from web scraping, third-party data feeds, email attachments, or user-submitted content are high-risk entry points. In multi-tenant RAG deployments, documents from one tenant can sometimes cross namespace boundaries and appear in another tenant's retrieval results if isolation is not enforced at the vector store level.

The attack does not require the injected document to appear malicious. A document that looks like a policy update, a FAQ entry, or a product specification can contain adversarial instructions that only activate when the right query triggers retrieval.

Retrieval manipulation#

The retrieval step ranks candidate documents by embedding similarity to the query. An attacker who understands the embedding model can craft documents with elevated similarity scores for target queries. The result is adversarial content ranked above legitimate results for those queries.

Zou et al. demonstrated in PoisonedRAG that injecting as few as five adversarial passages into a target corpus was sufficient to poison model responses with attack success rates above 90% across multiple RAG configurations and base models. The adversarial passages did not need to look semantically unusual. They were engineered to score high on retrieval similarity for target queries while appearing benign on inspection.

This makes retrieval manipulation a practical, low-cost attack. The attacker needs to understand the embedding model's scoring behavior, not the model's weights or architecture.

Embedding and vector poisoning#

The embedding model converts text to numerical vectors stored in the vector database. If the embedding model has been compromised at the supply chain level, or if the vector database accepts external vector inputs directly, an attacker can inject vectors that cluster with legitimate queries without corresponding to any real document.

This attack is harder to execute than document injection but harder to detect. A poisoned vector record looks identical to a legitimate one. Standard vector database monitoring does not flag anomalous cosine similarity patterns unless specifically instrumented for it.

Vector poisoning becomes more accessible in deployments that allow the vector store to be populated via API, or in RAG systems built on top of external embedding providers whose model behavior is not version-pinned.

For the deeper technical breakdown — including the PoisonedRAG and HijackRAG attack mechanics, embedding inversion as a privacy surface, and runtime defenses that scale — see vector embedding security.

Context window overflow attacks#

RAG assembles retrieved chunks into the model's context window before generation. If an attacker can influence what gets retrieved and in what volume, they can flood the context with adversarial content, crowding out the legitimate retrieved material.

This exploits what Liu et al. identified as the lost-in-the-middle problem: models systematically underweight information positioned in the middle of long contexts. By controlling what appears at the start and end of the assembled context, an attacker can steer generation even when the user's legitimate content is also present. The attack works without ever compromising the model or the system prompt.

Context window overflow is particularly relevant in RAG deployments that retrieve large chunks, use high top-k retrieval, or have no mechanism to deduplicate or rank injected content by source trustworthiness.

Indirect prompt injection via retrieved content#

This is the highest-impact RAG attack vector. An attacker who controls a document that will be retrieved embeds instructions in that document. The model treats retrieved content as part of its context and follows those instructions, without distinguishing them from the system prompt or applying the same scrutiny it would apply to explicit user inputs.

This is the indirect injection pattern documented in Greshake et al.'s foundational paper: the attack does not target the user input at all. It targets content the model will be given silently at inference time. The user and operator may never see what was retrieved or know that instructions were embedded in it.

For a detailed technical breakdown of how direct and indirect injection differ in threat model and attack chain, see Direct vs. Indirect Prompt Injection.

How RAG attacks differ from standard LLM attacks#

Standard LLM attacks work through the input layer: jailbreaks, encoding tricks, multi-turn erosion, adversarial suffixes. The model's own alignment training is the primary defense, and the attack is visible in the conversation history.

RAG attacks work through the data layer. The model is not being attacked directly. The retrieval pipeline is being attacked, and the model executes adversarial instructions sourced from content that was placed in the knowledge store before the conversation started. The attack does not appear in the conversation history and does not require any interaction with the user.

Red teaming only the model interface misses document injection, retrieval manipulation, and vector poisoning entirely. Testing only the vector database misses the indirect injection vectors that require model execution to complete.

A RAG-layer attack that produces altered model outputs may look identical to a model hallucination or a prompt engineering failure. Without retrieval logging and context assembly observability, the injection is invisible to defenders.

A complete RAG security assessment requires testing both the retrieval layer and the model layer, and instrumenting the interface between them to make attacks visible when they occur.

Real-world RAG attack examples#

Repello AI's research demonstrated RAG poisoning in a live deployment. By injecting targeted adversarial content into a Llama 3 RAG configuration, model responses on specific topics were reliably manipulated without any modification to the model or the system prompt. The full attack methodology is documented in How RAG Poisoning Made Llama3 Racist.

This is not an isolated result. The same attack primitives apply across RAG architectures because the vulnerability follows from how retrieval and context assembly work, not from a specific model or embedding model weakness.

"The retrieval layer is where most enterprises have the weakest security posture," notes the Repello AI Research Team. "Document ingestion pipelines are rarely held to the same access control standards as the API layer, which makes the vector store a consistent entry point for adversarial content. Most organizations do not log what was retrieved or in what order, so attacks run silently until the output anomaly is noticed downstream."

For the technical foundations underlying these attacks, see The CISO's Guide to Data Poisoning in Enterprise AI.

How to red team a RAG system: methodology#

Red teaming a RAG system requires a different methodology from red teaming a base LLM. The retrieval pipeline, the vector database, and the context assembly layer each need to be tested independently before the integrated system is tested end to end.

Step 1: Map the retrieval pipeline#

Before any adversarial testing, document the complete data flow. Identify every document source: manual upload, web scraping, file sync, email ingestion, API feeds from third-party services. Document the ingestion validation logic at each source, the embedding model and chunking parameters, the vector database configuration, and the access controls on write operations to the vector store.

Pay particular attention to any ingestion path that accepts external or semi-trusted input without sanitization. These are the highest-probability document injection entry points, and they are often added to RAG deployments incrementally without security review.

Step 2: Inject adversarial documents#

Craft documents designed to be retrieved for specific high-value queries. Test whether injected documents surface in retrieval results and with what ranking. Test whether the retrieval system's similarity scores can be elevated by adjusting the linguistic structure, keyword density, or semantic framing of injected documents.

The goal at this step is to confirm document injection as a viable attack path before testing model-level behavior. A retrieval system that does not surface injected documents is not immune to the later attack stages, but document injection tests establish the baseline difficulty for an attacker.

Also test multi-tenant namespace isolation if the deployment serves more than one tenant from a shared vector store. Cross-tenant document retrieval is a high-severity vulnerability that does not require any adversarial document crafting: it is a misconfiguration.

Step 3: Test retrieval manipulation#

Submit queries designed to surface adversarial documents with priority over legitimate content. Test context window overflow by injecting high-volume adversarial chunks and observing whether they displace legitimate content in the assembled context. Measure how much injected content is required to produce a measurable change in model output.

For agentic AI deployments that use RAG to inform tool calls rather than text responses, this step is higher stakes. A manipulated retrieval result that causes an agent to invoke a tool, send a message, or take an action in an external system has a significantly larger blast radius than one that alters a text response. Test tool-invocation paths specifically.

Step 4: Test for data exfiltration via retrieved content#

Embed data exfiltration instructions in injected documents and test whether the model executes them. Two scenarios require separate test cases.

The first is intra-session exfiltration: the injected document instructs the model to summarize and output all other content in the retrieved context. This extracts other documents from the knowledge store via the model's output.

The second is cross-channel exfiltration: the injected document instructs the model to forward session history, retrieved documents, or system prompt content to an external endpoint via a tool call. In agentic deployments with outbound HTTP tools, this is a complete data exfiltration path from a single adversarial document.

Both scenarios should be tested against every RAG deployment that handles sensitive data. The exfiltration test is the highest-severity scenario in the RAG security assessment and should not be skipped.

RAG security testing tools#

RAG security testing requires tooling that operates at the data layer, not just the model interface. Three capability categories cover the full surface.

Document injection testing requires tools that generate adversarial documents for target queries, inject them into the corpus, and verify retrieval success rates. The test should cover both direct corpus injection (write access to the vector store) and pipeline injection (inserting adversarial content into a document source that feeds the ingestion pipeline). ARTEMIS includes a RAG-specific module that covers both paths and outputs a ranked list of injection-vulnerable ingestion points.

Retrieval analysis requires tools that inspect the vector database directly, independent of the model. These check for anomalous vector clustering, unexpected similarity score distributions, and ingestion pipeline access control gaps. Embedding-level anomaly detection can flag vectors that cluster with high-value query embeddings without corresponding to expected document content.

Context assembly testing requires observability at the retrieval output layer: logging what was retrieved, in what order, from what source, and how it was assembled before being sent to the model. Without this layer, RAG-layer attacks are invisible to defenders. Context assembly testing also validates that the system prompt cannot be overridden by retrieved content and that retrieval results are not given elevated trust relative to the operator-controlled prompt.

For a broader overview of AI pentesting methodology, see LLM Pentesting: Methodology, Tools, and How to Structure a Test. For the complete red teaming framework that covers RAG as part of the broader attack surface, see AI Red Teaming: The Complete Guide.

Frequently asked questions#

What is RAG security?

RAG security is the set of practices for identifying and mitigating attack vectors specific to Retrieval-Augmented Generation systems. The primary vectors are document injection into the knowledge base, retrieval manipulation via adversarially crafted content, embedding poisoning in the vector store, context window overflow, and indirect prompt injection through retrieved documents. These vectors are not covered by standard LLM security testing against the model interface.

What is the most dangerous RAG attack vector?

Indirect prompt injection via retrieved content is the highest-impact vector in most deployments. An attacker who can place a document into the knowledge store can embed instructions that the model executes at inference time, without any user interaction and without the instructions appearing in the conversation history. In agentic deployments with outbound tool access, this becomes a full remote code execution path via a poisoned document.

How is RAG poisoning different from data poisoning?

Data poisoning targets training data to alter model weights during fine-tuning or pretraining. RAG poisoning targets the retrieval corpus to alter model behavior during inference, without touching the model. RAG poisoning is faster to execute, does not require access to training infrastructure, produces targeted effects on specific queries rather than broad behavioral shifts, and can be reversed by removing the injected documents. It is also harder to attribute because the model behavior appears normal: the model is following its instructions faithfully, just not the ones the operator intended.

Can vector databases be secured against poisoning?

Access controls on the ingestion pipeline are the primary defense. Write access to the vector store should be restricted to verified, validated ingestion pipelines. Input sanitization and provenance tracking on ingested documents reduce the attack surface further. Embedding-level anomaly detection can identify vectors that cluster unusually with high-value query embeddings. Retrieval logging with source attribution makes attacks visible rather than silent.

Does ARTEMIS test RAG pipelines?

Yes. ARTEMIS includes RAG-specific attack modules covering document injection, retrieval manipulation, context window overflow, indirect prompt injection, and data exfiltration via retrieved content. It tests both the corpus injection path and the pipeline injection path, and generates a prioritized remediation report specific to the RAG configuration under test. Request a demo to see the RAG security test report for your deployment.

ARTEMIS tests RAG pipelines for all known injection and poisoning attack vectors. Book a demo to run a full RAG security assessment.