
What is an LLM Context Window?

The context window is the maximum number of tokens a language model can attend to in a single forward pass. It bounds everything the model can "see" at once: the system prompt, the conversation history, retrieved documents, tool definitions, tool responses, and the response the model is generating. A bigger window lets the model reason over more information at once; a smaller window costs less compute per token and concentrates attention on less material.
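As a rough illustration, here is a minimal budgeting sketch. The window size, response reserve, and 4-characters-per-token estimate are illustrative assumptions, not any provider's API:

```python
# A minimal sketch of budgeting a context window across its occupants.
# CONTEXT_WINDOW and RESPONSE_RESERVE are illustrative values, and the
# 4-characters-per-token estimate is a rough English-text rule of thumb.

CONTEXT_WINDOW = 200_000   # model maximum, in tokens
RESPONSE_RESERVE = 4_096   # tokens held back for the generated response

def estimate_tokens(text: str) -> int:
    return len(text) // 4   # crude approximation for English text

def fits_in_window(system_prompt: str, history: list[str],
                   retrieved_docs: list[str], tool_defs: list[str]) -> bool:
    parts = [system_prompt, *history, *retrieved_docs, *tool_defs]
    used = sum(estimate_tokens(p) for p in parts)
    return used <= CONTEXT_WINDOW - RESPONSE_RESERVE
```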

Token counts vs. characters or words

Tokens are the units the model actually processes. In English text, one token roughly corresponds to 4 characters, or about three-quarters of a word, so 100 tokens ≈ 75 words.

A modern model with a 200K-token context window can hold roughly a 500-page book at once.
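To count tokens in code rather than estimate them, you can use a tokenizer library. A minimal sketch with tiktoken, which matches OpenAI models; Claude, Gemini, and open models use different tokenizers, so treat the result as an approximation for them:

```python
# Counting tokens with tiktoken, OpenAI's open-source tokenizer library.
# Other model families tokenize differently, so outside OpenAI models
# this is an estimate, not an exact count.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

print(count_tokens("The context window bounds everything the model can see."))
```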

Current model context windows (2026)

Model family                        Context window
Claude Sonnet 4.6, Opus 4.6         200K (1M for 4.5 in special modes)
GPT-5.2                             256K
Gemini 2.5                          1M (2M experimental)
Open-source (Llama 3.1, Mistral)    128K typical

Numbers update fast. The trend is up.

Why context windows matter for security

Three security implications:

  1. Larger windows = larger attack surface. Indirect prompt injection scales with how much retrieved content the model reads. A 1M-token window means a 1M-token attack surface per turn.

  2. Context-stuffing attacks. Pad the context with hundreds of fake assistant turns showing harmful answers ("many-shot jailbreaking"), then ask the real question. Larger windows make this attack more practical (see the detection sketch after this list).

  3. Lost-in-the-middle and edge attacks. Models attend non-uniformly to context — typically strongest at the very start and very end, weakest in the middle. Attackers can place injection payloads in the high-attention zones (top of system prompt, end of latest tool response) for maximum effect.
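One cheap mitigation for point 2 is to bound how much conversation-shaped content a single request may carry. A heuristic sketch, assuming messages are role/content dicts and a threshold you tune yourself:

```python
# Heuristic guard against many-shot context stuffing (point 2 above).
# The message format (role/content dicts) and the threshold are
# assumptions to tune per application, not a standard API.

MAX_ASSISTANT_TURNS = 50  # hypothetical ceiling; tune per workload

def flag_many_shot_stuffing(messages: list[dict]) -> bool:
    """Return True if a request carries suspiciously many assistant turns."""
    assistant_turns = sum(1 for m in messages if m.get("role") == "assistant")
    return assistant_turns > MAX_ASSISTANT_TURNS
```

This catches only the crudest stuffing; a real defense would also inspect turn content, but a hard cap raises the attack's cost for free.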

Practical limits

The advertised window is the maximum. In practice:

  1. Effective recall usually degrades well before the limit; long-context evaluations routinely show quality dropping as inputs grow.

  2. Cost and latency scale with input length, so filling the window is expensive even when it works.

  3. Output limits are typically much smaller than input limits, and generated tokens count against the same window.

For RAG pipelines, the right number of retrieved chunks is rarely "as many as fit" — it's "the smallest number that contains the answer," because relevance density beats raw token count.
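A sketch of that selection policy: pack chunks in descending relevance order until a deliberately small budget is hit, rather than filling the window. The (text, score) chunk format, the default budget, and the 4-characters-per-token estimate are illustrative assumptions:

```python
# Pack retrieved chunks by relevance until a small token budget is hit,
# instead of stuffing the context window. Chunk format, budget, and the
# token estimate are assumptions, not a specific framework's API.

def pack_chunks(ranked_chunks: list[tuple[str, float]],
                budget_tokens: int = 8_000) -> list[str]:
    """Take chunks in descending relevance order until the budget is hit."""
    packed, used = [], 0
    for text, _score in sorted(ranked_chunks, key=lambda c: c[1], reverse=True):
        cost = len(text) // 4   # rough token estimate
        if used + cost > budget_tokens:
            break
        packed.append(text)
        used += cost
    return packed
```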