Glossary/Model Hijacking

What is Model Hijacking?

Model hijacking is an attack where an adversary repurposes a deployed AI model to perform tasks the model owner did not authorize, without retraining or modifying the model. Where prompt injection focuses on the technique of smuggling instructions into the model, hijacking focuses on the outcome: the deployed model becomes unauthorized compute for the attacker, often at the owner's expense.

How model hijacking works

A deployed model has an intended task — answer customer questions about a SaaS product, summarize support tickets, translate text, etc. The model owner pays for inference, defines a system prompt, and exposes the model through a public or semi-public interface (chatbot widget, API, embedded assistant).

A model hijack happens when an attacker uses prompt injection, jailbreak techniques, or context-window exploits to make that model perform a different task entirely. The model is unchanged; the application's framing of it is bypassed; the model now serves the attacker's goal.

The attacker's gain is twofold:

Documented hijack patterns

Why it's a costly attack

Unlike traditional API abuse, hijacked-model traffic looks legitimate at the network and request layer. Each request is a well-formed prompt to a real endpoint. Detection requires looking at the content of the conversation, not the request envelope.

Inference costs for foundation models are non-trivial — a sustained hijack against a high-traffic chatbot can run thousands of dollars per day in unauthorized inference fees, plus reputation damage when the misuse becomes public.

Defending against model hijacking