
What is an AI Hallucination?

An AI hallucination is confident, fluent output from a language model that is factually wrong, fabricated, or unsupported by its source material. Hallucinations are not glitches or bugs; they are a structural property of how generative models work, which means they can be managed but never fully eliminated.

Why hallucinations happen

Language models are trained to predict the most plausible next token given the preceding context. They are optimized to produce fluent, contextually appropriate text, not to be truthful. When the model lacks the knowledge needed to answer correctly, it has no reliable way to detect that gap; it generates the most plausible-looking continuation, which often reads as a confident assertion but is factually wrong.
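To make this concrete, here is a toy sketch of the final step of generation: a softmax over next-token logits. The prompt and logit values below are invented for illustration; a real model scores its entire vocabulary at every step.

```python
import math

# Hypothetical logits for the next token after the prompt
# "The capital of Australia is" (illustrative numbers only).
logits = {"Sydney": 4.1, "Canberra": 3.9, "Melbourne": 2.2, "Perth": 1.0}

def softmax(scores):
    """Turn raw logits into a probability distribution over tokens."""
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

for tok, p in sorted(softmax(logits).items(), key=lambda kv: -kv[1]):
    print(f"{tok:10s} {p:.2f}")
# Sydney     0.50   <- famous and frequent in training text
# Canberra   0.41   <- the correct answer
# Melbourne  0.07
# Perth      0.02
```

Every candidate is scored for plausibility; nothing in the objective scores truth, and no outcome means "I don't know."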

Three structural drivers:

  1. No epistemic boundary: the model has no reliable internal signal for "I don't know this." Plausibility and accuracy are different properties, and the training objective rewards plausibility.
  2. Compression of training data: the model approximates billions of training documents in a fixed number of weights. Specific facts can be recovered only partially, blurred together with similar facts, or reconstructed incorrectly.
  3. Sampling randomness: even when the model "knows" the right answer, temperature and top-k decisions can pull it toward plausible-but-wrong tokens, as the sketch after this list shows.
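
A minimal sketch of driver 3, using standard temperature and top-k sampling; the token scores are hypothetical, not taken from a real model:

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=3):
    """Temperature + top-k sampling as used by most decoders (sketch)."""
    # Keep only the k highest-scoring candidates.
    top = sorted(logits.items(), key=lambda kv: -kv[1])[:top_k]
    # Temperature rescales logits; higher values flatten the distribution.
    weights = [math.exp(score / temperature) for _, score in top]
    return random.choices([tok for tok, _ in top], weights=weights)[0]

# Hypothetical logits where the correct token only narrowly ranks first.
logits = {"1969": 2.0, "1968": 1.7, "1970": 1.5, "1959": 0.2}
random.seed(0)
draws = [sample_next(logits) for _ in range(1000)]
print({tok: draws.count(tok) for tok in logits})
# Roughly 4 in 10 samples are correct; the rest land on
# plausible-but-wrong neighbors. Greedy decoding (always take the
# top token) removes this driver but cannot fix drivers 1 and 2.
```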

Hallucination categories

  1. Factual hallucinations: claims that contradict verifiable world knowledge, such as wrong dates or invented statistics.
  2. Fabricated references: nonexistent citations, URLs, case law, API functions, or package names presented as real.
  3. Faithfulness failures: output that contradicts or goes beyond the provided source, such as a summary that adds claims the document never makes.

Why hallucinations matter for security

Hallucinations are not just an accuracy problem; they are a security and trust problem:

  1. Fabricated facts and citations create liability when acted on: invented case law, medical guidance, or financial figures read fluently enough to pass casual review.
  2. Hallucinated package and API names can be weaponized: an attacker can register a nonexistent package the model keeps suggesting, so developers who trust the output install malicious code.
  3. Confident tone defeats skepticism: wrong output propagates into downstream systems and documents before anyone thinks to verify it.
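
For the package-name risk in particular, one concrete defense is to verify that a suggested dependency exists before installing it. A sketch against PyPI's public JSON API (https://pypi.org/pypi/<name>/json); the hallucinated package name below is made up for illustration:

```python
import urllib.error
import urllib.request

def package_exists_on_pypi(name):
    """Return True if `name` is a real PyPI package (HTTP 200 from the JSON API)."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # 404: the package does not exist

# One real package, one hypothetical hallucination.
for pkg in ["requests", "requests-auth-helper-pro"]:
    status = "exists" if package_exists_on_pypi(pkg) else "NOT on PyPI, do not install"
    print(f"{pkg}: {status}")
```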

Reducing hallucinations

You cannot eliminate them. You can reduce them:

  1. Ground the model: retrieval-augmented generation (RAG) supplies source passages the model can quote instead of recalling facts from its weights.
  2. Remove sampling randomness on factual tasks: decode greedily or at low temperature. This addresses driver 3 only; drivers 1 and 2 remain.
  3. Require citations and verify them: check that every claim is actually supported by the retrieved source, as sketched below.
  4. Constrain and validate output: schemas, tool calls, and existence checks (like the package check above) let you reject fabrications mechanically.
  5. Keep a human in the loop for high-stakes output.
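
A minimal sketch of the verification step, using naive lexical overlap as a stand-in for a real entailment or citation checker; the threshold and examples are illustrative only:

```python
def support_score(claim, source):
    """Fraction of the claim's content words found in the source (a crude grounding proxy)."""
    stop = {"the", "a", "an", "is", "was", "it", "of", "in", "to", "for", "and"}
    src_words = {w.strip(".,").lower() for w in source.split()}
    content = [w.strip(".,").lower() for w in claim.split() if w.lower() not in stop]
    if not content:
        return 1.0
    return sum(w in src_words for w in content) / len(content)

source = "The Eiffel Tower was completed in 1889 for the Exposition Universelle."
claims = [
    "The Eiffel Tower was completed in 1889.",     # supported by the source
    "It was painted gold for the 1900 Olympics.",  # fabricated
]
for claim in claims:
    verdict = "OK" if support_score(claim, source) >= 0.8 else "FLAG: unsupported"
    print(f"{verdict}: {claim}")
```

A production version would swap the overlap heuristic for an NLI or LLM-based entailment check, but the control flow is the same: nothing leaves the system until each claim is traced back to a source.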