AI Guardrails

What are AI Guardrails?

AI guardrails are runtime controls (typically classifiers, rule engines, or smaller policy models) that inspect the inputs and outputs of an LLM application to enforce safety, security, and compliance policies that the underlying model cannot guarantee on its own. Where the foundation model's safety training is best-effort and probabilistic, guardrails form a separate enforcement layer that the application owner defines, configures, and controls directly.

What guardrails actually do

A typical guardrail layer sits between the user and the model on input, and between the model and the user on output:

user input → [input guardrail] → LLM → [output guardrail] → response
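A minimal sketch of that flow in Python. Everything here is hypothetical scaffolding: guard_input, guard_output, and the call_llm parameter stand in for whatever classifier, rule engine, or model API the application actually uses.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Verdict:
        allowed: bool
        reason: str = ""

    def guard_input(user_input: str) -> Verdict:
        # Hypothetical input screen: block one obvious injection pattern.
        if "ignore previous instructions" in user_input.lower():
            return Verdict(False, "possible prompt injection")
        return Verdict(True)

    def guard_output(model_output: str) -> Verdict:
        # Hypothetical output screen: block responses that echo the system prompt.
        if "BEGIN SYSTEM PROMPT" in model_output:
            return Verdict(False, "system prompt leakage")
        return Verdict(True)

    def handle_request(user_input: str, call_llm: Callable[[str], str]) -> str:
        verdict = guard_input(user_input)
        if not verdict.allowed:
            return f"Request blocked: {verdict.reason}"
        model_output = call_llm(user_input)
        verdict = guard_output(model_output)
        if not verdict.allowed:
            return f"Response withheld: {verdict.reason}"
        return model_output

The point of the structure is that neither check trusts the other: the input screen runs before any model call is made, and the output screen runs on everything the model produces, regardless of what the input screen decided.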

Input-side guardrails screen for:

- prompt injection and jailbreak attempts, such as "ignore previous instructions" patterns or role-play framings
- personally identifiable information (PII) that should not reach the model or its logs
- off-topic or out-of-scope requests the application was never meant to handle
- abusive content and resource-abuse patterns such as flooding

Output-side guardrails screen for:

- leakage of PII, secrets, or system-prompt contents (a redaction sketch follows below)
- toxic, harmful, or brand-inappropriate language
- ungrounded or hallucinated claims, often via a groundedness check against retrieved context
- format violations, such as malformed JSON or responses that break a schema contract
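On the output side, one common screen is PII redaction. A minimal sketch follows; the regex patterns are illustrative only, and production systems typically pair them with trained PII detectors rather than relying on regexes alone.

    import re

    # Illustrative patterns only; real PII detection also uses trained
    # models and context, not just pattern matching.
    PII_PATTERNS = {
        "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    }

    def redact_pii(text: str) -> tuple[str, list[str]]:
        """Replace matched PII with typed placeholders; report what was found."""
        found = []
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(text):
                found.append(label)
                text = pattern.sub(f"[REDACTED {label.upper()}]", text)
        return text, found

    clean, hits = redact_pii("Contact me at jane@example.com, SSN 123-45-6789.")
    # clean == "Contact me at [REDACTED EMAIL], SSN [REDACTED US_SSN]."
    # hits == ["email", "us_ssn"]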

Common guardrail products

- Llama Guard (Meta): an open-weights safety classifier for prompts and responses
- NeMo Guardrails (NVIDIA): an open-source toolkit for defining programmable conversational rails
- Guardrails AI: an open-source Python library for validating and structuring LLM outputs
- Amazon Bedrock Guardrails and Azure AI Content Safety: managed guardrail services from the major cloud providers
- OpenAI Moderation API: a hosted classifier for policy-violating content (example call below)
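Hosted classifiers reduce an input or output screen to a single API call. A sketch against the OpenAI Moderation endpoint, assuming the v1 Python SDK and the model name current as of this writing:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def is_flagged(text: str) -> bool:
        # One call to a hosted moderation classifier; returns True if any
        # policy category (hate, self-harm, violence, etc.) is triggered.
        resp = client.moderations.create(
            model="omni-moderation-latest",
            input=text,
        )
        return resp.results[0].flagged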

What guardrails don't solve

Guardrails are necessary but not sufficient. Three documented limitations:

- Adversarial evasion: obfuscations such as translation into low-resource languages, Base64 or other encodings, and multi-turn escalation can slip past classifiers trained on direct attack patterns.
- False positives: overly strict rules block legitimate requests, degrading the product until teams loosen policies and the protection becomes nominal.
- Latency and cost: every guardrail check adds an inference or rules pass to the request path, which compounds in streaming and agentic workloads.

The right framing: guardrails raise the cost of attack and catch the bulk of low-effort probes, but a determined attacker will eventually route around any single layer. Defense-in-depth (input screening, output screening, abuse detection, and continuous adversarial testing) is the real posture.
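In code, defense-in-depth amounts to composing independent checks so that any layer can veto and errors fail closed rather than open. A hypothetical sketch:

    from typing import Callable

    Layer = Callable[[str], bool]  # returns True if the text is allowed

    def run_layers(text: str, layers: list[Layer]) -> bool:
        for layer in layers:
            try:
                if not layer(text):
                    return False  # any single layer can veto
            except Exception:
                return False  # a broken check blocks; it never silently passes
        return True

    # Hypothetical input layers: an injection heuristic plus a crude
    # flood / resource-abuse limit. Real deployments would add trained
    # classifiers and abuse-detection signals as further layers.
    input_layers: list[Layer] = [
        lambda t: "ignore previous instructions" not in t.lower(),
        lambda t: len(t) < 20_000,
    ]

    allowed = run_layers("What is our refund policy?", input_layers)

The fail-closed choice in run_layers is deliberate: if a guardrail dependency goes down, the safe default is to block, because an attacker who can crash a check should not thereby bypass it.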