Glossary/Membership Inference Attack

What is a Membership Inference Attack?

A membership inference attack (MIA) is a privacy attack that determines whether a specific data point was part of a model's training set. Given a candidate input and access to the model, the attacker exploits the fact that models behave subtly differently on data they were trained on versus data they were not, and uses that difference to reveal training-set membership. It is the oldest formally studied AI privacy attack and the foundation for many more dangerous derivatives.

Why models leak training-set membership

Trained models tend to be more confident on training examples than on novel ones. This shows up as:

- Lower loss (and, for language models, lower perplexity) on training examples
- Higher confidence assigned to the true label
- Lower entropy in the output distribution

A membership inference attack measures these differences for a candidate input and compares them against a calibration set of known non-members to decide: was this in training, or wasn't it?
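The comparison above can be sketched as a simple loss-threshold attack: flag a candidate as a training member if its loss is lower than is typical for known non-members. This is a minimal illustration with synthetic loss values; the function name, the median threshold, and the numbers are all hypothetical choices, not a specific published attack's parameters.

```python
import random
import statistics

def loss_threshold_mia(candidate_loss, calibration_losses):
    """Flag the candidate as a likely training member if its loss falls
    below the median loss of known non-members.
    (Illustrative sketch: real attacks tune this threshold carefully.)"""
    threshold = statistics.median(calibration_losses)
    return candidate_loss < threshold

# Synthetic illustration: members tend to have lower loss than non-members.
rng = random.Random(0)
non_member_losses = [rng.gauss(2.0, 0.5) for _ in range(1000)]  # calibration set

member_loss = 0.8  # unusually low loss: the model "remembers" this example
novel_loss = 2.4   # typical loss on unseen data

print(loss_threshold_mia(member_loss, non_member_losses))  # True
print(loss_threshold_mia(novel_loss, non_member_losses))   # False
```

In practice the attacker often cannot observe the loss directly and must work from model outputs (confidence scores or generated text), but the decision rule has this same shape.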

Why MIA matters

Confirming training-set membership is itself a privacy violation:

- If a model was trained on records from a disease-specific medical dataset, membership alone reveals a diagnosis
- Membership can expose participation in other sensitive datasets (financial, legal, behavioral)
- It provides evidence that personal or copyrighted data was used without authorization

MIA also underpins more invasive attacks:

- Training data extraction, which recovers verbatim training examples
- Attribute inference, which fills in unknown fields of a partially known record
- Model inversion, which reconstructs representative inputs for a class

Variants

For LLMs specifically, recent research (Carlini et al., Mireshghallah et al.) has shown that MIA is harder against modern frontier models than against older, smaller ones, but it remains possible, especially for outlier training examples that the model fits unusually well.
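One common refinement for language models is calibration against a reference model: instead of thresholding the target model's loss alone, compare it to the loss a reference model assigns to the same text, so that generically easy text is not mistaken for a member. The sketch below uses made-up per-token losses and a made-up threshold purely to show the shape of the score; it is not a specific published attack.

```python
def calibrated_membership_score(target_losses, reference_losses):
    """Reference-model calibration: mean target-model token loss minus
    mean reference-model token loss. A strongly negative score means the
    target model fits this text unusually well relative to the reference,
    which suggests training-set membership.
    (Illustrative sketch; real attacks use actual LM token losses.)"""
    n = len(target_losses)
    return sum(target_losses) / n - sum(reference_losses) / n

# Hypothetical per-token cross-entropy losses for one candidate document.
target_losses = [0.4, 0.3, 0.5, 0.2]     # target model: unusually low loss
reference_losses = [2.1, 1.8, 2.4, 2.0]  # reference model: ordinary loss

score = calibrated_membership_score(target_losses, reference_losses)
print(score)          # -1.725
print(score < -1.0)   # below a tuned threshold -> flag as likely member
```

The calibration step is what makes attacks viable against well-regularized models: it distinguishes "this text is easy for any model" from "this text is easy for *this* model".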

Defenses