
What is a Vector Embedding?

A vector embedding is a numerical representation of a piece of content — text, image, audio, code — as a list of floating-point numbers in a high-dimensional space, arranged so that semantically similar inputs land at nearby coordinates. Embeddings are the primary mechanism for similarity search, retrieval-augmented generation, recommendation systems, and clustering in modern AI applications.

How embeddings work

An embedding model is a neural network trained to map inputs to fixed-dimensional vectors (commonly 768, 1024, 1536, or 3072 dimensions). The model is trained so that pairs of inputs with similar meaning produce vectors with high cosine similarity, while unrelated inputs produce vectors with low similarity.

The vector itself isn't human-readable — its coordinates have no individual meaning. The geometry is the meaning: distances and angles between vectors encode semantic relationships.
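The "geometry is the meaning" point can be made concrete with cosine similarity, the standard comparison metric. A minimal NumPy sketch, using made-up 3-dimensional vectors as stand-ins for real embedding-model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors:
    1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real embeddings (illustrative only).
cat     = np.array([0.90, 0.40, 0.10])
kitten  = np.array([0.85, 0.45, 0.12])   # semantically close to "cat"
invoice = np.array([0.10, -0.20, 0.95])  # unrelated topic

print(cosine_similarity(cat, kitten))   # high: nearby in embedding space
print(cosine_similarity(cat, invoice))  # low: far apart
```

Note that no individual coordinate of these vectors "means" anything; only the angle between them carries information.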

Common embedding models:
- OpenAI text-embedding-3-small (1536 dimensions) and text-embedding-3-large (3072 dimensions)
- Cohere embed-v3 (1024 dimensions)
- Open-source sentence-transformers models such as all-mpnet-base-v2 (768 dimensions)

Where embeddings are used

- Semantic search: rank documents by vector similarity to a query instead of keyword overlap
- Retrieval-augmented generation (RAG): fetch relevant context to ground a language model's answer
- Recommendation systems: suggest items whose embeddings are close to those a user has engaged with
- Clustering and deduplication: group near-identical or topically similar content

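All of these use cases reduce to the same primitive: embed a query, then rank stored vectors by similarity. A brute-force NumPy sketch of that primitive, with a randomly generated toy corpus in place of real embeddings (production systems use an approximate-nearest-neighbor index rather than a full scan):

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k corpus rows most similar to `query`.
    Rows and query are L2-normalized first, so a dot product
    equals cosine similarity."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per row
    return np.argsort(scores)[::-1][:k]  # highest-scoring indices first

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 8))              # 100 fake 8-dim "documents"
query = corpus[42] + 0.01 * rng.normal(size=8)  # near-duplicate of doc 42
print(top_k(query, corpus))                     # doc 42 should rank first
```

The full scan is O(corpus size) per query; libraries like FAISS trade a little recall for sublinear query time at scale.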
Security implications

Embeddings are not one-way functions: given access to the vectors, an attacker can often recover information about the original inputs. Three concrete risks:

- Embedding inversion: a trained decoder can reconstruct much of the original text from its vector, so a leaked vector store can leak the content it represents
- Membership inference: an attacker can test whether a specific record was part of the indexed corpus
- Attribute inference: sensitive properties of the input, such as authorship or demographics, can be predicted from the vector alone

For long-form coverage including specific defenses, see Repello's research on vector embedding security (linked below).