What is a Vector Database?

A vector database is a data store optimized for high-dimensional embeddings. It indexes vectors so that nearest-neighbor queries ("find the k vectors most similar to this one") return in milliseconds rather than the seconds a full scan of a large collection would take. Vector databases underpin retrieval-augmented generation (RAG) pipelines, semantic search, recommendation engines, and any other system whose access pattern is similarity in embedding space.

How vector databases work

The core operation is approximate nearest neighbor (ANN) search. Exact nearest-neighbor search over millions of high-dimensional vectors is expensive because every query must be compared against every stored vector, so cost grows linearly with the size of the collection. ANN algorithms accept a small loss in accuracy (they may occasionally miss a true nearest neighbor) in exchange for an orders-of-magnitude speedup.
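To make the trade-off concrete, here is a minimal sketch in plain Python. It contrasts a brute-force exact search with a toy ANN index based on random-hyperplane hashing. The names `exact_knn` and `HyperplaneLSH` are illustrative, not taken from any particular product, and the hashing scheme is chosen only because it is short; production systems use more sophisticated index structures.

```python
import heapq
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exact_knn(query, vectors, k):
    """Exact search: score every stored vector, keep the k best indices."""
    scored = ((cosine(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scored)]


class HyperplaneLSH:
    """Toy ANN index (illustrative): vectors with the same sign pattern
    against a set of random hyperplanes share a bucket."""

    def __init__(self, dim, n_planes, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_planes)]
        self.vectors = []
        self.buckets = {}

    def _key(self, v):
        # One boolean per hyperplane: which side of the plane v falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0
                     for plane in self.planes)

    def add(self, v):
        self.vectors.append(v)
        self.buckets.setdefault(self._key(v), []).append(len(self.vectors) - 1)

    def query(self, q, k):
        # Only the query's own bucket is scanned, so true neighbors that
        # hashed elsewhere are missed: this is the accuracy loss.
        candidates = self.buckets.get(self._key(q), [])
        scored = ((cosine(q, self.vectors[i]), i) for i in candidates)
        return [i for _, i in heapq.nlargest(k, scored)]
```

The exact search touches every vector on every query. The hashed index scans a single bucket, which is much less work but can return fewer or slightly worse neighbors than the exact scan would.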

Common ANN algorithms:

Vector databases also store metadata alongside each vector (document text, source URL, timestamps, ACL tags) and support filtered queries, for example: "find documents similar to this one, but only those tagged engineering and modified in the last 30 days."
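A filtered query like the one above could be sketched as follows. This is an illustration only: the `Record` type and `filtered_search` function are hypothetical, and the sketch filters on metadata first and then ranks the survivors by exact cosine similarity. A real vector database may combine filtering and index traversal in other ways.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    doc_id: str
    vector: list
    tags: set
    modified: datetime


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(query, records, k, required_tag, modified_after):
    """Apply the metadata predicates, then rank what remains by similarity."""
    eligible = [r for r in records
                if required_tag in r.tags and r.modified >= modified_after]
    eligible.sort(key=lambda r: cosine(query, r.vector), reverse=True)
    return [r.doc_id for r in eligible[:k]]


# Usage: "similar to the query, tagged engineering, modified in the last 30 days".
now = datetime(2024, 6, 1)
records = [
    Record("a", [1.0, 0.0], {"engineering"}, now - timedelta(days=5)),
    Record("b", [1.0, 0.0], {"engineering"}, now - timedelta(days=90)),  # too old
    Record("c", [1.0, 0.1], {"sales"}, now - timedelta(days=1)),         # wrong tag
    Record("d", [0.0, 1.0], {"engineering"}, now - timedelta(days=10)),
]
results = filtered_search([1.0, 0.0], records, k=2,
                          required_tag="engineering",
                          modified_after=now - timedelta(days=30))
```

Record "b" matches the query vector exactly but is excluded by the date predicate, so a less similar record takes its place in the results.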

Common vector databases

Security implications

Vector databases inherit the security concerns of any data store, plus a few specific to vector data:

Securing a vector database requires per-query ACL enforcement, write-side validation of indexed content, and audit logging of retrievals against sensitive partitions.
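The three controls named above can be sketched together in a small in-memory wrapper. Everything here is an assumption made for illustration: the `Chunk` and `GuardedStore` names, the group-based ACL model, the `MAX_TEXT_LEN` limit, and the use of a dot product for ranking. The write-side checks are purely structural; what counts as valid content depends on the deployment.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    vector: tuple
    text: str
    acl: frozenset  # groups allowed to read this chunk (assumed ACL model)


class GuardedStore:
    MAX_TEXT_LEN = 2000  # hypothetical limit, for illustration only

    def __init__(self, dim):
        self.dim = dim
        self.chunks = []
        self.audit_log = []

    def insert(self, chunk):
        """Write-side validation: reject malformed or unlabeled content."""
        if len(chunk.vector) != self.dim:
            raise ValueError("wrong vector dimension")
        if not all(math.isfinite(x) for x in chunk.vector):
            raise ValueError("vector contains NaN or infinity")
        if not chunk.acl:
            raise ValueError("chunk must carry at least one ACL group")
        if len(chunk.text) > self.MAX_TEXT_LEN:
            raise ValueError("text exceeds maximum length")
        self.chunks.append(chunk)

    def search(self, user, groups, query, k):
        """Per-query ACL enforcement: only chunks the caller may read
        are ever scored, and every retrieval is recorded."""
        visible = [c for c in self.chunks if c.acl & groups]
        visible.sort(
            key=lambda c: sum(q * x for q, x in zip(query, c.vector)),
            reverse=True,
        )
        hits = visible[:k]
        # Audit record: who asked and which documents were returned.
        self.audit_log.append({"user": user,
                               "returned": [c.doc_id for c in hits]})
        return hits
```

Filtering before scoring means a caller never receives a chunk outside their groups, even if it would have been the closest match. The audit log here is a plain list; a real deployment would write to durable, tamper-resistant storage.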