What is a Vector Database?

A vector database is a data store optimized for high-dimensional embeddings: it indexes vectors so that nearest-neighbor queries (find the k vectors most similar to this one) return results in milliseconds rather than the seconds a naive scan would require. Vector databases are the substrate for RAG pipelines, semantic search, recommendation engines, and any system where similarity in embedding space is the access pattern.

How vector databases work

The core operation is approximate nearest neighbor (ANN) search. Exact nearest-neighbor search over millions of high-dimensional vectors is expensive because every query must be compared against every stored vector, so the cost grows linearly with the size of the corpus. ANN algorithms trade a small loss in recall for orders-of-magnitude speedup.
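
To make the cost of the exact approach concrete, here is a minimal sketch of a brute-force scan in plain Python. The function names, the choice of cosine similarity as the metric, and the four-document corpus are all assumptions made for illustration; they are not taken from any particular product.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def exact_knn(query, vectors, k):
    """Return the ids of the k stored vectors most similar to query.

    Every stored vector is scored, so the work grows linearly with the corpus.
    """
    scored = [(cosine_similarity(query, vec), vec_id)
              for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vec_id for _, vec_id in scored[:k]]

# Hypothetical toy corpus: id -> embedding
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
    "doc_d": [0.0, 0.0, 1.0],
}

top2 = exact_knn([1.0, 0.05, 0.0], corpus, k=2)
print(top2)
```

The scan touches all four vectors to answer one query. An ANN index exists to avoid exactly that full pass once the corpus holds millions of entries.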

Common ANN algorithms:

HNSW (Hierarchical Navigable Small World) — graph-based, excellent recall, used by most modern stores
IVF (Inverted File Index) — partition vectors into clusters, search only relevant clusters
Product Quantization — compress vectors so more fit in RAM, trade precision for footprint
DiskANN, ScaNN, FAISS — research-backed libraries and systems, rather than single algorithms, that implement techniques like those above and underpin many products
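
As a rough illustration of the IVF idea from the list above, the sketch below partitions vectors by their nearest centroid and searches only the closest partitions. It is a toy under stated assumptions: the class name ToyIVFIndex is hypothetical, the centroids are hand-picked rather than learned with k-means as a real index would do, and the nprobe parameter name is borrowed from FAISS's IVF indexes.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class ToyIVFIndex:
    """Toy inverted-file index: one list of vectors per centroid."""

    def __init__(self, centroids):
        # Assumption: centroids are supplied by hand; real systems train them.
        self.centroids = centroids
        self.lists = [[] for _ in centroids]

    def _nearest_centroids(self, vector, count):
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: squared_distance(vector, self.centroids[i]),
        )
        return order[:count]

    def add(self, vec_id, vector):
        # Each vector is stored only in the list of its closest centroid.
        cluster = self._nearest_centroids(vector, 1)[0]
        self.lists[cluster].append((vec_id, vector))

    def search(self, query, k, nprobe=1):
        # Scan only the nprobe clusters whose centroids are closest to the query.
        candidates = []
        for cluster in self._nearest_centroids(query, nprobe):
            candidates.extend(self.lists[cluster])
        candidates.sort(key=lambda item: squared_distance(query, item[1]))
        return [vec_id for vec_id, _ in candidates[:k]]

index = ToyIVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add("a", [0.5, 0.5])
index.add("b", [1.0, 0.0])
index.add("c", [9.0, 9.5])
index.add("d", [4.9, 4.9])  # sits near the cluster boundary, lands in cluster 0

near_origin = index.search([0.2, 0.1], k=2, nprobe=1)
boundary_fast = index.search([5.2, 5.2], k=1, nprobe=1)
boundary_wide = index.search([5.2, 5.2], k=1, nprobe=2)
print(near_origin, boundary_fast, boundary_wide)
```

The boundary query shows the accuracy trade-off in miniature. With nprobe=1 the search only looks in the second cluster and misses "d", which is the true nearest neighbor but was assigned to the first cluster. Raising nprobe recovers it at the cost of scanning more vectors.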

Vector databases also store metadata alongside each vector (document text, source URL, timestamps, ACL tags) and support filtered queries — "find me documents similar to this one, but only those tagged engineering and modified in the last 30 days."
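
A filtered query of that kind can be sketched as a metadata check followed by a similarity ranking. Everything named here is hypothetical (the Record type, its fields, the filtered_search function, and the fixed reference date), and the sketch filters before ranking, which is only one of the strategies real stores use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Record:
    doc_id: str
    vector: list
    tags: set
    modified: datetime

def dot(a, b):
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))

def filtered_search(query, records, k, required_tag, modified_after):
    """Keep records that pass the metadata filter, then rank by similarity."""
    eligible = [
        r for r in records
        if required_tag in r.tags and r.modified >= modified_after
    ]
    eligible.sort(key=lambda r: dot(query, r.vector), reverse=True)
    return [r.doc_id for r in eligible[:k]]

# Fixed reference time so the example is reproducible.
now = datetime(2024, 6, 30)
records = [
    Record("eng-recent", [1.0, 0.0], {"engineering"}, now - timedelta(days=5)),
    Record("eng-old", [1.0, 0.0], {"engineering"}, now - timedelta(days=90)),
    Record("sales-recent", [1.0, 0.0], {"sales"}, now - timedelta(days=2)),
    Record("eng-recent-2", [0.5, 0.5], {"engineering"}, now - timedelta(days=10)),
]

hits = filtered_search(
    query=[1.0, 0.0],
    records=records,
    k=3,
    required_tag="engineering",
    modified_after=now - timedelta(days=30),
)
print(hits)
```

Although three results were requested, only two records satisfy both the tag and the date condition, so only two come back. Filtering after an ANN search instead can return even fewer results than expected when most of the nearest candidates fail the filter.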

Common vector databases

Cloud-native — Pinecone, Weaviate Cloud, Qdrant Cloud, Vespa Cloud
Self-hosted open source — Chroma, Qdrant, Weaviate, Milvus, Vespa
Postgres extensions — pgvector (most popular), Lantern
Existing data store extensions — Redis, Elasticsearch, and MongoDB Atlas have all added vector support
Embedded / local — LanceDB, Chroma, sqlite-vec (for prototypes and edge deployments)

Security implications

Vector databases inherit the security concerns of any data store, plus a few specific to vector data:

The corpus itself is sensitive. If the database holds embeddings of internal documents, inversion attacks can recover enough of the source text from those embeddings that they should be treated like the source content for access control purposes.
Metadata-filter bypass — when ACLs are enforced via metadata filters at query time, errors in the filter logic can leak vectors the user shouldn't see.
Embedding-space poisoning — attackers who can write to the database (e.g. via user-generated content that gets indexed) can craft adversarial documents that get retrieved on target queries, weaponizing the retrieval layer for prompt injection.
Side-channel via query latency — in some configurations, the time taken to return results leaks information about what's in the corpus.

Securing a vector database requires per-query ACL enforcement, write-side validation of indexed content, and audit logging of retrievals against sensitive partitions.
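
One way to picture per-query ACL enforcement with audit logging is the sketch below. It assumes the caller's groups come from an authenticated session rather than from the request itself, and all names (search_with_acl, allowed_groups, the in-memory audit_log list) are hypothetical stand-ins for whatever the real system provides.

```python
def dot(a, b):
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))

def search_with_acl(query, records, user_id, user_groups, k, audit_log):
    """Rank only the records the caller may see, and log what was returned.

    The ACL check runs on the server for every query. A caller with no
    groups sees nothing, so the check fails closed.
    """
    visible = [r for r in records if r["allowed_groups"] & user_groups]
    visible.sort(key=lambda r: dot(query, r["vector"]), reverse=True)
    hits = [r["id"] for r in visible[:k]]
    # Assumption: an append-only list stands in for a real audit sink.
    audit_log.append({"user": user_id, "returned": hits})
    return hits

records = [
    {"id": "hr-doc", "vector": [1.0, 0.0], "allowed_groups": {"hr"}},
    {"id": "eng-doc", "vector": [0.8, 0.2], "allowed_groups": {"engineering"}},
    {"id": "public", "vector": [0.1, 0.9], "allowed_groups": {"hr", "engineering"}},
]
audit_log = []

alice_hits = search_with_acl(
    [1.0, 0.0], records, "alice", {"engineering"}, k=2, audit_log=audit_log
)
anon_hits = search_with_acl(
    [1.0, 0.0], records, "anonymous", set(), k=2, audit_log=audit_log
)
print(alice_hits, anon_hits, audit_log)
```

The most similar vector to the query belongs to "hr-doc", but it never reaches a caller outside the hr group because the permission check happens before ranking. Every call, including the one that returns nothing, leaves an audit entry.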
