5 exercises on vector databases — similarity search, HNSW indexes, ANN, cosine similarity, and namespaces.
1 / 5
What is similarity search in a vector database?
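Similarity search: finds the stored items whose embeddings lie closest to a query embedding. The query (e.g. a sentence encoded by a transformer) is compared against the stored embeddings, often millions of them, using a distance metric such as cosine similarity or Euclidean distance. The database returns the top-k nearest neighbours (documents, images, or products) whose embeddings are closest in the vector space.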
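To make the top-k idea concrete, here is a minimal brute-force sketch using only the Python standard library. The three-dimensional vectors, the document ids, and the `top_k` helper are made up for illustration; real embeddings have hundreds of dimensions, and a real database would use an index rather than a full scan. Euclidean distance is assumed as the metric.

```python
import heapq
import math

def top_k(query, stored, k):
    """Return the ids of the k stored vectors closest to `query` (nearest first)."""
    # Score every stored vector by Euclidean distance to the query.
    scored = ((math.dist(query, vec), item_id) for item_id, vec in stored.items())
    # nsmallest keeps only the k best candidates instead of sorting everything.
    return [item_id for _, item_id in heapq.nsmallest(k, scored)]

# Toy "embeddings" (hypothetical values chosen for illustration).
stored = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.8, 0.3, 0.1],
    "doc-tax": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

results = top_k(query, stored, k=2)
print(results)
```

The two pet-related vectors sit near the query, so they are returned ahead of the unrelated one.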
2 / 5
What does HNSW stand for and what is it?
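HNSW (Hierarchical Navigable Small World): a graph-based index for ANN search. It builds a multi-layer graph in which the upper layers hold few nodes joined by long-range connections and the lower layers hold many nodes joined by dense local connections. A query enters at the top layer and greedily descends layer by layer, so query time scales roughly as O(log N) in practice. Used by Pinecone, Qdrant, Weaviate, and pgvector for high-performance ANN search.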
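Why the upper layers end up sparse can be sketched with the level-assignment rule from the original HNSW paper: each inserted node draws a top level of floor(-ln(u) * mL) with u uniform in (0, 1] and mL = 1 / ln(M), and is present in every layer from 0 up to that level. The snippet below only simulates that rule; it does not build the graph. `M = 16` and the sample size are illustrative assumptions, and `assign_level` is a hypothetical helper name.

```python
import math
import random
from collections import Counter

def assign_level(rng, m):
    """Draw a node's top layer using the HNSW rule floor(-ln(u) * mL)."""
    m_l = 1.0 / math.log(m)
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return int(-math.log(1.0 - rng.random()) * m_l)

rng = random.Random(0)
M = 16  # illustrative value for the max-connections parameter
top_levels = Counter(assign_level(rng, M) for _ in range(100_000))

# A node whose top level is L appears in every layer 0..L.
max_level = max(top_levels)
nodes_per_layer = [
    sum(count for level, count in top_levels.items() if level >= layer)
    for layer in range(max_level + 1)
]

for layer, count in enumerate(nodes_per_layer):
    print(f"layer {layer}: {count} nodes")
```

Under this rule each layer is expected to hold about 1/M of the nodes in the layer below it, which is where the logarithmic number of layers comes from.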
3 / 5
What is ANN (Approximate Nearest Neighbour) search and why is it used instead of exact search?
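ANN: exact nearest neighbour search in high-dimensional spaces requires comparing the query against every stored vector (a linear scan), which becomes too slow as the collection grows. ANN algorithms like HNSW or IVFFlat skip unpromising regions of the index, trading a small amount of accuracy for speed: they typically return results that are very close to exact (recall ~95–99%) in milliseconds, even over billions of vectors.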
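The accuracy-for-speed trade can be sketched with a toy inverted-file (IVF-style) index in pure Python. This is a simplification under stated assumptions: centroids are just randomly sampled stored vectors rather than trained with k-means, the data is random, and names such as `ann_search`, `nprobe`, and `recall` are chosen for illustration, not taken from any library.

```python
import math
import random

rng = random.Random(42)
DIM, N, N_CLUSTERS, K = 8, 2000, 20, 10

vectors = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
# Simplification: use sampled vectors as centroids instead of running k-means.
centroids = rng.sample(vectors, N_CLUSTERS)

def nearest_centroids(vec, n):
    """Ids of the n centroids closest to vec."""
    order = sorted(range(N_CLUSTERS), key=lambda c: math.dist(vec, centroids[c]))
    return order[:n]

# Inverted lists: each vector is filed under its single nearest centroid.
buckets = {c: [] for c in range(N_CLUSTERS)}
for idx, vec in enumerate(vectors):
    buckets[nearest_centroids(vec, 1)[0]].append(idx)

def exact_search(query, k):
    """Linear scan over every stored vector."""
    return sorted(range(N), key=lambda i: math.dist(query, vectors[i]))[:k]

def ann_search(query, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    candidates = [i for c in nearest_centroids(query, nprobe) for i in buckets[c]]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:k]

def recall(query, k, nprobe):
    """Fraction of the true top-k that the approximate search also found."""
    truth = set(exact_search(query, k))
    return len(truth & set(ann_search(query, k, nprobe))) / k

query = [rng.gauss(0, 1) for _ in range(DIM)]
for nprobe in (1, 5, N_CLUSTERS):
    print(f"nprobe={nprobe}: recall={recall(query, K, nprobe):.2f}")
```

Probing more buckets scans more vectors and raises recall; probing all of them degenerates into the exact linear scan.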
4 / 5
What does cosine similarity measure?
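Cosine similarity: measures the cosine of the angle between two vectors, i.e. how closely they point in the same direction. It ranges from -1 (opposite direction) through 0 (orthogonal) to 1 (same direction). It is often preferred for text embeddings because magnitude (which can vary with document length) is ignored: a short and a long document with the same meaning point in nearly the same direction even if their magnitudes differ.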
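As a small worked example, cosine similarity is the dot product divided by the product of the two vector lengths. The sketch below uses hypothetical toy vectors, with `long_doc` set to exactly ten times `short_doc` to show that scaling does not change the score.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

short_doc = [1.0, 2.0, 0.5]
long_doc = [10.0, 20.0, 5.0]    # same direction, 10x the magnitude
opposite = [-1.0, -2.0, -0.5]   # same line, reversed direction
unrelated = [2.0, -1.0, 0.0]    # orthogonal: 1*2 + 2*(-1) + 0.5*0 = 0

print(cosine_similarity(short_doc, long_doc))
print(cosine_similarity(short_doc, opposite))
print(cosine_similarity(short_doc, unrelated))
```

Up to floating-point rounding, the three scores land on the ends and the midpoint of the range: 1, -1, and 0.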
5 / 5
What is a namespace (or collection) in a vector database?
Namespace / collection: a named partition of the stored vectors. Pinecone calls them namespaces; Qdrant and Weaviate call them collections. They partition vectors so that a search is scoped to one dataset. Common use cases: one namespace per customer (multi-tenancy) or one per document corpus type.
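The scoping behaviour can be sketched with a toy in-memory store. `NamespacedStore` and its `upsert` / `query` methods are hypothetical names for illustration, not the API of Pinecone, Qdrant, or Weaviate, and Euclidean distance is assumed as the metric.

```python
import math
from collections import defaultdict

class NamespacedStore:
    """Hypothetical in-memory store; not the API of any real database."""

    def __init__(self):
        # namespace -> {item_id: vector}
        self._namespaces = defaultdict(dict)

    def upsert(self, namespace, item_id, vector):
        self._namespaces[namespace][item_id] = vector

    def query(self, namespace, vector, k):
        """Return the k nearest ids, searching only inside `namespace`."""
        items = self._namespaces.get(namespace, {})
        ranked = sorted(items, key=lambda item_id: math.dist(vector, items[item_id]))
        return ranked[:k]

store = NamespacedStore()
store.upsert("customer-a", "a1", [0.0, 0.0])
store.upsert("customer-a", "a2", [1.0, 1.0])
store.upsert("customer-b", "b1", [0.1, 0.1])

# b1 is the closest vector overall, but it lives in another namespace.
hits = store.query("customer-a", [0.1, 0.1], k=2)
print(hits)
```

Because the query names its namespace, one customer's vectors are never returned for another customer's search, even when they would be the nearest match.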