Vector Database Vocabulary: Embeddings, Search, and Similarity Explained
Vector embedding, semantic search, HNSW, ANN, cosine similarity, RAG pipeline — the vocabulary you need to work with vector databases and AI-powered search in English.
Vector databases are at the heart of modern AI applications — from semantic search to retrieval-augmented generation (RAG). As these technologies move from research into production, the vocabulary around them becomes essential for engineers writing design docs, reviewing PRs, and discussing architecture with data scientists. Here is the core terminology explained clearly.
Vectors and Embeddings
Vector embedding — A numerical representation of content (text, image, audio, code) as a list of floating-point numbers — a vector — in a high-dimensional space. Similar content produces similar vectors. Phrase: “We use an embedding model to convert product descriptions into 1,536-dimensional vectors and store them in the vector database.”
Embedding model — A machine learning model that produces vector embeddings. Examples: OpenAI text-embedding-3-small, Cohere Embed, sentence-transformers. The same model must be used for both storing and querying embeddings: vectors produced by different models live in different vector spaces, so similarity scores between them are meaningless.
Chunking strategy — The method used to split long documents into smaller pieces before embedding. Common strategies: fixed-size chunks, sentence-level chunks, paragraph-level chunks, semantic chunking. Phrase: “We tested three chunking strategies — semantic chunking gave the best retrieval precision for long legal documents.”
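As an illustration of the simplest strategy listed above, here is a minimal sketch of fixed-size chunking. It assumes chunks are measured in characters and that neighbouring chunks overlap so a sentence cut at a boundary still appears whole in one of them. The function name and the default sizes are hypothetical choices for this example, not values from any particular library.

```python
def chunk_fixed_size(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character chunks of at most chunk_size, sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_fixed_size("abcdefghij", chunk_size=4, overlap=1))
```

Production pipelines often count tokens rather than characters, because embedding models limit input length in tokens.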
Search and Similarity
Semantic search — Finding results based on meaning rather than exact keyword matches. A query “how to cancel my subscription” finds documents about “ending a membership” even if the exact words don’t match. Phrase: “We replaced keyword search with semantic search — user satisfaction scores improved significantly for long-tail queries.”
Similarity search — Searching for vectors that are closest to a query vector according to a distance or similarity metric. Also called vector search or nearest neighbour search.
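A minimal sketch of exact (brute-force) similarity search, assuming Euclidean distance as the metric and a plain dictionary as the store. The IDs and two-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import heapq
import math


def nearest_neighbours(query, vectors, k=2):
    """Return the IDs of the k stored vectors closest to the query (smallest distance first)."""
    return heapq.nsmallest(k, vectors, key=lambda vid: math.dist(query, vectors[vid]))


# Hypothetical toy data: ID -> vector.
vectors = {
    "a": [0.0, 0.0],
    "b": [1.0, 1.0],
    "c": [5.0, 5.0],
}

print(nearest_neighbours([0.9, 1.0], vectors, k=2))
```

This compares the query against every stored vector, which is the cost that approximate indexes are designed to avoid.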
Cosine similarity — A metric equal to the cosine of the angle between two vectors. Values range from -1 to 1; values closer to 1 mean more similar. It is the most common metric for text embeddings. Phrase: “Cosine similarity works well for text embeddings because it ignores magnitude — only direction matters.”
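The definition above can be sketched in a few lines of plain Python. This is an illustrative implementation, not how a vector database computes the metric internally; the function name is a choice made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # same direction, different length
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # at right angles
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])        # pointing opposite ways
```

The first pair differs only in magnitude, so its similarity is 1 up to floating-point rounding. This is what "ignores magnitude" means in practice.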
Dot product similarity — Another similarity metric: the sum of element-wise products. For normalised vectors it is equivalent to cosine similarity. Some vector databases use dot product for performance reasons.
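The equivalence for normalised vectors can be checked numerically. The sketch below uses two made-up vectors and hypothetical helper names; it scales each vector to unit length and compares the plain dot product of the results with the cosine similarity of the originals.

```python
import math


def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


a = [3.0, 4.0]
b = [4.0, 3.0]

cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
dot_of_units = dot(normalise(a), normalise(b))

print(math.isclose(cosine, dot_of_units))
```

Once vectors are normalised at write time, the division by both norms can be skipped at query time, which is one reason a dot product can be cheaper.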
ANN (Approximate Nearest Neighbour) — A class of algorithms that find vectors approximately closest to a query, trading a small amount of accuracy for dramatically faster search. Essential at scale: exact nearest neighbour search compares the query against every stored vector, which is usually too slow for interactive queries over millions of vectors.
HNSW (Hierarchical Navigable Small World) (pronounced letter by letter: H-N-S-W) — One of the most widely used ANN index algorithms. It builds a multi-layer graph structure that enables fast approximate search. Phrase: “HNSW gives sub-millisecond search latency on a million-vector index — much faster than a flat brute-force search.”
Vector Database Operations
Upsert — A combined insert-or-update operation: if a vector with the given ID exists, update it; otherwise insert it. The standard way to add vectors to most vector databases. Phrase: “We upsert product embeddings whenever the catalogue updates — IDs are the product SKU.”
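The insert-or-update behaviour can be sketched with an in-memory dictionary keyed by ID. The class below is a hypothetical toy store written for this example. It does not mirror the client API of any specific vector database.

```python
class InMemoryVectorStore:
    """Toy store for illustration: maps an ID to a vector and its metadata."""

    def __init__(self):
        self._records = {}

    def upsert(self, vector_id, vector, metadata=None):
        """Insert the record if the ID is new, otherwise overwrite the existing one."""
        existed = vector_id in self._records
        self._records[vector_id] = {
            "vector": list(vector),
            "metadata": dict(metadata or {}),
        }
        return "updated" if existed else "inserted"

    def __len__(self):
        return len(self._records)


store = InMemoryVectorStore()
first = store.upsert("SKU-1", [0.1, 0.2], {"name": "kettle"})
second = store.upsert("SKU-1", [0.3, 0.4], {"name": "kettle v2"})
```

Because the ID is the key, upserting the same SKU twice leaves one record, so re-running an import does not create duplicates.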
Namespace — A logical partition within a vector database index that separates vectors from different sources or tenants. Phrase: “Each customer gets their own namespace — queries only search within the customer’s namespace.”
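A minimal sketch of namespace isolation, assuming each namespace is simply a separate dictionary of vectors and that Euclidean distance is the metric. The class and the tenant names are hypothetical.

```python
import math
from collections import defaultdict


class NamespacedStore:
    """Toy store for illustration: one independent set of vectors per namespace."""

    def __init__(self):
        self._namespaces = defaultdict(dict)

    def upsert(self, namespace, vector_id, vector):
        self._namespaces[namespace][vector_id] = list(vector)

    def query(self, namespace, query_vector, k=1):
        """Search only the given namespace; other tenants' vectors are never candidates."""
        candidates = self._namespaces.get(namespace, {})
        ranked = sorted(candidates, key=lambda vid: math.dist(query_vector, candidates[vid]))
        return ranked[:k]


store = NamespacedStore()
store.upsert("customer-a", "doc-1", [0.0, 0.0])
store.upsert("customer-b", "doc-2", [0.0, 0.0])

print(store.query("customer-a", [0.0, 0.0], k=5))
```

Even though both documents have identical vectors, a query in one namespace only returns that namespace's document.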
Metadata filtering — Filtering vector search results by attached metadata (e.g. category, date, language) before or after the vector search. Phrase: “We filter by language: 'en' and category: 'support' before the vector search — it narrows the candidate set and improves precision.”
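The "filter before the vector search" variant (pre-filtering) can be sketched as follows. The records, field names, and function are made up for this example, and the search is a brute-force Euclidean ranking rather than a real index.

```python
import math

# Hypothetical records: each has an ID, a vector, and attached metadata.
records = [
    {"id": "doc-1", "vector": [0.9, 0.1], "metadata": {"language": "en", "category": "support"}},
    {"id": "doc-2", "vector": [1.0, 0.0], "metadata": {"language": "de", "category": "support"}},
    {"id": "doc-3", "vector": [0.2, 0.8], "metadata": {"language": "en", "category": "support"}},
    {"id": "doc-4", "vector": [0.95, 0.05], "metadata": {"language": "en", "category": "billing"}},
]


def filtered_search(query, records, filters, k=2):
    """Keep only records whose metadata matches every filter, then rank by distance."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda r: math.dist(query, r["vector"]))
    return [r["id"] for r in candidates[:k]]


result = filtered_search([1.0, 0.0], records, {"language": "en", "category": "support"})
print(result)
```

Here doc-2 is the closest vector overall, but it is excluded because its language is "de". Post-filtering would instead run the vector search first and drop non-matching results afterwards, which can return fewer than k results.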
Re-ranking — A post-retrieval step that re-orders the initial vector search results using a more accurate (and more expensive) model. Phrase: “We retrieve the top 50 candidates by vector similarity, then re-rank them with a cross-encoder — precision improved significantly.”
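The two-stage pattern can be sketched as below. A real cross-encoder is a neural model, so this example swaps in a deliberately simple stand-in, Jaccard token overlap, to keep the code self-contained. Both function names and the candidate texts are hypothetical.

```python
def token_overlap_score(query, text):
    """Stand-in for a cross-encoder (assumption): Jaccard overlap of lowercase tokens."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens or not text_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens | text_tokens)


def rerank(query, candidates, score_fn, top_n=3):
    """Re-order first-stage candidates using the second-stage scorer, best first."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]


# Pretend these came back from the first-stage vector search, in that order.
candidates = [
    "reset your password from the login page",
    "how to cancel a subscription",
    "cancel my subscription and request a refund",
]

top = rerank("cancel my subscription", candidates, token_overlap_score, top_n=2)
print(top)
```

The scorer is passed in as a function, so a real cross-encoder could replace the stand-in without changing `rerank`.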
RAG Pipeline
RAG (Retrieval-Augmented Generation) pipeline — An architecture where a vector search retrieves relevant context documents, which are then provided to a language model as part of the prompt. This can reduce hallucinations and grounds answers in your own data. Phrase: “Our RAG pipeline retrieves the top 5 chunks from the knowledge base and injects them into the LLM prompt.”
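The "inject chunks into the prompt" step can be sketched as a plain string-building function. The retrieval step and the call to the language model are left out, and the prompt wording, function name, and sample chunks are assumptions made for this example.

```python
def build_rag_prompt(question, retrieved_chunks, max_chunks=5):
    """Assemble a prompt that puts the top retrieved chunks ahead of the question."""
    numbered = [
        f"[{i}] {chunk}"
        for i, chunk in enumerate(retrieved_chunks[:max_chunks], start=1)
    ]
    context = "\n\n".join(numbered)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical chunks, as if returned by a vector search.
chunks = [
    "Subscriptions can be cancelled from the account settings page.",
    "Refunds are processed within five working days.",
    "Passwords must be at least twelve characters long.",
]

prompt = build_rag_prompt("How do I cancel my subscription?", chunks, max_chunks=2)
print(prompt)
```

Numbering the chunks makes it possible to ask the model to cite which chunk supports its answer.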
Vector index — The data structure that enables fast similarity search over stored embeddings. Common types are the HNSW index, the IVF (Inverted File Index), and the flat (brute-force) index. Phrase: “Rebuilding the vector index after schema changes takes about 20 minutes for a dataset of our size.”
Practice: Set up a minimal vector search using a free tier of Pinecone, Weaviate, or Qdrant. Embed ten sentences, store them, and query for the most similar sentence to a new input. Then write a short technical explanation of what you built using the vocabulary from this post.