English for Qdrant

Learn the English vocabulary for discussing Qdrant, the open-source vector database, including collections, payloads, and hybrid search with filtering.

Qdrant positions itself as combining fast vector similarity search with structured filtering in the same query, and its vocabulary reflects that dual nature: it is not just a nearest-neighbor index but a database that treats metadata as a first-class citizen.

Key Vocabulary

Collection — Qdrant’s top-level organizational unit, analogous to a table, holding a set of vectors of a defined dimensionality along with their associated metadata, configured once with the distance metric to use for similarity. “We created a separate collection for each document type instead of one shared collection, because they use different embedding models with different vector dimensions, and Qdrant requires a consistent dimensionality per collection.”
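To make the idea of "configured once" concrete, here is a minimal sketch of the request body used to create a collection. The collection name and the vector size of 384 are invented for illustration, and the field names (`vectors`, `size`, `distance`) are assumed to follow Qdrant's REST API for `PUT /collections/{collection_name}`, so verify them against the version you run.

```python
import json

# Hypothetical collection name, chosen only for this example.
COLLECTION_NAME = "support_articles"

def build_create_collection_body(size: int, distance: str = "Cosine") -> dict:
    """Build a JSON body for creating a collection (assumed REST shape)."""
    allowed = {"Cosine", "Dot", "Euclid", "Manhattan"}
    if distance not in allowed:
        raise ValueError(f"unknown distance metric: {distance}")
    if size <= 0:
        raise ValueError("vector size must be positive")
    # Size and distance metric are fixed together when the collection is created.
    return {"vectors": {"size": size, "distance": distance}}

body = build_create_collection_body(size=384)
print(f"PUT /collections/{COLLECTION_NAME}")
print(json.dumps(body, indent=2))
```

In conversation you might describe this as: "The collection is configured for 384-dimensional vectors with cosine distance."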

Payload — the structured metadata attached to each vector in Qdrant, such as a document’s title, category, or timestamp, which can be filtered on directly alongside the vector similarity search itself. “We store the source URL and publish date as payload on each vector — that lets us filter to only recent articles from a specific source, at the same time as doing the similarity search, in a single query.”
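The difference between a vector and its payload is easier to see in a concrete point. The sketch below builds one point as plain Python data; every value is invented, and the `id` / `vector` / `payload` layout and the `points` wrapper are assumed to match the body Qdrant accepts when upserting points.

```python
import json

# One point: the vector carries meaning for similarity search,
# the payload carries structured metadata that can be filtered on.
point = {
    "id": 1,
    "vector": [0.12, -0.40, 0.33, 0.85],  # made-up 4-dimensional embedding
    "payload": {
        "source_url": "https://example.com/articles/1",
        "published_at": "2024-05-01T00:00:00Z",
        "category": "billing",
    },
}

# Assumed shape of an upsert request body: a list of points.
upsert_body = {"points": [point]}
print(json.dumps(upsert_body, indent=2))
```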

Hybrid search — used in this lesson to mean combining vector similarity search with structured payload filtering in one query, such as finding the most semantically similar documents that are also tagged with a specific category, rather than filtering as a separate step. Be aware that Qdrant’s own documentation generally calls this filtered search and uses “hybrid search” for combining dense and sparse vector results, so confirm which meaning a colleague intends. “We’re using hybrid search here — the query finds the most semantically similar support articles, but only among ones payload-tagged as currently published, so unpublished drafts never surface in results even if they’re a close semantic match.”
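The "one query" idea can be sketched as a single request body that carries both the query vector and the payload condition. The payload key `status` and the query vector are hypothetical; the `vector`, `filter`, `must`, `match`, and `limit` fields are assumed to follow Qdrant's `POST /collections/{collection_name}/points/search` endpoint, so check them against your version.

```python
import json

def build_filtered_search_body(query_vector, status, limit=5):
    """Build a search body that applies a payload filter in the same request."""
    return {
        "vector": query_vector,
        # Only points whose payload field "status" equals the given value qualify.
        "filter": {"must": [{"key": "status", "match": {"value": status}}]},
        "limit": limit,
    }

search_body = build_filtered_search_body([0.2, 0.1, 0.9, 0.7], status="published")
print(json.dumps(search_body, indent=2))
```

A natural way to describe this request aloud: "The search is restricted by a must-condition on the status field."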

Distance metric — the mathematical function Qdrant uses to measure similarity between vectors, such as cosine similarity, dot product, or Euclidean distance, chosen at collection creation and required to match how the embedding model was trained. “Search quality was poor until we realized the distance metric was mismatched — the embedding model was trained for cosine similarity, but the collection was configured for Euclidean distance, which doesn’t rank results the way the model actually expects.”
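A small worked example shows why the metric matters. The two toy document vectors below are invented so that cosine similarity and Euclidean distance disagree about which one is closest to the query; this is plain Python, not Qdrant code.

```python
import math

def cosine_similarity(a, b):
    """Similarity of direction: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0]
docs = {
    "same_direction_long": [10.0, 0.0],  # same direction, much larger magnitude
    "nearby_but_rotated": [1.0, 1.0],    # close in space, different direction
}

best_by_cosine = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
best_by_euclid = min(docs, key=lambda name: euclidean_distance(query, docs[name]))

print("cosine picks:", best_by_cosine)
print("euclidean picks:", best_by_euclid)
```

The two metrics rank the same documents differently, which is exactly the kind of mismatch the example sentence describes.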

HNSW index — the approximate nearest neighbor index (HNSW stands for Hierarchical Navigable Small World) that Qdrant uses under the hood for fast similarity search at scale, trading a small amount of recall for query latency that stays low even as the collection grows to millions of vectors. “We’re relying on the HNSW index’s approximate search here, which is why results are extremely fast even over ten million vectors — it’s not scanning every vector exactly, it’s traversing a graph structure that finds very good, though not always mathematically perfect, matches.”
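"Approximate" is usually quantified as recall: the share of the true nearest neighbors that the index actually returned. The sketch below computes recall against a brute-force exact search over four invented vectors. The `approx` list is a hypothetical result standing in for what an approximate index might return; it is not produced by HNSW.

```python
def exact_top_k(query, vectors, k):
    """Brute-force search: score every vector, keep the k highest dot products."""
    def score(item):
        _, vec = item
        return sum(q * v for q, v in zip(query, vec))
    ranked = sorted(vectors.items(), key=score, reverse=True)
    return [vec_id for vec_id, _ in ranked[:k]]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}
query = [1.0, 0.0]

exact = exact_top_k(query, vectors, k=2)
approx = ["a", "d"]  # hypothetical approximate result that missed "b"
recall = recall_at_k(approx, exact)

print("exact:", exact, "approx:", approx, "recall:", recall)
```

In a sentence: "Recall at 2 is 0.5 here, because the approximate search found only one of the two true nearest neighbors."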

Common Phrases

  • “Should this be a separate collection, or can it share one with the existing vectors?”
  • “Is this metadata stored as payload, so we can filter on it directly?”
  • “Do we need hybrid search here, or is plain similarity search enough?”
  • “Does the distance metric match what the embedding model was actually trained on?”
  • “Is the HNSW index tuned for high enough recall for this use case?”

Example Sentences

Explaining a schema decision: “We split these into two collections rather than one, because the product embeddings and the review embeddings come from different models with different vector dimensions — Qdrant needs a single consistent dimensionality per collection, so mixing them wasn’t an option.”

Diagnosing a relevance issue: “Search results felt subtly wrong, and it turned out the distance metric didn’t match the embedding model’s training objective — once we recreated the collection with cosine similarity instead of dot product, relevance improved noticeably.”

Describing a filtering requirement: “We need hybrid search for this feature — users should only see results from documents they have access to, which means the query has to combine semantic similarity with a payload filter on the document’s access-control tags, not just similarity alone.”

Professional Tips

  • Design each collection around a consistent embedding model and vector dimensionality. Qdrant rejects vectors whose size doesn’t match the collection’s configuration, and similarity scores between vectors from different models aren’t meaningful anyway.
  • Store filterable, structured attributes as payload rather than trying to encode them into the vector itself — it keeps semantic search and business-logic filtering cleanly separated.
  • Reach for hybrid search whenever results need both semantic relevance and a hard constraint, like access control or category. Filtering after the fact wastes work on candidates that are then discarded and can return fewer results than requested, or none at all.
  • Always match the distance metric to what the embedding model was actually trained with — a mismatch silently degrades relevance without throwing any errors.
  • Understand that the HNSW index is approximate, not exact — for most applications the tiny accuracy tradeoff is worth the massive speed gain, but tune its parameters if your use case genuinely needs near-exact recall.
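The tip about filtering after the fact can be illustrated with a toy comparison. The sketch below uses brute-force search over four invented points, so it is a stand-in for the idea and not a picture of how Qdrant implements filtering internally. The `status` payload field is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"status": "draft"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"status": "draft"}},
    {"id": 3, "vector": [0.6, 0.4], "payload": {"status": "published"}},
    {"id": 4, "vector": [0.1, 0.9], "payload": {"status": "published"}},
]
query = [1.0, 0.0]

def top_k(candidates, k):
    """Return the k candidates most similar to the query (by dot product)."""
    return sorted(candidates, key=lambda p: dot(query, p["vector"]), reverse=True)[:k]

def is_published(p):
    return p["payload"]["status"] == "published"

# Post-filtering: take the top 2 by similarity first, then drop drafts.
post_filtered = [p["id"] for p in top_k(points, 2) if is_published(p)]

# Filtering as part of the search: only published points compete for the top 2.
filtered_search = [p["id"] for p in top_k([p for p in points if is_published(p)], 2)]

print("post-filtered:", post_filtered)
print("filtered during search:", filtered_search)
```

Because the two most similar points are both drafts, post-filtering returns nothing, while filtering during the search still returns two published results.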

Practice Exercise

  1. Explain the difference between a vector and its payload in Qdrant.
  2. Describe a scenario where hybrid search is necessary instead of plain similarity search.
  3. Write a sentence explaining why matching the distance metric to the embedding model matters.