English for Milvus Vector Database
Learn the English vocabulary for Milvus, the open-source vector database: collections, indexes, ANN search, and consistency levels.
Milvus conversations mix database vocabulary (collections, partitions, consistency) with search-specific terms (index type, recall, nprobe), and mixing them up — calling an index type a “search mode” or a partition a “shard” — makes it harder for teammates to follow a performance discussion precisely.
Key Vocabulary
Collection — the top-level container in Milvus that holds vectors and their associated scalar fields, roughly analogous to a table in a relational database. “We keep product embeddings and support-ticket embeddings in separate collections since they’re queried independently and have different schemas.”
Index type (HNSW, IVF_FLAT, DiskANN) — the algorithm used to organize vectors for approximate nearest-neighbor search, chosen based on the trade-off between query latency, recall, and memory footprint.
“We switched from IVF_FLAT to HNSW because query latency at our scale mattered more than the extra memory it uses.”
Recall — the fraction of true nearest neighbors that an approximate search actually returns, used to measure how much accuracy is being traded for speed.
“Recall dropped to 85% after we lowered ef for faster queries — we need to find the right point on that latency-recall curve.”
Partition — a logical subdivision within a collection that lets you scope a search to a subset of the data, improving both organization and query performance. “We partition by tenant ID so a search for one customer’s data never has to scan vectors belonging to another.”
Consistency level — a setting that controls how up-to-date a query’s view of recently written data must be, ranging from strong consistency to eventual, bounded, or session-level guarantees. “We’re using bounded consistency for the search endpoint — a few seconds of staleness is fine, and it’s much faster than requiring strong consistency on every query.”
Common Phrases
- “Is this collection indexed with
HNSW, or are we still doing a brute-force flat scan?” - “What recall are we seeing at the current
efsetting — is it worth trading some latency to get it higher?” - “Should this data live in its own partition, or is a metadata filter on the existing collection enough?”
- “Are we running under strong consistency here, or can this query tolerate some staleness?”
- “Is the index built and loaded, or are we still inserting into an unindexed collection?”
Example Sentences
Reviewing a search latency issue: “The p99 latency spike traces back to an unindexed partition — the new data was inserted but the index build hadn’t finished before we started querying it.”
Explaining an indexing decision:
“We chose IVF_FLAT over HNSW for this collection because build time mattered more than query speed, and the dataset is small enough that the difference is negligible.”
Describing a consistency trade-off in a design review: “Session consistency is enough here — a user always sees their own writes immediately, and other users’ slight staleness doesn’t affect correctness for this feature.”
Professional Tips
- Name the specific index type when discussing performance — “search is slow” is far less actionable than “we’re on
IVF_FLATwith a highnprobe, which trades speed for recall.” - Distinguish recall from raw accuracy explicitly — a vector search “returning wrong results” is often actually returning correct approximate neighbors, just not the exact top-k.
- Use partition precisely, not interchangeably with “collection” or “shard” — conflating them in a design doc causes real confusion about how data is actually organized.
- State the consistency level a feature requires up front — deciding this late, after the query pattern is built, often forces a rewrite.
Practice Exercise
- Explain the trade-off between
HNSWandIVF_FLATin one sentence. - Describe what recall measures and why 100% recall usually isn’t the goal.
- Write a sentence explaining when you’d use a partition versus a separate collection.