Milvus is a distributed vector database designed for billion-scale similarity search. These exercises cover Milvus's segment storage model, IVF_FLAT clustering with nlist, partition-based data isolation, HNSW vs. IVF tradeoffs, and cosine similarity as a distance metric.
Question 1 of 5
In Milvus, what is a Segment and why does it matter for performance?
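Segments are the internal storage units of a Milvus collection. Newly inserted data lands in growing segments, which are held in memory and keep accepting writes. Once a growing segment reaches its size limit (or is flushed), it is sealed: it becomes read-only, is persisted to object storage, and gets its vector index built. Sealed segments are searched through that index, while growing segments are searched without it, so a large amount of unsealed data slows queries. Each segment is also searched separately and the per-segment results are merged, so many small segments add overhead; Milvus therefore compacts small sealed segments into larger ones in the background, which also purges deleted entities.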
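The growing → sealed → compacted lifecycle can be illustrated with a toy in-memory model. Everything here (`ToyCollection`, `max_rows`, `small_threshold`, `flush`, `compact`) is hypothetical and not Milvus's implementation: real Milvus seals segments by size in bytes, persists them to object storage, and runs compaction in the background. The sketch only mirrors the flow of rows between segment states.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    seg_id: int
    rows: List[int] = field(default_factory=list)
    sealed: bool = False  # False = growing (accepts inserts), True = sealed (read-only)


class ToyCollection:
    """Hypothetical toy model of segment sealing and compaction."""

    def __init__(self, max_rows: int = 4, small_threshold: int = 2):
        self.max_rows = max_rows                # seal a growing segment once it holds this many rows
        self.small_threshold = small_threshold  # sealed segments smaller than this get compacted
        self.segments: List[Segment] = []
        self._next_id = 0

    def _new_segment(self) -> Segment:
        seg = Segment(self._next_id)
        self._next_id += 1
        self.segments.append(seg)
        return seg

    def _growing(self) -> Segment:
        # Reuse the open (growing) segment if there is one, otherwise open a new one.
        for seg in self.segments:
            if not seg.sealed:
                return seg
        return self._new_segment()

    def insert(self, row: int) -> None:
        seg = self._growing()
        seg.rows.append(row)
        if len(seg.rows) >= self.max_rows:
            seg.sealed = True  # full: seal it; the next insert opens a new growing segment

    def flush(self) -> None:
        # An explicit flush seals any partially filled growing segment.
        for seg in self.segments:
            if not seg.sealed and seg.rows:
                seg.sealed = True

    def compact(self) -> None:
        # Merge all small sealed segments into a single larger sealed segment.
        small = [s for s in self.segments if s.sealed and len(s.rows) < self.small_threshold]
        if len(small) < 2:
            return
        merged = self._new_segment()
        for s in small:
            merged.rows.extend(s.rows)
            self.segments.remove(s)
        merged.sealed = True
```

Inserting five rows with `max_rows=4` seals one full segment and leaves one row growing; repeated flushes then leave tiny sealed segments that `compact()` folds together, which is the situation real compaction is meant to clean up.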
Question 2 of 5
A developer creates a Milvus index with index_type='IVF_FLAT', params={'nlist': 128}. What does nlist control?
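nlist sets the number of clusters (centroids) in an IVF (Inverted File) index. At build time, the vectors are clustered with k-means and each vector is stored in the inverted list of its nearest centroid. At search time, the query is compared with the nlist centroids and only the vectors in the nprobe closest clusters are scanned, trading recall for speed. A larger nlist means smaller clusters, so each probed cluster is cheaper to scan, but a query's true neighbors are more likely to sit in clusters that are not probed; keeping recall up then requires a larger nprobe.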
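The build and search steps of an IVF_FLAT-style index can be sketched in pure Python. This is a minimal illustration under stated assumptions, not Milvus's (Knowhere/Faiss) code: centroids are trained with a plain Lloyd's k-means from a random initialization, distances are L2, and the helper names (`train_centroids`, `build_ivf`, `ivf_search`) are hypothetical. "FLAT" refers to the exhaustive comparison inside each probed cluster.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_centroids(data: List[Vector], nlist: int, iters: int = 10, seed: int = 0) -> List[List[float]]:
    # Plain k-means (Lloyd's algorithm) starting from nlist randomly sampled vectors.
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, nlist)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(nlist)]
        for v in data:
            nearest = min(range(nlist), key=lambda c: l2(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the previous centroid if its cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(data: List[Vector], centroids: List[List[float]]) -> Dict[int, List[int]]:
    # Inverted lists: centroid index -> ids of the vectors assigned to that centroid.
    lists: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for vid, v in enumerate(data):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(vid)
    return lists


def ivf_search(query: Vector, data: List[Vector], centroids: List[List[float]],
               lists: Dict[int, List[int]], nprobe: int, k: int) -> List[Tuple[float, int]]:
    # Probe only the nprobe clusters whose centroids are closest to the query,
    # then compare the query exhaustively against every vector in those clusters.
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probe for vid in lists[c]]
    return sorted((l2(query, data[vid]), vid) for vid in candidates)[:k]
```

Setting `nprobe` equal to `nlist` scans every cluster and reproduces the exact brute-force result; a smaller `nprobe` scans fewer vectors and can only return neighbors that are equally close or farther. In Milvus itself, nlist is fixed when the index is built, while nprobe is passed in the search parameters.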
Question 3 of 5
What is the purpose of Milvus Partitions?
Partitions divide a Milvus collection into logical subsets (for example, by user, date, or category). All partitions share the collection's schema, and a search can be restricted to specific partitions so that only their data is scanned, which narrows the search scope and improves performance. Partitions are a lighter-weight alternative to creating a separate collection for each data subset.
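The effect of restricting a search to partitions can be shown with a toy brute-force collection. The class and its methods are hypothetical and only mimic the idea; in pymilvus the corresponding knob is the `partition_names` argument of `Collection.search()`, and `_default` is the name of the partition every Milvus collection starts with.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class PartitionedCollection:
    """Hypothetical toy: partition name -> list of (entity id, vector), one shared schema."""
    partitions: Dict[str, List[Tuple[int, List[float]]]] = field(
        default_factory=lambda: {"_default": []}
    )

    def create_partition(self, name: str) -> None:
        self.partitions.setdefault(name, [])

    def insert(self, entity_id: int, vector: List[float], partition: str = "_default") -> None:
        self.partitions[partition].append((entity_id, vector))

    def search(self, query: List[float], limit: int,
               partition_names: Optional[Iterable[str]] = None) -> List[int]:
        # Only the requested partitions are scanned; None means the whole collection.
        names = list(partition_names) if partition_names is not None else list(self.partitions)
        scored = []
        for name in names:
            for eid, vec in self.partitions[name]:
                dist = sum((a - b) ** 2 for a, b in zip(query, vec))  # squared L2 distance
                scored.append((dist, eid))
        scored.sort()
        return [eid for _, eid in scored[:limit]]
```

A search over partition `user_a` never touches entities stored in `user_b`, even when those entities are closer to the query, which is exactly the scope reduction that makes partition-targeted searches cheaper.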
Question 4 of 5
How does Milvus's HNSW index differ from IVF_FLAT in terms of search behavior?
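HNSW builds a hierarchical navigable small-world graph and answers a query by greedily traversing graph edges from an entry point toward the query's neighbors. It offers high recall at low latency, but the graph's neighbor lists add substantial memory on top of the raw vectors. IVF_FLAT partitions vectors into nlist clusters and scans only the nprobe clusters nearest the query; it uses less memory and builds faster, but its recall depends on how nprobe is tuned. HNSW has tuning knobs of its own: M and efConstruction at build time, and ef at search time.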
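The graph traversal at the heart of HNSW search can be sketched on a single layer. This is deliberately simplified and is not real HNSW: the graph is built by brute-force k-nearest-neighbor linking (plus reverse edges) instead of HNSW's incremental, multi-layer construction with its neighbor-selection heuristic, and the helper names are hypothetical. The parameters `m` and `ef` are named after HNSW's M and ef to show their roles: `m` controls how many links each node stores (the memory cost), and `ef` controls how many candidates the search keeps (the recall/latency tradeoff).

```python
import heapq
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def build_knn_graph(data: List[Vector], m: int) -> Dict[int, List[int]]:
    # Simplification: link each node to its m nearest nodes by brute force, then add
    # reverse edges so the graph is undirected. Real HNSW builds layers incrementally.
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(data))}
    for i, v in enumerate(data):
        others = sorted((j for j in range(len(data)) if j != i), key=lambda j: math.dist(v, data[j]))
        for j in others[:m]:
            graph[i].add(j)
            graph[j].add(i)
    return {i: sorted(nbrs) for i, nbrs in graph.items()}


def greedy_search(query: Vector, data: List[Vector], graph: Dict[int, List[int]],
                  entry: int, ef: int, k: int) -> List[int]:
    # Best-first traversal of one layer: `candidates` is a min-heap of nodes to expand,
    # `results` is a max-heap (negated distances) of the ef best nodes found so far.
    d0 = math.dist(query, data[entry])
    visited = {entry}
    candidates = [(d0, entry)]
    results = [(-d0, entry)]
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0] and len(results) >= ef:
            break  # the closest unexpanded node is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, data[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    # Return the k closest node ids, nearest first.
    return [node for _, node in sorted((-neg_d, n) for neg_d, n in results)][:k]
```

On twenty points along a line (`[[float(i)] for i in range(20)]`, `m=2`), a search for `[7.2]` starting from node 0 walks along the chain and returns `[7, 8, 6]` for `k=3`, `ef=4`. Only the nodes along the path are ever compared with the query, which is why graph search is fast; the stored neighbor lists are the price paid in memory.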
Question 5 of 5
A developer uses Milvus's collection.search() with metric_type='COSINE'. What does cosine distance measure?
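Cosine similarity is the cosine of the angle between two vectors, so it ignores their magnitudes. Two vectors pointing in the same direction have a cosine similarity of 1 regardless of their lengths (a cosine distance of 0, when distance is defined as 1 - similarity); orthogonal vectors score 0 and opposite vectors score -1. With metric_type='COSINE', Milvus reports the similarity itself, so larger scores mean closer matches. This suits text embeddings, whose semantic content is typically carried by the vector's direction rather than its length. For vectors normalized to unit length, cosine similarity equals the inner product (IP).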
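The formula and its two key properties (scale invariance, and equality with the inner product once vectors are unit-normalized) fit in a few lines of Python. The function names are illustrative helpers, not part of any Milvus API.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|); ranges from -1 (opposite) to 1 (same direction).
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # A common distance form: 0 for identical directions, 2 for opposite ones.
    return 1.0 - cosine_similarity(a, b)


def normalize(v: Sequence[float]) -> List[float]:
    # Scale v to unit length.
    n = math.hypot(*v)
    return [x / n for x in v]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))
```

`[1, 2]` and `[2, 4]` differ only in length, so their cosine similarity is 1 (up to floating-point rounding). Normalizing vectors once at insert time lets an inner-product search return the same ranking as a cosine search.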