Elasticsearch's k-NN capabilities enable semantic search alongside traditional BM25 retrieval. Understanding dense_vector mappings, ELSER sparse encoders, semantic_text fields, and hybrid scoring is essential for modern search engineers.
1 / 5
An engineer defines a dense_vector field in an Elasticsearch mapping with "index": true, "similarity": "cosine". What does "index": true enable?
Setting "index": true on a dense_vector field instructs Elasticsearch to build an HNSW (Hierarchical Navigable Small World) graph over the vectors. This enables efficient approximate k-NN search. With "index": false the vectors are stored but not indexed, so they cannot be searched approximately; the only option is an exact brute-force comparison (for example, a script_score query with a vector function), which is slow for large datasets.
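As a rough illustration of the mapping described in this answer, the sketch below builds an index-creation body as a Python dictionary. The field names title and embedding, the index name in the comment, and the 384-dimension size are assumptions made for the example, not values from the question.

```python
import json

# Hypothetical field names; dims=384 is an assumed embedding size.
index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,           # build an HNSW graph for approximate k-NN
                "similarity": "cosine",  # metric used when comparing vectors
            },
        }
    }
}

# JSON body that could be sent when creating the index (e.g. PUT /my-index).
print(json.dumps(index_body, indent=2))
```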
2 / 5
What is ELSER (Elastic Learned Sparse Encoder) and how does it differ from dense vector search?
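ELSER generates sparse, high-dimensional representations (like an expanded bag of semantically related terms with weights) rather than dense vectors. These are stored in sparse_vector fields and queried with the text_expansion query, or with the sparse_vector query that supersedes it in newer releases. Unlike dense vectors, which are searched through an HNSW graph, the weighted terms are matched through the inverted index. Sparse representations are more interpretable and often more storage-efficient than dense vectors of 768 or more dimensions.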
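To make the "weighted bag of terms" idea concrete, here is a minimal sketch that scores a document against a query as a dot product over shared tokens, which is roughly how sparse scoring behaves. The token weights are invented for illustration (real ELSER output has many more tokens), and the field name ml.tokens and model id .elser_model_2 in the request body are assumptions about a particular deployment.

```python
# Illustrative token weights only; not real ELSER output.
doc_tokens = {"vector": 1.0, "search": 2.0, "index": 0.7}
query_tokens = {"knn": 1.2, "vector": 0.8, "search": 0.5}


def sparse_score(query: dict, doc: dict) -> float:
    """Dot product over the tokens that the query and document share."""
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


score = sparse_score(query_tokens, doc_tokens)
print(score)

# Body of a text_expansion search. "ml.tokens" (a sparse_vector field) and
# ".elser_model_2" are assumed names, not taken from the quiz.
search_body = {
    "query": {
        "text_expansion": {
            "ml.tokens": {
                "model_id": ".elser_model_2",
                "model_text": "how does approximate nearest neighbor search work",
            }
        }
    }
}
```

Only "vector" and "search" contribute to the score here, because "knn" never appears in the document's token set.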
3 / 5
An Elasticsearch mapping uses the "semantic_text" field type. What does this field type handle automatically?
The semantic_text field type (introduced in ES 8.15) is a high-level abstraction that automatically handles text chunking (splitting long documents), embedding generation via an inference endpoint, and vector storage. At query time, the semantic query on this field embeds the query text and runs the vector search transparently.
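A minimal sketch of what this looks like in practice, assuming an inference endpoint named my-elser-endpoint already exists and a field called content (both names are hypothetical). The point is that neither the document nor the query carries a vector; the field type and the semantic query deal with embeddings behind the scenes.

```python
import json

# Mapping: "content" and "my-elser-endpoint" are assumed names.
mapping_body = {
    "mappings": {
        "properties": {
            "content": {
                "type": "semantic_text",
                "inference_id": "my-elser-endpoint",
            }
        }
    }
}

# Indexing: plain text only; chunking and embedding happen at ingest time.
document = {"content": "A long article about tuning approximate k-NN recall ..."}

# Searching: the semantic query takes raw query text, not a query vector.
search_body = {
    "query": {
        "semantic": {
            "field": "content",
            "query": "How do I improve k-NN recall?",
        }
    }
}

for body in (mapping_body, document, search_body):
    print(json.dumps(body, indent=2))
```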
4 / 5
What is hybrid scoring in Elasticsearch, which combines k-NN and BM25 results, and which query clause enables it?
Hybrid search combines dense vector (semantic) results with BM25 (keyword) results. Elasticsearch supports this via the retriever framework, where an rrf (Reciprocal Rank Fusion) retriever merges the rankings from multiple sub-queries using each document's rank rather than its raw score, or via a weighted linear combination of scores. Hybrid retrieval often outperforms either method alone on retrieval benchmarks.
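The fusion step can be sketched in a few lines. The function below is a plain-Python illustration of Reciprocal Rank Fusion, not Elasticsearch's internal code: each document scores the sum of 1 / (rank_constant + rank) over every ranking it appears in. The document ids are made up, and rank_constant=60 is assumed as the default. The request body underneath shows the general shape of an rrf retriever with hypothetical field names.

```python
from collections import defaultdict


def rrf(rankings, rank_constant=60):
    """Fuse ranked lists of doc ids; returns (doc_id, score) pairs, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Sort by descending score, breaking ties by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ranking = ["a", "b", "c"]  # hypothetical keyword results
knn_ranking = ["c", "a", "d"]   # hypothetical vector results
fused = rrf([bm25_ranking, knn_ranking])
print([doc_id for doc_id, _ in fused])  # ['a', 'c', 'b', 'd']

# Shape of an rrf retriever request; "title" and "embedding" are assumed fields.
search_body = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {"standard": {"query": {"match": {"title": "vector search"}}}},
                {"knn": {"field": "embedding", "query_vector": [0.1, 0.2, 0.3],
                         "k": 10, "num_candidates": 100}},
            ],
            "rank_constant": 60,
        }
    }
}
```

Document "a" wins because it ranks near the top of both lists, even though it is first in only one of them.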
5 / 5
A developer runs a k-NN query with "num_candidates": 100, "k": 10. What is the relationship between these parameters?
num_candidates is the per-shard HNSW search width: the number of candidate vectors each shard considers during approximate nearest-neighbor traversal. Higher values improve recall at the cost of latency. Each shard keeps its k best candidates, and k is the final number of results returned after merging them across all shards. num_candidates must be greater than or equal to k.
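The relationship between the two parameters can be sketched as a small request builder plus a toy merge step. Both helpers (knn_clause and merge_top_k) are hypothetical names written for this example, the 10,000 upper bound is stated as an assumption about the documented limit, and the shard hits are invented scores rather than real search output.

```python
import heapq

MAX_NUM_CANDIDATES = 10_000  # assumed documented upper bound


def knn_clause(field, query_vector, k, num_candidates):
    """Build a top-level knn search body, enforcing k <= num_candidates."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not k <= num_candidates <= MAX_NUM_CANDIDATES:
        raise ValueError("num_candidates must be between k and 10,000")
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        }
    }


def merge_top_k(shard_hits, k):
    """Coordinator step: merge per-shard (doc_id, score) hits into a global top k."""
    all_hits = [hit for hits in shard_hits for hit in hits]
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])


body = knn_clause("embedding", [0.1, 0.2, 0.3], k=10, num_candidates=100)

# Two shards, each already reduced to its own best hits (made-up scores).
shard_hits = [[("d1", 0.9), ("d2", 0.4)], [("d3", 0.8), ("d4", 0.7)]]
print(merge_top_k(shard_hits, k=2))  # [('d1', 0.9), ('d3', 0.8)]
```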