5 exercises on Elasticsearch indexing, querying, and relevance.
1 / 5
What is an inverted index in Elasticsearch?
An inverted index is the core data structure behind full-text search. Instead of mapping documents to their words, it maps each term to a posting list of the documents (and, optionally, the positions within them) that contain it. A search for a word therefore jumps straight to the list of matching documents rather than scanning every document. Elasticsearch is built on Apache Lucene and creates an inverted index for each indexed text field at index time, after analysis. Because lookup cost depends on the number of matching terms and postings rather than on the total size of the corpus, queries stay fast even over very large text collections.
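As an illustration only, the term-to-postings mapping can be sketched in a few lines of Python. This is a toy model, not how Lucene stores its index: the function name `build_inverted_index`, the sample documents, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} (a toy posting list)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Stand-in for analysis: lowercase and split on whitespace.
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index

# Hypothetical sample documents keyed by document ID.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick quick dog",
}
index = build_inverted_index(docs)

# A lookup is a single dictionary access, not a scan over every document.
print(index["quick"])        # {1: [1], 3: [0, 1]}
print(sorted(index["dog"]))  # [2, 3]
```

The stored positions are what make phrase and proximity matching possible in this kind of structure.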
2 / 5
What does an analyzer do in Elasticsearch?
An analyzer transforms raw text into the terms stored in the inverted index. It runs a pipeline: optional character filters, a tokenizer that splits text into tokens (typically on word boundaries), and token filters that normalize them, for example by lowercasing, removing stop words, or applying stemming so "running" matches "run". By default the same analyzer is applied to both indexed text and full-text query strings so the terms align; a separate search analyzer can be configured when query-time behavior should differ. Term-level queries such as term are not analyzed at all. Choosing or customizing analyzers (per language, for example) is central to getting relevant search results.
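The three pipeline stages can be mimicked in plain Python. This is a rough sketch under stated assumptions, not Elasticsearch's implementation: the stop-word list is invented for the example, and `stem` is a deliberately naive suffix stripper standing in for a real stemming filter.

```python
import re

# Hypothetical, tiny stop-word list for the example.
STOP_WORDS = {"a", "an", "and", "is", "the", "to"}

def strip_html(text):
    """Character filter: replace anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Tokenizer: keep runs of ASCII letters and digits as tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)

def stem(token):
    """Token filter: naive '-ing' stripping, not a real stemmer."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if token[-1] == token[-2]:  # "runn" -> "run"
            token = token[:-1]
    return token

def analyze(text):
    """Run character filter -> tokenizer -> token filters, in that order."""
    tokens = tokenize(strip_html(text))
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]

print(analyze("<p>The dog is Running to the park</p>"))  # ['dog', 'run', 'park']
print(analyze("run"))                                    # ['run']
```

Because indexed text and query text both pass through `analyze`, a query for "run" produces the same term that "Running" was indexed under.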
3 / 5
How do a shard and a replica relate in Elasticsearch?
An Elasticsearch index is divided into primary shards, each a self-contained Lucene index, so data and query load spread across nodes for horizontal scale. Each primary can have replica shards, which are copies kept on other nodes. Replicas provide high availability (a replica is promoted to primary if the node holding the primary fails) and extra read throughput, since searches can be served by replicas too. The primary shard count is fixed at index creation and can only be changed afterwards by splitting, shrinking, or reindexing into a new index, while the replica count can be changed at any time. Balancing shard size and number is a key tuning decision.
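One way to see why the primary shard count is fixed: a document is assigned to a shard from a hash of its routing value (the document ID by default) taken modulo the number of primaries. The sketch below is a simplification under stated assumptions. `zlib.crc32` is only a stand-in for Elasticsearch's own hash function, and the `route` helper and document IDs are hypothetical.

```python
import zlib

def route(doc_id, num_primary_shards):
    """Pick a primary shard from a hash of the routing value."""
    # Assumption: crc32 stands in for the hash Elasticsearch actually uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# Hypothetical document IDs.
doc_ids = [f"doc-{i}" for i in range(100)]
with_3 = {d: route(d, 3) for d in doc_ids}
with_4 = {d: route(d, 4) for d in doc_ids}

# Changing the shard count changes where existing documents would be looked up.
moved = sum(1 for d in doc_ids if with_3[d] != with_4[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Replicas do not enter this calculation, which is consistent with the replica count being adjustable without moving documents between primaries.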
4 / 5
What is the Query DSL in Elasticsearch?
The Query DSL (Domain Specific Language) is Elasticsearch's JSON syntax for building searches. It distinguishes the query context, where clauses contribute to a relevance score (typically full-text queries such as match and multi_match), from the filter context, where clauses only include or exclude documents (typically exact-value queries such as term and range) and frequently used filters can be cached for speed. The context is set by where a clause is placed, not by the query type: in a bool query, must and should clauses are scored, while filter and must_not clauses are not. This structure lets you compose precise, performant searches.
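For a concrete picture, here is a bool query body written as a Python dictionary and serialized to JSON. The clause names (bool, must, should, filter, must_not, match, term, range) come from the Query DSL; the field names and values are hypothetical and assume an index of articles.

```python
import json

# Hypothetical fields: title, tags, status, published_at, language.
search_body = {
    "query": {
        "bool": {
            # Query context: these clauses contribute to _score.
            "must": [{"match": {"title": "inverted index"}}],
            "should": [{"match": {"tags": "lucene"}}],
            # Filter context: these clauses only include or exclude documents.
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"published_at": {"gte": "2023-01-01"}}},
            ],
            "must_not": [{"term": {"language": "de"}}],
        }
    }
}

# This JSON is what would be sent as the body of a search request.
print(json.dumps(search_body, indent=2))
```

Moving the term and range clauses from filter into must would keep the same result set but make them participate in scoring.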
5 / 5
What is relevance scoring in Elasticsearch?
Relevance scoring ranks matching documents by how well they satisfy a query, assigning each a _score. Modern Elasticsearch uses the BM25 algorithm by default, which weighs term frequency (more occurrences raise the score, with diminishing returns), inverse document frequency (rare terms count for more), and field length normalization (matches in shorter fields rank higher). You can tune scoring with boosts, function_score queries, and per-field weighting. Filter-context clauses, by contrast, do not affect the score; they only narrow the candidate set.
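The three BM25 factors can be made concrete with a small calculation. The function below uses the textbook form of BM25 for a single term, with k1 = 1.2 and b = 0.75 as parameter values. It is a sketch for building intuition: Lucene's implementation may differ in constant factors, and the function name and all input numbers are hypothetical.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one field of one document."""
    # Inverse document frequency: rare terms get a larger weight.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: fields longer than average are penalized.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term frequency saturates as tf grows (diminishing returns).
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

# Hypothetical corpus: 1000 documents, average field length 100 terms.
common = dict(avg_doc_len=100, num_docs=1000)

for tf in (1, 2, 3):
    score = bm25_term_score(tf=tf, doc_len=100, doc_freq=10, **common)
    print(f"tf={tf}: {score:.3f}")

short_field = bm25_term_score(tf=1, doc_len=50, doc_freq=10, **common)
long_field = bm25_term_score(tf=1, doc_len=200, doc_freq=10, **common)
print(f"short field {short_field:.3f} vs long field {long_field:.3f}")
```

Each additional occurrence adds less to the score than the one before it, and the same match scores higher in the shorter field.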