What English level do I need to read "Elasticsearch Vocabulary: 30 Terms for Search Engineers"?

This article is tagged Intermediate. If you find the vocabulary difficult, start with a related vocabulary exercise first, then come back. Technical reading gets much easier once the core terms feel familiar.

Is this article free to read?

Yes. Every article on CoderSlingo, including this one, is free to read with no account, sign-up, or paywall.

How is reading this article different from doing an exercise?

Articles like this one explain concepts and vocabulary in context through prose, while exercises are interactive drills — fill-in-the-blank, matching, and multiple-choice — that test and reinforce specific terms. Reading builds understanding; exercises build recall.

Can I practice the vocabulary used in this article?

Yes — this article's topic lines up with our #vocabulary exercises. Use the "Practice this vocabulary" link below to jump straight into a matching drill.

How long does this article take to read?

About 9 min. Most CoderSlingo articles are written to be read in one sitting, without needing a dictionary open in another tab.

Do I need to create an account to read or save this article?

No account is required to read any article. If you complete exercises elsewhere on the site, your progress is saved locally in your browser — no login needed.

What if I don't understand a technical term used in the article?

Check the site Glossary for plain-English definitions of common IT terms — HTTP status codes, Git commands, design patterns, and more — or look up the related vocabulary module for this topic.

Can I share or link to this article?

Yes — use the Twitter/X or LinkedIn share buttons at the end of the article, or copy the page URL directly. Attribution back to CoderSlingo is appreciated but the content is free to reference.

How often is new content like this published?

New articles are added regularly across all categories, alongside new vocabulary sets and exercises. Tag pages (like this article's tags) are a good way to find related content as it's published.

Where can I find more articles like this one?

See the "Related Articles" section below for hand-picked follow-ups, or browse all Vocabulary articles from the main Blog index.

Elasticsearch Vocabulary: 30 Terms for Search Engineers

Elasticsearch powers search at scale — from e-commerce product catalogues to log analytics pipelines. If you work with it daily, you already know the commands. But when you join an English-speaking team, stand-up calls and code reviews introduce a layer of jargon that can slow you down. This guide covers the 30 terms you will hear most often, with plain-English definitions and real developer dialogue so you can use them confidently in conversation.

Core Terms

Index — In Elasticsearch, an index is a collection of related documents. Since mapping types were removed, it is closest to a table in a relational database, although older material often compares it to a whole database. You store, search, and manage data at the index level.

“We’re splitting the logs index by month so the hot tier doesn’t fill up so fast.”

“Before you run that query, make sure you’re targeting the right index — there’s a legacy one still sitting there from last year.”

Document — A document is the basic unit of data in Elasticsearch, stored as JSON. Every document belongs to an index and has a unique _id.

“The document structure changed in the last sprint — the user_id field is now nested under metadata.”

“We’re indexing around two million documents a day, so mapping efficiency really matters.”

Shard — Elasticsearch divides an index into smaller pieces called shards. Each shard is a self-contained Lucene index. Sharding lets Elasticsearch distribute data across multiple nodes and parallelise queries.

“We over-sharded that index at the start — twenty shards for five gigabytes is overkill.”

“The query is slow because it’s fanning out to all twelve shards. We need to look at routing.”
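
The routing idea in the dialogue above can be sketched in a few lines. This is a simplified illustration rather than Elasticsearch's actual implementation: the real cluster uses a Murmur3 hash plus an extra routing factor, so `zlib.crc32` is only a stand-in here, and `pick_shard` is a hypothetical helper name.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document _id) to a shard number.

    Assumption: crc32 stands in for the Murmur3 hash Elasticsearch really uses.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


# The same routing value always lands on the same shard, which is why a
# search that supplies a routing value can skip the other shards.
shards = [pick_shard(f"user-{n}", 12) for n in range(5)]
print(shards)
```

Because the shard number depends on the number of primary shards, changing that number later would send existing routing values to different shards. That is one reason the primary shard count is fixed when an index is created.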

Replica shard — A replica shard is an exact copy of a primary shard. Replicas serve two purposes: high availability (if a node goes down, replicas take over) and read throughput (search requests can be served from any copy).

“Bump the replica count to two before we go live — I don’t want a single-node failure taking down search.”

“Replica shards aren’t helping with indexing speed; they only help with reads and fault tolerance.”

Mapping — A mapping defines the schema for documents in an index: field names, data types (keyword, text, date, integer, etc.), and how each field should be analysed. Think of it as the typed schema that Elasticsearch uses to index and query your data.

“The mapping for that field is text, but we need exact matches — change it to keyword.”

“You can’t change an existing field’s mapping in place. You’ll need to reindex.”

Dynamic mapping vs explicit mapping — With dynamic mapping, Elasticsearch automatically detects and creates field types when you first index a document. With explicit mapping, you define the schema yourself before indexing. Dynamic mapping is convenient for prototyping but can create surprises in production; explicit mapping gives you precise control over field behaviour.

“We turned off dynamic mapping for this index. We had too many accidental float fields being created from string data.”

“Explicit mapping is a bit more upfront work, but it saves you from weird relevance scoring issues later.”
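
As a rough illustration of an explicit mapping, the request body for creating an index might look like the sketch below. The field names are hypothetical, chosen only to show the common types. The body is built as a plain Python dictionary so it can be inspected without a running cluster.

```python
import json

# Hypothetical index body: the field names are illustrative only.
index_body = {
    "mappings": {
        "dynamic": "strict",  # reject documents that contain unmapped fields
        "properties": {
            "status": {"type": "keyword"},   # exact matches, not analysed
            "title": {"type": "text"},       # full-text, analysed
            "created_at": {"type": "date"},
            "price": {"type": "float"},
        },
    }
}

print(json.dumps(index_body, indent=2))
```

A body like this would be sent when the index is created. Setting `dynamic` to `strict` is one way to switch off dynamic mapping, since indexing a document with an unknown field then fails instead of silently adding a new field.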

Indexing and Storage Internals

Inverted index — The inverted index is the core data structure that makes full-text search fast. Instead of storing documents and scanning them, Elasticsearch builds a lookup table from every unique term to the list of documents that contain it. When you search for “optimise”, Elasticsearch looks up that term in the inverted index and retrieves the matching document IDs directly, without scanning every document.

“The reason keyword search is so fast is the inverted index — it’s not scanning row by row like a database would.”

“With the stop filter enabled, stop words are dropped during analysis so they don’t bloat the inverted index unnecessarily.”
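
A toy version of the term-to-documents lookup described above can be written in plain Python. It is only a sketch of the idea: the real structure lives inside Lucene and also stores positions and frequencies. The sample documents and the helper name `build_inverted_index` are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


# Hypothetical sample documents keyed by document ID.
docs = {
    1: "optimise the query",
    2: "optimise shard count",
    3: "tune the mapping",
}

inverted = build_inverted_index(docs)

# A search is a dictionary lookup, not a scan over every document.
matches = sorted(inverted.get("optimise", set()))
print(matches)  # [1, 2]
```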

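_source field — The _source field stores the original JSON document as it was indexed. When you retrieve a document, Elasticsearch returns _source by default. You can filter it per request without changing the stored data, or disable it in the mapping to save storage, but disabling it means the update, update-by-query, and reindex APIs no longer work for that index, because they all rebuild documents from _source.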

“We disabled _source on the metrics index to cut storage in half, but now we can’t use update-by-query.”

“You can use _source filtering in your request to return only the fields you actually need.”
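
To make the two options concrete, here is a minimal sketch of both request bodies as Python dictionaries. The field names `title` and `price` are hypothetical.

```python
# Search body that returns only two fields from _source for each hit.
# Assumption: the documents have "title" and "price" fields.
filtered_search = {
    "_source": ["title", "price"],
    "query": {"match": {"title": "laptop"}},
}

# Index body that disables _source entirely. This saves storage, but the
# original JSON can no longer be returned or used to rebuild documents.
no_source_index = {
    "mappings": {
        "_source": {"enabled": False},
    }
}

print(filtered_search["_source"])
```

Filtering is a per-request choice and is easy to undo, while disabling `_source` is a mapping-level decision that affects every document in the index.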

Refresh interval — Elasticsearch writes new documents to an in-memory buffer and periodically makes them visible to search via a process called a refresh. The refresh interval (default: one second) controls how often this happens. For high-throughput ingestion, increasing the interval reduces overhead; for near-real-time search, keep it low.

“Set the refresh interval to thirty seconds during the bulk load, then drop it back to one second when you’re done.”

“The data appears in the index but isn’t searchable yet — it’s still waiting on the refresh.”
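
The bulk-load pattern from the first quote can be sketched as two settings bodies, one applied before the load and one after. The helper name `refresh_settings` is hypothetical. The resulting dictionaries are the kind of body you would send to the index settings endpoint.

```python
def refresh_settings(interval):
    """Build an index settings body that changes only the refresh interval."""
    return {"index": {"refresh_interval": interval}}


# Before a large bulk load: refresh less often to reduce overhead.
bulk_load_settings = refresh_settings("30s")

# After the load: return to near-real-time search.
normal_settings = refresh_settings("1s")

print(bulk_load_settings, normal_settings)
```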

Index Lifecycle Management (ILM) — ILM is Elasticsearch’s built-in policy engine for managing indices over time. You define phases — hot, warm, cold, frozen, delete — and Elasticsearch automatically moves indices between them, shrinking shards, force-merging segments, or deleting old data according to your rules.

“We’ve got an ILM policy that rolls over the index when it hits fifty gigabytes or thirty days, whichever comes first.”

“Without ILM, someone has to manually clean up old indices. It’s the kind of thing that gets forgotten until a disk fills up at 3 a.m.”
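
A policy along the lines of the first quote might look roughly like the sketch below. The rollover thresholds come from that quote. The warm and delete ages are illustrative assumptions, and a real policy would be tuned to your retention requirements.

```python
import json

# Sketch of an ILM policy body. The min_age values are assumptions.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over at 50 GB or 30 days, whichever comes first.
                    "rollover": {"max_size": "50gb", "max_age": "30d"},
                }
            },
            "warm": {
                "min_age": "14d",
                "actions": {
                    # Merge segments down once the index is read-mostly.
                    "forcemerge": {"max_num_segments": 1},
                },
            },
            "delete": {
                "min_age": "90d",
                "actions": {"delete": {}},
            },
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```

Note that when rollover is used, `min_age` in the later phases is measured from the rollover time rather than from index creation.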

Querying and Relevance

Relevance score (BM25) — When you run a full-text query, Elasticsearch ranks results by relevance score. The default scoring algorithm is BM25 (Best Matching 25), which considers term frequency, inverse document frequency, and field length. Higher scores mean closer matches.

“The top results look wrong — BM25 is boosting short documents too heavily because of field length normalisation.”

“You can explain a query to see how BM25 calculated each document’s score. Really useful for debugging ranking issues.”
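
To see why short fields score higher, it helps to write out the textbook BM25 formula for a single term. The sketch below uses the commonly cited defaults k1 = 1.2 and b = 0.75. Lucene's production implementation may differ in constant factors and in how it encodes field length, so treat the numbers as illustrative rather than what a cluster would return.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook BM25 score for one term in one document."""
    # Rare terms (low docs_with_term) get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Field length normalisation: longer-than-average fields are penalised.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Same term frequency, very different field lengths (hypothetical numbers).
short_doc = bm25_term_score(tf=1, doc_len=5, avg_doc_len=50,
                            n_docs=1000, docs_with_term=10)
long_doc = bm25_term_score(tf=1, doc_len=500, avg_doc_len=50,
                           n_docs=1000, docs_with_term=10)

print(short_doc > long_doc)  # True
```

Setting `b` to 0 removes the length normalisation entirely, which is one lever to consider when short documents dominate the top results.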

Query DSL — Elasticsearch’s Query DSL (Domain-Specific Language) is a JSON-based language for expressing queries. Instead of writing SQL, you compose nested JSON objects describing what you want to find and how to rank results.

“The Query DSL looks verbose at first, but once you understand the leaf and compound query pattern, it clicks.”

“Don’t build Query DSL strings by concatenation — use a client library that constructs the JSON properly.”

term query — A term query matches documents that contain an exact, unanalysed value in a field. Use it for keyword fields, IDs, statuses, and other structured data where you need a precise match.

“Use a term query for the status field, not match — you want exact string equality, not full-text analysis.”

match query — A match query runs the search string through the field’s analyser before comparing. This makes it suitable for full-text fields where you want tokenisation, stemming, and stop-word removal to apply.

“The match query is forgiving: it analyses the input the same way the field was indexed, so with a stemming analyser ‘running’ matches ‘run’.”

bool query — A bool query lets you combine multiple queries using logical clauses: must (required, affects score), should (optional, boosts score), must_not (exclusion, no score contribution), and filter (required, no score contribution).

“Wrap your filters in the filter clause of a bool query. They can be cached and don’t affect scoring, so performance is much better.”

“The should clause is what gives you that ‘nice to have’ boosting without hard-requiring the term.”

range query — A range query matches documents where a field’s value falls between specified bounds. Common with dates, prices, and numeric metrics.

“Use a range query on the created_at field to scope results to the last thirty days.”
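
The four query types above are often used together. The sketch below builds a single search body that combines them. The field names and values are hypothetical, and the body is a plain dictionary, so it follows the advice about not concatenating strings.

```python
import json

search_body = {
    "query": {
        "bool": {
            # Required and scored: full-text match on an analysed field.
            "must": [{"match": {"title": "wireless headphones"}}],
            # Required but not scored: exact value and date range.
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"created_at": {"gte": "now-30d/d"}}},
            ],
            # Optional: boosts the score when it matches.
            "should": [{"match": {"description": "noise cancelling"}}],
            # Excluded: matching documents are dropped from the results.
            "must_not": [{"term": {"discontinued": True}}],
        }
    }
}

print(json.dumps(search_body, indent=2))
```

Here the leaf queries (`match`, `term`, `range`) do the matching, while the compound `bool` query decides how their results combine.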

Analysis and Text Processing

Analyser — An analyser is the pipeline Elasticsearch uses to process text when indexing and searching. It consists of zero or more character filters, exactly one tokenizer, and zero or more token filters, applied in that order. The choice of analyser determines what ends up in the inverted index.

“The default standard analyser splits text on word boundaries and lowercases the tokens. It’s fine for most English content.”

“We’re using the english analyser for the blog content — it handles stemming and stop words automatically.”

Tokenizer — The tokenizer is the component within an analyser that breaks a string into individual tokens (terms). The standard tokenizer splits on word boundaries as defined by the Unicode Text Segmentation algorithm, which in practice means most whitespace and punctuation; the whitespace tokenizer splits on any whitespace character and nothing else; ngram tokenizers produce character-level fragments for partial matching.

“We switched to an ngram tokenizer for the search-as-you-type feature so partial strings match correctly.”

“The tokenizer is just one part of the analysis chain — filters run afterwards to modify the tokens.”

Filter (token filter) — A token filter modifies, removes, or adds tokens after the tokenizer has run. Common filters include lowercase, stop (removes stop words), stemmer (reduces words to their root form), and synonym.

“Add a synonym filter so that ‘k8s’ matches ‘kubernetes’ in the index.”

“The stop filter is removing ‘not’ from the query, which is flipping the meaning entirely. Turn it off for this field.”
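
The three stages of the analysis chain can be imitated with a toy pipeline in plain Python. This is not how Elasticsearch implements analysis. The stop-word list, the synonym table, and the function names are all assumptions made for illustration. Real synonym filters also usually keep both forms rather than replacing one with the other.

```python
import re

# Hypothetical, deliberately tiny word lists.
STOP_WORDS = {"the", "a", "an", "on", "is"}
SYNONYMS = {"k8s": "kubernetes"}


def char_filter(text):
    """Character filter: strip HTML-like tags before tokenising."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer(text):
    """Tokenizer: split the string into word tokens."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Token filters: lowercase, drop stop words, then map synonyms."""
    lowered = [t.lower() for t in tokens]
    kept = [t for t in lowered if t not in STOP_WORDS]
    return [SYNONYMS.get(t, t) for t in kept]


def analyse(text):
    return token_filters(tokenizer(char_filter(text)))


terms = analyse("<b>Deploying</b> the API on K8s")
print(terms)  # ['deploying', 'api', 'kubernetes']
```

The order matters: the synonym lookup only works here because lowercasing has already turned "K8s" into "k8s".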

Aggregations

Aggregation — Aggregations let you compute analytics over your search results. Instead of just retrieving matching documents, you can count them, sum values, find averages, build histograms, and more. They are defined in the aggs section of a query.

“We use aggregations to power the faceted navigation on the product listing page — category counts, price ranges, brand filters.”

“If you only need the aggregation result and not the raw hits, set size: 0 to avoid fetching documents unnecessarily.”

Bucket aggregation — A bucket aggregation groups documents into buckets based on a criterion — a field value, a date range, a geographic boundary. Each bucket can contain further sub-aggregations.

“The terms aggregation is a bucket aggregation — it creates one bucket per unique value of the field.”

“Nest a metric aggregation inside each bucket to get, say, average order value per country.”

Metric aggregation — A metric aggregation computes a single numeric value (or a set of values) from the documents in a bucket: sum, avg, min, max, cardinality, percentiles, and so on.

“The cardinality aggregation gives you an approximate distinct count. It’s not exact, but it’s fast enough for dashboards.”
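
Putting the three ideas together, a request that asks for average order value per country plus an approximate count of distinct customers could be sketched as follows. The field names and aggregation names are hypothetical.

```python
import json

agg_body = {
    "size": 0,  # return aggregation results only, no raw hits
    "aggs": {
        # Bucket aggregation: one bucket per distinct country value.
        "by_country": {
            "terms": {"field": "country"},
            "aggs": {
                # Metric aggregation nested inside each bucket.
                "avg_order_value": {"avg": {"field": "order_total"}},
            },
        },
        # Metric aggregation over all matching documents (approximate).
        "unique_customers": {"cardinality": {"field": "customer_id"}},
    },
}

print(json.dumps(agg_body, indent=2))
```

The names `by_country`, `avg_order_value`, and `unique_customers` are labels chosen by the caller. They come back as keys in the response, so descriptive names make the results easier to read.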

How to Use These in Conversation

Scenario 1 — Explaining a performance issue in a stand-up: “The dashboard queries are slow because we’re running a terms aggregation across the full index with no filter. I’m going to add a date range filter in the bool query’s filter clause so Elasticsearch only aggregates over recent documents instead of scanning everything.”

Scenario 2 — Reviewing a mapping change in a pull request: “This field should be keyword, not text. We’re only ever doing exact matches on it — using text means Elasticsearch will analyse it and create unnecessary entries in the inverted index. Also, dynamic mapping is still on for this index; let’s define an explicit mapping to prevent surprises.”

Scenario 3 — Debugging relevance ranking with a colleague: “The BM25 scores look off for short documents. Can you run the _explain API on one of these results so we can see how the relevance score was calculated? I suspect the field length normalisation is over-penalising longer, more complete records.”

Scenario 4 — Discussing ILM during a capacity planning call: “Right now we’re keeping everything on the hot tier indefinitely, which is why storage costs are climbing. If we set up an ILM policy to move indices older than fourteen days to warm and delete after ninety, we can cut the SSD footprint by about sixty per cent.”

Quick Reference

Term	What it means
Index	A named collection of documents; the top-level data container
Shard	A horizontal slice of an index; enables distribution and parallelism
Replica shard	A copy of a primary shard for redundancy and read throughput
Mapping	The schema defining field names, types, and analysis settings
Inverted index	The internal lookup structure mapping terms to document IDs
BM25	The default relevance scoring algorithm for full-text queries
bool query	A compound query combining `must`, `should`, `filter`, `must_not` clauses
Analyser	The pipeline (tokenizer + filters) that processes text for indexing and search
Bucket aggregation	Groups documents into named buckets (e.g. by category, date range)
ILM	Policy engine that automates index lifecycle phases (hot → warm → cold → frozen → delete)

Mastering this vocabulary will not only help you communicate more precisely in English — it will sharpen how you think about Elasticsearch itself. Each term reflects a design decision, and understanding the language is the first step to reasoning clearly about search architecture.
