What are bucket aggregations in Elasticsearch and how does the terms aggregation work?
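Bucket aggregations group documents into buckets, one per key or range, and report a document count for each bucket. Terms aggregation: { "aggs": { "genres": { "terms": { "field": "genre.keyword", "size": 10 } } } } returns the 10 most common genres with their document counts. size controls how many buckets are returned; larger values give more accurate counts but use more memory. In a multi-shard index each shard returns only its own top terms, so counts can be approximate; raising shard_size makes each shard return more candidate terms and improves accuracy.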
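As a minimal sketch of the terms aggregation above, the following Python builds the request body and then mimics "top N values by document count" on a small in-memory list. The helper name `terms_agg_body` and the sample data are illustrative assumptions; the local count only shows the idea and does not reproduce Elasticsearch's per-shard behaviour.

```python
import json
from collections import Counter


def terms_agg_body(field, size=10, shard_size=None):
    """Build a search body with one terms aggregation (hypothetical helper)."""
    terms = {"field": field, "size": size}
    if shard_size is not None:
        # Ask each shard for more candidate terms than the final bucket count.
        terms["shard_size"] = shard_size
    # Top-level "size": 0 skips the hits and returns only the aggregations.
    return {"size": 0, "aggs": {"genres": {"terms": terms}}}


body = terms_agg_body("genre.keyword", size=10, shard_size=50)
print(json.dumps(body, indent=2))

# Local illustration of what the buckets contain: value plus document count.
genres = ["rock", "jazz", "rock", "pop", "rock", "jazz"]
top_two = Counter(genres).most_common(2)
print(top_two)  # [('rock', 3), ('jazz', 2)]
```

The printed JSON can be sent as the body of a `_search` request with any HTTP client.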
What is a date_histogram aggregation and what is it used for?
date_histogram is a bucket aggregation that groups documents by time interval. date_histogram: { "date_histogram": { "field": "timestamp", "calendar_interval": "week", "time_zone": "Europe/London" } } groups documents into weekly buckets whose boundaries are computed in the specified time zone. Sub-aggregations compute metrics per bucket: nesting a sum aggregation inside gives weekly revenue totals.
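The weekly revenue case can be sketched as a request body with a sum sub-aggregation, followed by a local imitation of the bucketing. The aggregation names, the `price` field, and the sample sales are assumptions for illustration; the local version ignores time zones and assumes weeks start on Monday.

```python
import json
from collections import defaultdict
from datetime import date, timedelta

body = {
    "size": 0,
    "aggs": {
        "sales_per_week": {
            "date_histogram": {
                "field": "timestamp",
                "calendar_interval": "week",
                "time_zone": "Europe/London",
            },
            # Sub-aggregation: computed once per weekly bucket.
            "aggs": {"weekly_revenue": {"sum": {"field": "price"}}},
        }
    },
}
print(json.dumps(body, indent=2))


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


# Hypothetical (date, price) pairs standing in for indexed documents.
sales = [(date(2024, 1, 1), 10.0), (date(2024, 1, 3), 5.0), (date(2024, 1, 8), 7.5)]
weekly = defaultdict(float)
for day, price in sales:
    weekly[week_start(day)] += price

for start, total in sorted(weekly.items()):
    print(start.isoformat(), total)
```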
What are metric aggregations in Elasticsearch and which ones are available?
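Metric aggregations compute numeric values over a set of documents; common ones include min, max, sum, avg, stats, extended_stats, percentiles, and cardinality. percentiles returns approximate values such as P50/P95/P99 (using the TDigest algorithm by default). cardinality uses HyperLogLog++ to estimate unique value counts, with precision tunable through precision_threshold. extended_stats returns count, min, max, avg, and sum plus variance and standard deviation. These are typically nested inside bucket aggs to compute per-bucket metrics.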
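A minimal sketch of metric aggregations nested inside a bucket aggregation follows. The field names (`service.keyword`, `latency_ms`, `user_id`) and aggregation names are hypothetical. The local part computes exact statistics with Python's standard library for comparison; Elasticsearch's percentiles and cardinality are approximations, and the population variance used here is an assumption about which variance `extended_stats` reports by default.

```python
import json
import statistics

body = {
    "size": 0,
    "aggs": {
        "per_service": {
            "terms": {"field": "service.keyword", "size": 5},
            "aggs": {
                "latency_pcts": {
                    "percentiles": {"field": "latency_ms", "percents": [50, 95, 99]}
                },
                "unique_users": {
                    "cardinality": {"field": "user_id", "precision_threshold": 1000}
                },
                "latency_stats": {"extended_stats": {"field": "latency_ms"}},
            },
        }
    },
}
print(json.dumps(body, indent=2))

# Exact local equivalents on a tiny sample, for intuition only.
latencies = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
user_ids = ["a", "b", "a", "c", "b", "a", "d", "a"]

avg = statistics.mean(latencies)            # 5.0
variance = statistics.pvariance(latencies)  # 4.0 (population variance)
std_dev = statistics.pstdev(latencies)      # 2.0
unique_users = len(set(user_ids))           # 4 (exact, not an estimate)
print(avg, variance, std_dev, unique_users)
```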
What are pipeline aggregations in Elasticsearch and how do they differ from bucket/metric aggs?
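Pipeline aggregations: they take the output of other aggregations, rather than documents, as input, referenced through buckets_path. Parent pipeline aggregations sit inside a multi-bucket aggregation: cumulative_sum on a date_histogram of daily sales produces running totals. Sibling pipeline aggregations sit next to the aggregation they read: max_bucket finds which bucket had the highest metric value. They cannot be used standalone; they always need another aggregation whose output they process.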
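The two pipeline placements can be sketched in one request body: `cumulative_sum` nested inside the date_histogram (parent pipeline) and `max_bucket` next to it (sibling pipeline). The aggregation names, the `amount` field, and the sample totals are assumptions for illustration; the local part mimics what each pipeline computes from the bucket values.

```python
import json
from itertools import accumulate

body = {
    "size": 0,
    "aggs": {
        "sales_per_day": {
            "date_histogram": {"field": "timestamp", "calendar_interval": "day"},
            "aggs": {
                "daily_sales": {"sum": {"field": "amount"}},
                # Parent pipeline: reads a metric from its enclosing buckets.
                "running_total": {"cumulative_sum": {"buckets_path": "daily_sales"}},
            },
        },
        # Sibling pipeline: reads across the buckets of a neighbouring aggregation.
        "best_day": {"max_bucket": {"buckets_path": "sales_per_day>daily_sales"}},
    },
}
print(json.dumps(body, indent=2))

# Hypothetical per-day bucket values, as the date_histogram might return them.
daily = [("2024-01-01", 100.0), ("2024-01-02", 250.0), ("2024-01-03", 50.0)]

running_totals = list(accumulate(value for _, value in daily))
best_day = max(daily, key=lambda bucket: bucket[1])
print(running_totals)  # [100.0, 350.0, 400.0]
print(best_day)        # ('2024-01-02', 250.0)
```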
What are nested aggregations in Elasticsearch and why are they needed for nested objects?
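Nested aggregations: if comments is a nested field, { "nested": { "path": "comments" }, "aggs": { "avg_rating": { "avg": { "field": "comments.rating" } } } } correctly aggregates comment ratings. Nested objects are indexed as separate hidden documents, so the nested agg is needed to switch the aggregation context from the root document to those objects. Without the nested agg wrapper, Elasticsearch would aggregate at the root document level, where the nested field values are not visible, producing empty or incorrect results for array-of-object fields.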
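A minimal sketch of the nested aggregation as a full request body, with a local imitation of what it computes: the average is taken over every comment object across all documents, not over root documents. The aggregation name `comments_scope` and the sample documents are assumptions for illustration.

```python
import json

body = {
    "size": 0,
    "aggs": {
        "comments_scope": {
            # Switch the aggregation context to the nested comment objects.
            "nested": {"path": "comments"},
            "aggs": {"avg_rating": {"avg": {"field": "comments.rating"}}},
        }
    },
}
print(json.dumps(body, indent=2))

# Hypothetical documents, each with an array of comment objects.
docs = [
    {"title": "post-1", "comments": [{"rating": 5}, {"rating": 3}]},
    {"title": "post-2", "comments": [{"rating": 4}]},
]

# Flatten to one entry per comment, which is the level the nested agg works at.
ratings = [comment["rating"] for doc in docs for comment in doc["comments"]]
avg_rating = sum(ratings) / len(ratings)
print(len(ratings), avg_rating)  # 3 4.0
```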