Vocabulary for Observability Engineers: Logs, Metrics, and Traces

Observability is how engineers understand what a system is doing from the outside — by collecting logs, metrics, and traces. The field has a dense, specific vocabulary, and many terms (cardinality, percentile, span) trip up non-native speakers. This guide explains the essential words in context so you can read dashboards, write alerts, and discuss incidents fluently.

The three pillars

Observability rests on three data types. Know exactly what each is and how it’s used.

Pillar	What it is	A sentence
Logs	Timestamped text records of events	“Grep the logs for the error.”
Metrics	Numeric measurements over time	“CPU usage is a gauge metric.”
Traces	The path of a request across services	“The trace shows where the latency is.”

“When a request is slow, metrics tell you that it’s slow, traces tell you where, and logs tell you why.”

That sentence is the single best way to remember the difference — and a great thing to say in an interview.

Metric vocabulary

Metrics have specific types and you must use the right word.

Term	Meaning
Counter	A number that only goes up (e.g. total requests)
Gauge	A number that goes up and down (e.g. memory in use)
Histogram	A distribution of values (e.g. latency buckets)
Rate	How fast a counter increases (usually per second)
Cardinality	The number of unique label combinations

“Request count is a counter, so we look at its rate, not its raw value. Memory is a gauge — we read it directly.”
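To make the counter/gauge distinction concrete, here is a small Python sketch. It assumes metric samples are simple (timestamp in seconds, value) pairs; the function name and the numbers are made up for illustration. Real monitoring systems (PromQL's `rate()`, for example) also detect and compensate for counter resets, which this sketch ignores.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter between its first and last sample.

    Assumes the counter did not reset during the window.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Counter: total requests served. It only ever goes up.
requests_total = [(0, 1000), (60, 1600), (120, 2200)]

# Gauge: memory in use, in MiB. It goes up and down, so we read the latest value.
memory_mib = [(0, 512), (60, 640), (120, 580)]

request_rate = per_second_rate(requests_total)  # (2200 - 1000) / 120 = 10.0 req/s
current_memory = memory_mib[-1][1]              # 580 MiB

print(f"request rate: {request_rate:.1f} req/s")
print(f"memory in use: {current_memory} MiB")
```

The raw counter value (2200) tells you almost nothing on its own; its rate does. The gauge, by contrast, is meaningful exactly as read.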

Cardinality is the term that confuses people most. High cardinality means too many unique combinations (e.g. tagging metrics by user ID), which explodes storage cost.

“Don’t put user_id in a metric label — it’s a cardinality explosion. Use it in traces instead.”
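A quick way to see why user_id is dangerous: in the worst case, the number of time series is the product of the number of distinct values of each label. The sketch below assumes a hypothetical request-count metric with three labels and 100,000 users; real cardinality only counts combinations that actually occur, so this is an upper bound.

```python
# Hypothetical label values for a request-count metric.
methods = ["GET", "POST"]
status_codes = ["200", "404", "500"]
endpoints = ["/login", "/checkout", "/search"]

def max_cardinality(*labels):
    """Worst-case number of time series: product of each label's distinct values."""
    count = 1
    for values in labels:
        count *= len(set(values))
    return count

low = max_cardinality(methods, status_codes, endpoints)  # 2 * 3 * 3 = 18

# Adding user_id as a label multiplies the series count by the number of users.
user_ids = [f"user-{i}" for i in range(100_000)]
high = max_cardinality(methods, status_codes, endpoints, user_ids)  # 18 * 100,000

print(f"without user_id: {low:,} series")
print(f"with user_id:    {high:,} series")
```

One extra label took the metric from a handful of series to well over a million, which is what "cardinality explosion" means in practice.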

Percentiles: the most misused word

Latency is described with percentiles, not averages. You must say them correctly.

Notation	Said as	Meaning
p50	“p fifty” / “the median”	Half of requests are faster, half slower
p95	“p ninety-five”	95% are faster, 5% slower
p99	“p ninety-nine”	99% are faster, 1% slower
p99.9	“p ninety-nine point nine” / “p three nines”	99.9% are faster; only the slowest 0.1% are slower

“The average latency looks fine, but the p99 is terrible — our slowest 1% of users are suffering. Averages hide tail latency.”
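Here is a small Python sketch of exactly that situation. It uses the nearest-rank method for percentiles, which is only one of several definitions (many monitoring tools estimate percentiles from histogram buckets instead), and the latency numbers are invented for illustration.

```python
import math
import statistics

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of values <= it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in ms: most requests are fast, a few are very slow (the tail).
latencies_ms = [50] * 985 + [3000] * 15

avg = statistics.mean(latencies_ms)  # 94.25 ms: looks healthy
p50 = percentile(latencies_ms, 50)   # 50 ms
p95 = percentile(latencies_ms, 95)   # 50 ms
p99 = percentile(latencies_ms, 99)   # 3000 ms: the tail the average hides

print(f"average: {avg:.2f} ms, p50: {p50} ms, p95: {p95} ms, p99: {p99} ms")
```

An average under 100 ms sounds fine, yet 1.5% of requests take three seconds. Only the p99 reveals it.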

Tail latency (the slow end of the distribution) is critical vocabulary. The “tail” is the long thin part of the curve.

“We’re chasing the long tail — a small number of very slow requests dragging the p99 up.”

Logging vocabulary

Term	Meaning
Log level	Severity: DEBUG, INFO, WARN, ERROR
Structured logging	Logs as key-value/JSON, not free text
Log line	A single log entry
Verbose	Producing a lot of log output
Correlation ID	An ID linking logs from one request
Sampling	Keeping only a fraction of logs/traces

“Use structured logging with a correlation ID so you can stitch together every log line for a single request across services.”

The verb stitch together (combine related pieces) is natural and useful here.
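The sketch below shows what structured logging with a correlation ID might look like using Python's standard `logging` module. The service names, the JSON field names, and the in-memory "log store" are assumptions for illustration; in a real system the logs would be shipped to a central log platform and the ID would travel between services in request headers.

```python
import io
import json
import logging
import uuid

class JsonFormatter(logging.Formatter):
    """Structured logging: render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
            # Attached per call through the `extra` argument below.
            "correlation_id": getattr(record, "correlation_id", None),
        })

# An in-memory buffer stands in for a central log store in this sketch.
log_store = io.StringIO()
handler = logging.StreamHandler(log_store)
handler.setFormatter(JsonFormatter())

def make_logger(service):
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger

checkout = make_logger("checkout")
payments = make_logger("payments")

# One correlation ID per incoming request, passed along to every service it touches.
request_id = str(uuid.uuid4())
checkout.info("order received", extra={"correlation_id": request_id})
checkout.info("order received", extra={"correlation_id": str(uuid.uuid4())})  # another request
payments.warning("card declined, retrying", extra={"correlation_id": request_id})

# Stitch together every log line for one request by filtering on its correlation ID.
entries = [json.loads(line) for line in log_store.getvalue().splitlines()]
request_log = [e for e in entries if e["correlation_id"] == request_id]
for entry in request_log:
    print(entry)
```

Because each line is JSON rather than free text, filtering by `correlation_id` is a simple field match instead of a fragile regex.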

Tracing vocabulary

Term	Meaning
Span	One unit of work in a trace
Parent / child span	Nesting of operations
Trace ID	The ID for the whole request journey
Instrumentation	Adding tracing code to your service
Distributed trace	A trace spanning multiple services

“Each service adds a span to the distributed trace. The waterfall view shows that the database span is eating 80% of the time.”

Instrument is used as a verb here: “we need to instrument the payment service” means adding observability code to it.
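To connect the tracing terms, here is a minimal Python model of one distributed trace. The `Span` fields are a simplified assumption loosely inspired by how tracing systems describe spans; this is not the data model or API of any particular tracing library, and the timings are invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    trace_id: str             # shared by every span in one request's journey
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms

# Hypothetical spans from one distributed trace across two services.
trace = [
    Span("t-1", "a", None, "api-gateway", "POST /checkout", 0, 500),
    Span("t-1", "b", "a", "orders", "create_order", 20, 480),
    Span("t-1", "c", "b", "orders", "db.query", 60, 460),
]

def print_waterfall(spans, total_ms, parent_id=None, depth=0):
    """Print each child span indented under its parent, like a waterfall view."""
    for span in spans:
        if span.parent_id == parent_id:
            share = span.duration_ms / total_ms
            print(f"{'  ' * depth}{span.service} {span.name}: "
                  f"{span.duration_ms:.0f} ms ({share:.0%})")
            print_waterfall(spans, total_ms, span.span_id, depth + 1)

root = next(s for s in trace if s.parent_id is None)
print_waterfall(trace, root.duration_ms)
```

The parent/child links turn a flat list of spans into a tree, and comparing each span's duration with the root's is how a waterfall view shows that the database span is eating 80% of the time.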

Alerting vocabulary

Term	Meaning
Threshold	The value that triggers an alert
Fire	When an alert triggers
Flapping	Alert toggling on and off rapidly
Noisy	Alerts that fire too often without value
Alert fatigue	Becoming numb to too many alerts
Actionable	An alert a human can actually do something about

“This alert is flapping and noisy — it’s causing alert fatigue. If it’s not actionable, let’s delete it. Every alert should require a human action.”

The verb is fire: “the alert fired at 2 a.m.” — not “the alert was activated.”
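Flapping usually happens when a metric hovers right at the threshold. One common remedy is to require several consecutive breaches before firing and several healthy samples before resolving, similar in spirit to the `for` duration in Prometheus alerting rules. The Python sketch below compares a naive threshold with that debounced rule; the function names, parameters, and error-rate samples are hypothetical.

```python
def naive_alert(values, threshold):
    """Fires whenever the current value is above the threshold."""
    return [v > threshold for v in values]

def debounced_alert(values, threshold, fire_after=3, resolve_after=3):
    """Fires only after `fire_after` consecutive breaches and resolves only after
    `resolve_after` consecutive healthy samples, which suppresses flapping."""
    firing, breaches, healthy = False, 0, 0
    states = []
    for v in values:
        if v > threshold:
            breaches, healthy = breaches + 1, 0
        else:
            healthy, breaches = healthy + 1, 0
        if not firing and breaches >= fire_after:
            firing = True
        elif firing and healthy >= resolve_after:
            firing = False
        states.append(firing)
    return states

def transitions(states):
    """Count how many times the alert changed state (fired or resolved)."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)

# Hypothetical error-rate samples (%) hovering near a 5% threshold, then a real incident.
error_rate = [4.9, 5.1, 4.8, 5.2, 4.9, 5.1, 9.0, 9.5, 9.8, 9.7, 2.0, 1.5, 1.2]

print("naive state changes:    ", transitions(naive_alert(error_rate, 5.0)))
print("debounced state changes:", transitions(debounced_alert(error_rate, 5.0)))
```

The naive rule fires and resolves repeatedly on noise; the debounced rule fires once for the real incident and resolves once, which is the difference between a noisy alert and an actionable one.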

Phrases for incident discussions

“Let’s pull up the dashboard and look at the p99.”
“I’ll dig into the traces to find the slow span.”
“The metrics are flat — no change — but the logs show errors.”
“We’re flying blind here; this service isn’t instrumented.”
“Let’s drill down from the service level to the endpoint.”

“Error rate spiked, p99 shot up, and the traces point to a single slow downstream call. The logs confirm a timeout. Classic.”

Flying blind (operating without visibility) is excellent vocabulary for an uninstrumented system.

Common mistakes

Saying “average” when you mean “p99.” Averages hide the worst cases. SREs almost always care about percentiles.
Confusing “counter” and “gauge.” A counter only increases; a gauge moves both ways. Using the wrong one breaks your math.
Mispronouncing “cardinality.” It’s /ˌkɑːrdɪˈnæləti/ — “car-di-NAL-i-ty.”
Saying “logs says.” Logs is plural: “the logs say,” “a log line shows.”
Using “metric” for everything. A log is not a metric. Keep the three pillars distinct.

Quick reference glossary

SLO / SLI — reliability targets and the metrics behind them
Golden signals — latency, traffic, errors, saturation
Saturation — how full a resource is
Aggregation — combining many data points (sum, avg, max)
Retention — how long data is kept
Dashboard — a visual panel of metrics
Heatmap — a visualisation of distribution over time

“Monitor the four golden signals — latency, traffic, errors, saturation — and you’ll catch most problems before users do.”
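As a rough illustration, here is how the four golden signals might be summarised for one time window. The `Request` record, counting errors as HTTP 5xx responses, and measuring saturation as busy worker slots are all assumptions made for this sketch; what "full" means depends on the resource (CPU, memory, queue depth, and so on).

```python
import math
from dataclasses import dataclass

@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code

def golden_signals(requests, window_s, busy_workers, total_workers):
    """Summarise one time window as latency, traffic, errors, and saturation."""
    ordered = sorted(r.latency_ms for r in requests)
    p99_rank = math.ceil(99 * len(ordered) / 100)      # nearest-rank p99
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: 5xx = error
    return {
        "latency_p99_ms": ordered[p99_rank - 1],
        "traffic_rps": len(requests) / window_s,
        "error_rate": errors / len(requests),
        "saturation": busy_workers / total_workers,
    }

# Hypothetical one-minute window: 600 requests, 12 of them failing slowly with HTTP 503.
window = [Request(80, 200) for _ in range(588)] + [Request(1200, 503) for _ in range(12)]
signals = golden_signals(window, window_s=60, busy_workers=45, total_workers=50)

for name, value in signals.items():
    print(f"{name}: {value}")
```

Read together, the four numbers tell a story: normal traffic, a 2% error rate, slow failures in the tail, and a resource that is 90% full.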

Key takeaways

Metrics = that it’s slow, traces = where, logs = why.
Talk in percentiles (p95, p99), not averages — averages hide tail latency.
Mind cardinality: high-cardinality labels explode cost; put unique IDs in traces.
Alerts should be actionable; noisy, flapping alerts cause alert fatigue.

Master this vocabulary and dashboards stop being intimidating walls of numbers — they become a language you read fluently, even at 3 a.m.
