An automated notification triggered when a metric crosses a threshold. Alerts route to on-call engineers via PagerDuty, OpsGenie, or Slack.

What is a "dashboard"?

A visual display of key metrics for a system. Monitoring dashboards show the current and historical state — e.g. Grafana, Datadog.

What is "cardinality" in observability?

The number of unique values a label can have. High-cardinality data (e.g. per-user metrics) is expensive in traditional metrics systems but valuable for debugging.

What is OpenTelemetry?

An open standard for instrumenting applications to produce logs, metrics, and traces in a vendor-neutral format. Increasingly the default choice for observability.

A documented procedure for responding to a specific alert or incident. Links alerts to actions: "when X fires, do Y".

What is "golden signals"?

Google SRE's four key metrics for any service: latency, traffic, errors, and saturation. A minimal but complete monitoring baseline.

What does "on-call" mean?

Being the designated engineer who responds to production alerts outside business hours. "I'm on-call this week" means you carry a pager (or phone) and respond to incidents.

DevOps · English usage comparison

Monitoring vs Observability: English Usage Guide for IT Professionals

Monitoring tells you when something is wrong (alerts on known failure modes); observability lets you understand why it is wrong (explore any question about system state from the outside). Monitoring is reactive; observability is exploratory. Both are necessary; observability is the higher-order concept.

Side-by-side comparison

Aspect	Monitoring	Observability
Question answered	"Is the system healthy?"	"Why is the system behaving this way?"
Data type	Metrics and alerts on known states	Logs, metrics, and traces (the three pillars)
Failure type	Catches known failure modes	Helps debug unknown/novel failures
Approach	Pre-define what to watch	Explore any question post-hoc

Example sentences

Monitoring

"Our monitoring alerts fire when error rate exceeds 1% or p99 latency exceeds 500 ms."
"We set up monitoring dashboards for CPU, memory, and request rate."

Observability

"Good observability means we can ask any question about system behaviour — even one we didn't think to monitor."
"We added distributed tracing to improve observability of our microservices."

Exercises: choose the correct English usage

Select the best answer for each question, then check your reasoning.

1. "An alert fired because error rate crossed 5%." This is an example of ___.

2. "We used distributed traces to understand why a rare request path was slow." This demonstrates ___.

3. What are the "three pillars of observability"?

4. "We didn't have ___ of this failure mode — we weren't monitoring for it." Which word?

5. Which is the higher-order concept?

Frequently asked questions

What are the three pillars of observability?

Logs (timestamped event records), metrics (numeric measurements over time), and traces (the path of a single request through distributed services).

What is a "trace" in observability?

A record of a single request's journey through a distributed system — showing each service it touched, how long each step took, and where errors occurred. Tools: Jaeger, Zipkin, Tempo.

What is an "SLO" and how does it relate to monitoring?

A Service Level Objective is the target (e.g. "99.9% of requests complete in under 200ms"). Monitoring alerts when SLOs are at risk of being breached.

Show more questions (7)