Monitoring vs Observability: English Usage Guide for IT Professionals
Monitoring tells you when something is wrong (alerts on known failure modes); observability lets you understand why it is wrong (explore any question about system state from the outside). Monitoring is reactive; observability is exploratory. Both are necessary; observability is the higher-order concept.
Side-by-side comparison
| Aspect | Monitoring | Observability |
|---|---|---|
| Question answered | "Is the system healthy?" | "Why is the system behaving this way?" |
| Data type | Metrics and alerts on known states | Logs, metrics, and traces (the three pillars) |
| Failure type | Catches known failure modes | Helps debug unknown/novel failures |
| Approach | Pre-define what to watch | Explore any question post-hoc |
Example sentences
Monitoring
- "Our monitoring alerts fire when error rate exceeds 1% or p99 latency exceeds 500 ms."
- "We set up monitoring dashboards for CPU, memory, and request rate."
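The first example sentence describes threshold-based alerting. A minimal sketch of that check follows, assuming a nearest-rank percentile and hypothetical names (`p99`, `check_alerts`); the 1% and 500 ms thresholds come from the sentence, everything else is illustrative.

```python
import math


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(99 * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def check_alerts(total_requests, failed_requests, latencies_ms,
                 max_error_rate=0.01, max_p99_ms=500):
    """Return one message per threshold that was exceeded."""
    alerts = []
    error_rate = failed_requests / total_requests if total_requests else 0.0
    if error_rate > max_error_rate:
        alerts.append(f"error rate {error_rate:.1%} exceeds {max_error_rate:.0%}")
    if latencies_ms:
        observed = p99(latencies_ms)
        if observed > max_p99_ms:
            alerts.append(f"p99 latency {observed} ms exceeds {max_p99_ms} ms")
    return alerts


# Hypothetical window: 1,000 requests, 20 failures, two slow outliers.
for message in check_alerts(1000, 20, [100] * 98 + [900] * 2):
    print(message)
```

Both conditions are defined before anything goes wrong, which is what makes this monitoring: the check can only report the failure modes someone thought to encode.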
Observability
- "Good observability means we can ask any question about system behaviour — even one we didn't think to monitor."
- "We added distributed tracing to improve observability of our microservices."
Exercises: choose the correct English usage
Fill in each blank or answer each question, then check your reasoning against the explanation.
1. "An alert fired because error rate crossed 5%." This is an example of ___.
Explanation: Alerting on known thresholds is monitoring.
2. "We used distributed traces to understand why a rare request path was slow." This demonstrates ___.
Explanation: Exploring unknown behaviour through traces is observability.
3. What are the "three pillars of observability"?
Explanation: The three pillars of observability are logs (events), metrics (measurements over time), and traces (request flows).
4. "We didn't have ___ of this failure mode — we weren't monitoring for it." Which word?
Explanation: "Observability" captures the ability to understand system state — if you couldn't see it, you lacked observability.
5. Which is the higher-order concept?
Explanation: Observability is the broader capability — good observability enables better monitoring. Monitoring is a specific practice within observability.
Frequently asked questions
What are the three pillars of observability?
Logs (timestamped event records), metrics (numeric measurements over time), and traces (the path of a single request through distributed services).
What is a "trace" in observability?
A record of a single request's journey through a distributed system — showing each service it touched, how long each step took, and where errors occurred. Tools: Jaeger, Zipkin, Tempo.
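As a rough illustration of the definition above, a trace can be modelled as a list of spans, one per step. This is a simplified sketch: the `Span` class, the `summarize` helper, and the service names are hypothetical, and real tracing systems nest spans through parent IDs rather than keeping a flat list.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One step of a request: which service ran it, when, and whether it failed."""
    service: str
    start_ms: int
    end_ms: int
    error: bool = False

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def summarize(trace):
    """Return the slowest service and the list of services that reported errors."""
    slowest = max(trace, key=lambda span: span.duration_ms)
    failed = [span.service for span in trace if span.error]
    return slowest.service, failed


# Hypothetical trace of one checkout request (flat, sequential steps).
trace = [
    Span("api-gateway", 0, 15),
    Span("checkout", 15, 40),
    Span("payments", 40, 310, error=True),
]

slowest, failed = summarize(trace)
print(f"slowest step: {slowest}; errors in: {failed}")
```

The three things the definition names (each service touched, how long each step took, where errors occurred) map directly onto the `service`, `duration_ms`, and `error` fields.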
What is an "SLO" and how does it relate to monitoring?
A Service Level Objective (SLO) is a target for service reliability or performance (e.g. "99.9% of requests complete in under 200 ms"). Monitoring fires alerts when an SLO is at risk of being breached.
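The arithmetic behind "at risk of being breached" can be sketched as follows. The 99.9% target and 200 ms threshold are taken from the example; the function name `slo_status` and the 80% warning fraction are assumptions made for illustration, not a standard.

```python
def slo_status(latencies_ms, threshold_ms=200, target=0.999, warn_fraction=0.8):
    """Classify a window of request latencies as 'ok', 'at risk', or 'breached'."""
    if not latencies_ms:
        raise ValueError("need at least one request to evaluate the SLO")
    total = len(latencies_ms)
    slow = sum(1 for ms in latencies_ms if ms >= threshold_ms)
    compliance = (total - slow) / total
    if compliance < target:
        return "breached"
    # Error budget: how many slow requests the target tolerates in this window.
    budget = total * (1 - target)
    if budget > 0 and slow / budget >= warn_fraction:
        return "at risk"  # assumed warning rule: 80% of the budget is spent
    return "ok"


# 10,000 requests tolerate about 10 slow ones at a 99.9% target.
print(slo_status([50] * 9998 + [250] * 2))   # 2 slow requests
print(slo_status([50] * 9991 + [250] * 9))   # 9 slow requests
print(slo_status([50] * 9980 + [250] * 20))  # 20 slow requests
```

With 10,000 requests the budget is roughly 10 slow requests, so 9 slow requests still meet the target but leave almost no margin, which is the situation an SLO alert is meant to flag.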
What is an "alert"?
An automated notification triggered when a metric crosses a threshold. Alerts route to on-call engineers via PagerDuty, Opsgenie, or Slack.
What is a "dashboard"?
A visual display of key metrics for a system. Monitoring dashboards show current and historical state; common tools include Grafana and Datadog.
What is "cardinality" in observability?
The number of unique values a label can have. High-cardinality data (e.g. per-user metrics) is expensive in traditional metrics systems but valuable for debugging.
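To see why a per-user label is expensive, it helps to count values directly. The sketch below assumes each sample is a plain dictionary of labels; the helper names and the label set (`method`, `status`, `user_id`) are hypothetical.

```python
from collections import defaultdict


def label_cardinality(samples):
    """Count the unique values seen for each label name."""
    values = defaultdict(set)
    for labels in samples:
        for name, value in labels.items():
            values[name].add(value)
    return {name: len(seen) for name, seen in values.items()}


def series_count(samples):
    """Count distinct label combinations, i.e. separate time series to store."""
    return len({tuple(sorted(labels.items())) for labels in samples})


# Hypothetical request metric: 1,000 users issuing GETs, plus one failed POST.
samples = [
    {"method": "GET", "status": "200", "user_id": f"u{i}"} for i in range(1000)
]
samples.append({"method": "POST", "status": "500", "user_id": "u1"})

print(label_cardinality(samples))
print(series_count(samples))
```

Dropping `user_id` here would leave two series instead of 1,001, which is the cost side of the trade-off; the debugging value is that the per-user label is what lets you ask which user was affected.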
What is OpenTelemetry?
An open-source, vendor-neutral framework (a specification plus APIs, SDKs, and the OTLP wire protocol) for instrumenting applications to produce logs, metrics, and traces. It is increasingly the default choice for instrumentation.
What is a "runbook"?
A documented procedure for responding to a specific alert or incident. Links alerts to actions: "when X fires, do Y".
What are the "golden signals"?
Google SRE's four key metrics for any service: latency, traffic, errors, and saturation. A minimal but complete monitoring baseline.
What does "on-call" mean?
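Being the designated engineer who responds to production alerts during a shift, often including nights and weekends. "I'm on call this week" means you carry a pager (or phone) and respond to incidents. Hyphenate the term when it comes before a noun ("the on-call engineer"); after a verb, "on call" is usually written without the hyphen.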