Practice the vocabulary for reading and communicating about monitoring dashboards: metrics, graphs, anomaly detection, and alert thresholds.
1 / 8
What does a 'spike' on a monitoring graph typically indicate?
A spike is a sudden, short-lived jump in a metric's value. It may be normal (cron job running, batch import) or concerning (error rate spike, memory pressure). Context determines significance: a spike on weekdays at 9am is likely expected traffic.
2 / 8
What does 'P99 latency' mean on a dashboard?
P99 is the latency that 99% of requests complete within; only the slowest 1% take longer. Average latency can hide tail performance problems: a P99 of 2 seconds means 1 in 100 requests takes 2 seconds or more. P99 and P95 are the key dashboard metrics for user experience monitoring.
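A minimal sketch of why the average hides the tail, using the nearest-rank percentile definition. The sample latencies are made up for illustration, and real monitoring systems usually estimate percentiles from histogram buckets rather than raw samples.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical sample: 98 fast requests (50 ms) and 2 slow ones (2000 ms).
latencies_ms = [50] * 98 + [2000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)  # 89.0
p95_ms = percentile(latencies_ms, 95)            # 50
p99_ms = percentile(latencies_ms, 99)            # 2000
```

The mean (89 ms) and P95 (50 ms) both look healthy here; only P99 exposes the 2-second requests.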
3 / 8
What does 'error rate' on a dashboard represent?
Error rate = errors / total requests * 100%. An error rate of 0.5% might be acceptable; a sudden jump to 5% is a significant incident signal. Absolute error counts are less useful than rates because they depend on traffic volume.
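As a quick illustration of the formula, the sketch below shows the same absolute error count producing very different rates at different traffic volumes. The function name and the numbers are hypothetical.

```python
def error_rate_percent(errors, total_requests):
    """Error rate as a percentage of total requests."""
    if total_requests == 0:
        return 0.0  # assumption: report 0% when there is no traffic
    return 100 * errors / total_requests

# The same 50 errors mean different things at different volumes.
quiet_hour = error_rate_percent(50, 1_000)   # 5.0
busy_hour = error_rate_percent(50, 10_000)   # 0.5
```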
4 / 8
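What is a 'saturation' metric in the USE method?
Saturation belongs to the USE method (Utilization, Saturation, Errors), which describes resources. The RED method (Rate, Errors, Duration) describes request-driven services and has no saturation term. Saturation measures work a resource cannot service yet, such as run-queue length or queued requests. Saturation metrics predict problems: a CPU at 95% utilization with a growing run queue is near its limit, and errors and latency spikes follow if load increases.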
5 / 8
What does 'flapping alert' mean in a monitoring context?
Flapping alerts fire and resolve repeatedly, creating alert fatigue. They indicate the threshold is set too close to the normal operating range. Solutions: move the threshold away from the normal range, require the metric to stay above the threshold for N minutes before firing, use hysteresis (a lower threshold for resolving than for firing), or fix the underlying oscillation.
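The sketch below simulates a metric oscillating around a threshold and counts state changes under three policies: a naive single threshold, separate fire and clear thresholds, and a hold period. The function names, the `hold` parameter, and the CPU readings are illustrative assumptions, not any particular alerting tool's API.

```python
def alert_states(samples, fire_above, clear_below, hold=1):
    """Return the alert state (True = firing) after each sample.

    Fires after `hold` consecutive samples above `fire_above`;
    resolves only when a sample drops below `clear_below`.
    """
    firing = False
    streak = 0
    states = []
    for value in samples:
        if firing:
            if value < clear_below:
                firing = False
                streak = 0
        else:
            streak = streak + 1 if value > fire_above else 0
            if streak >= hold:
                firing = True
        states.append(firing)
    return states

def transitions(states):
    """Count how many times the alert changed state, starting from 'not firing'."""
    previous = [False] + states[:-1]
    return sum(1 for before, after in zip(previous, states) if before != after)

# Hypothetical CPU readings hovering around an 80% threshold.
cpu = [79, 81, 79, 81, 79, 81, 79, 81]

naive = alert_states(cpu, fire_above=80, clear_below=80)
with_gap = alert_states(cpu, fire_above=80, clear_below=70)
with_hold = alert_states(cpu, fire_above=80, clear_below=80, hold=2)
```

With these readings the naive alert changes state 7 times, the version with a lower clear threshold fires once and stays firing, and the version with a two-sample hold never fires.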
6 / 8
How would you describe a 'heatmap' visualization to a colleague?
Heatmaps (common in Grafana) show latency distributions over time. A bimodal distribution (two color bands) might reveal that some requests are served from cache (fast) and others from the database (slow) — invisible in a simple P99 graph.
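To make the idea concrete, here is a rough sketch of the data behind a latency heatmap: each cell counts requests in one time window and one latency bucket, and the count drives the cell's color. The window size, bucket size, and samples are assumptions chosen for illustration.

```python
from collections import Counter

def heatmap_cells(samples, window_s=60, bucket_ms=100):
    """Count requests per (time window, latency bucket) cell."""
    cells = Counter()
    for timestamp_s, latency_ms in samples:
        cell = (timestamp_s // window_s, latency_ms // bucket_ms)
        cells[cell] += 1
    return cells

# Hypothetical (timestamp in seconds, latency in ms) samples:
# cache hits land near 20 ms, database reads near 850 ms.
samples = [(0, 20), (10, 30), (20, 850), (30, 25), (40, 870), (70, 15), (80, 860)]
cells = heatmap_cells(samples)
```

Each window ends up with counts in bucket 0 (0 to 99 ms) and bucket 8 (800 to 899 ms) and nothing in between, which is the two-band pattern a bimodal distribution produces.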
7 / 8
What does 'cardinality' mean in the context of monitoring metrics?
High-cardinality labels (user_id, request_id) create millions of unique metric series and can crash Prometheus or dramatically increase costs. Design metrics with controlled cardinality: use status_code, endpoint, region — not unique IDs.
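A back-of-the-envelope sketch of why unique IDs are a problem: the worst-case number of series for one metric is the product of the distinct values of each label. The label values and the 100,000-user figure are hypothetical, and the real series count is the number of combinations actually observed, which may be lower.

```python
from math import prod

def max_series_count(label_values):
    """Upper bound on time series for one metric: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())

bounded = {
    "status_code": ["200", "404", "500"],
    "endpoint": ["/login", "/search", "/checkout"],
    "region": ["eu", "us"],
}
# Adding a per-user label multiplies every existing combination.
unbounded = dict(bounded, user_id=range(100_000))

bounded_series = max_series_count(bounded)      # 3 * 3 * 2 = 18
unbounded_series = max_series_count(unbounded)  # 18 * 100_000 = 1_800_000
```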
8 / 8
What does 'baseline' mean in anomaly detection on dashboards?
Baselines capture 'normal', which may vary by time of day, day of week, or season. Dynamic anomaly detection flags values that deviate significantly from the baseline pattern, instead of comparing them against a fixed threshold.
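One simple way to sketch this idea is a per-hour baseline of mean and standard deviation, flagging values more than k standard deviations away. This is an illustrative assumption about how a baseline could be built, not how any specific dashboard product does it; the history values are made up.

```python
from statistics import mean, stdev

def build_baseline(history):
    """history maps hour of day -> past values seen at that hour (at least two each)."""
    return {hour: (mean(values), stdev(values)) for hour, values in history.items()}

def is_anomaly(baseline, hour, value, k=3.0):
    """Flag a value more than k standard deviations from that hour's mean."""
    mu, sigma = baseline[hour]
    return abs(value - mu) > k * sigma

# Hypothetical requests-per-minute history: quiet at 03:00, busy at 09:00.
history = {
    3: [100, 110, 90, 105, 95],
    9: [1000, 1100, 900, 1050, 950],
}
baseline = build_baseline(history)

normal_at_9 = is_anomaly(baseline, hour=9, value=1000)   # False
unusual_at_3 = is_anomaly(baseline, hour=3, value=1000)  # True
```

The same value, 1000 requests per minute, is ordinary at 09:00 and highly unusual at 03:00, which a single fixed threshold cannot express.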