Practise vocabulary for data observability (Monte Carlo, Acceldata), data downtime, freshness, volume, distribution, schema change detection, data incidents, and data health dashboards.
0 / 5 completed
1 / 5
Data observability is best described as:
Data observability (coined by Barr Moses, Monte Carlo) is the data equivalent of software observability. Just as engineers use logs, metrics, and traces to understand system health, data teams use observability to understand data health. The five pillars: freshness, volume, distribution, schema, lineage.
2 / 5
"Data downtime" in observability terminology means:
Monte Carlo popularised "data downtime" as a metric: how many hours per month is data in a bad state? Like measuring system availability as 99.9% uptime, teams can track data reliability. High data downtime = analysts are frequently querying data they cannot trust, making decisions on bad information.
3 / 5
Schema change detection in data observability monitors:
Schema changes are a leading cause of data incidents: a source team renames a column from "user_id" to "account_id" without warning; downstream joins silently return nulls; reports show $0 revenue for a week before anyone notices. Observability platforms detect schema diffs and alert immediately, before downstream breakage.
4 / 5
In data observability, "distribution monitoring" tracks:
Distribution monitoring example: "the revenue column historically has values between $10 and $50,000; today 2% of values are above $1,000,000." This could indicate a currency unit error (cents vs dollars), a fraud pattern, or a pipeline bug. Static rules cannot catch this — ML-based distribution monitoring learns expected ranges.
5 / 5
A "data incident" in a data observability context is:
Data incidents follow the same lifecycle as production incidents: detection → triage → root cause analysis → resolution → post-mortem. "The orders table had null revenue for 4 hours due to a CDC connector restart — impact: 3 dashboards showed incorrect data, 1 automated alert fired incorrectly." Teams track MTTR (Mean Time To Resolution) for data incidents.