Practise vocabulary for data quality dimensions (completeness, accuracy, timeliness, consistency, uniqueness), data quality rules, expectations, data SLAs, and anomaly detection.
1 / 5
The data quality dimension "completeness" measures:
Completeness example: "email is null for 12% of user records; the expected null rate is 0%." Or: "pipeline delivered 8,200 rows today; expected 10,000 ± 5%, so 18% of the expected volume is missing." Completeness checks are usually the first line of data quality monitoring.
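The two completeness checks in that example can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring framework: the function names (`null_rate`, `volume_shortfall`, `volume_within_tolerance`) and the sample `emails` column are hypothetical, and the figures are the ones quoted above.

```python
from typing import Optional, Sequence


def null_rate(values: Sequence[Optional[str]]) -> float:
    """Fraction of values that are None (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def volume_shortfall(actual_rows: int, expected_rows: int) -> float:
    """Fraction of the expected row count that is missing, floored at 0.

    Assumes expected_rows > 0.
    """
    return max(0.0, (expected_rows - actual_rows) / expected_rows)


def volume_within_tolerance(actual_rows: int, expected_rows: int,
                            tolerance: float) -> bool:
    """True when the row count is within +/- tolerance of the expectation."""
    return abs(actual_rows - expected_rows) <= tolerance * expected_rows


# Hypothetical column: 12 of 100 user records have no email.
emails = [None] * 12 + ["user@example.com"] * 88

email_null_rate = null_rate(emails)                        # 12% null
shortfall = volume_shortfall(8_200, 10_000)                # 18% missing
volume_ok = volume_within_tolerance(8_200, 10_000, 0.05)   # outside ± 5%
```

Here the null-rate check fails because the expected rate is 0%, and the volume check fails because an 18% shortfall is well outside the 5% tolerance.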
2 / 5
The data quality dimension "timeliness" refers to:
Timeliness is distinct from freshness: fresh data arrived recently; timely data arrived before the deadline. Example: "sales reports must reflect all transactions by 08:00 UTC for the morning leadership review." A pipeline completing at 09:00 with accurate data still fails its timeliness SLA.
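The difference between the two terms can be sketched as two separate checks. The helper names (`is_timely`, `is_fresh`), the calendar date, and the one-hour freshness window are assumptions made for illustration; only the 08:00 UTC deadline and the 09:00 completion time come from the example above.

```python
from datetime import datetime, timedelta, timezone


def is_timely(completed_at: datetime, deadline: datetime) -> bool:
    """Timely: the data landed at or before the agreed deadline."""
    return completed_at <= deadline


def is_fresh(last_loaded_at: datetime, now: datetime,
             max_age: timedelta) -> bool:
    """Fresh: the most recent load is no older than max_age."""
    return now - last_loaded_at <= max_age


# Hypothetical date; the times mirror the sales-report example.
deadline = datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc)
completed_at = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
checked_at = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)

timely = is_timely(completed_at, deadline)                       # False
fresh = is_fresh(completed_at, checked_at, timedelta(hours=1))   # True
```

The same load is fresh when checked at 09:30 yet still misses the 08:00 deadline, which is why the two dimensions are tracked separately.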
3 / 5
In Great Expectations terminology, an "expectation" is:
Great Expectations (GX) provides a Python framework for writing data expectations as code, each expectation being a verifiable assertion about the data. Expectations run as part of a pipeline and produce Validation Results, a pass/fail outcome per expectation. Data Docs render those results as human-readable reports for stakeholders. Because expectations live in code, data quality becomes testable and version-controlled.
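To make the idea concrete without depending on a particular GX release, here is a hand-rolled sketch of "expectations as code" in plain Python. It deliberately does not use the Great Expectations API: `ValidationResult`, `expect_not_null`, `expect_between`, and `validate` are hypothetical names that only mimic the concept of declaring assertions and collecting a pass/fail result for each one.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class ValidationResult:
    expectation: str        # human-readable description of the assertion
    success: bool           # True when no row violated it
    unexpected_count: int   # number of violating rows


Expectation = Callable[[List[Row]], ValidationResult]


def expect_not_null(column: str) -> Expectation:
    """Build an expectation that every row has a non-null value in column."""
    def check(rows: List[Row]) -> ValidationResult:
        bad = sum(1 for r in rows if r.get(column) is None)
        return ValidationResult(f"{column} is not null", bad == 0, bad)
    return check


def expect_between(column: str, low: float, high: float) -> Expectation:
    """Build an expectation that non-null values fall within [low, high]."""
    def check(rows: List[Row]) -> ValidationResult:
        bad = sum(
            1 for r in rows
            if r.get(column) is not None and not (low <= r[column] <= high)
        )
        return ValidationResult(
            f"{column} between {low} and {high}", bad == 0, bad
        )
    return check


def validate(rows: List[Row],
             expectations: List[Expectation]) -> List[ValidationResult]:
    """Run every expectation and collect one result per expectation."""
    return [expectation(rows) for expectation in expectations]


# Hypothetical sample data with one null email and one implausible age.
rows = [
    {"email": "a@example.com", "age": 34},
    {"email": None, "age": 29},
    {"email": "c@example.com", "age": 212},
]
results = validate(rows, [expect_not_null("email"),
                          expect_between("age", 0, 120)])
```

Because the expectations are ordinary code, they can be reviewed, versioned, and run in CI like any other test suite.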
4 / 5
A "data SLA" (Service Level Agreement) in a data context defines:
Data SLAs make implicit quality expectations explicit and measurable. They enable tracking of Service Level Objectives (SLOs) and incident alerting, for example: "orders SLA breached: data is 2 hours late". Without data SLAs, consumers discover failures only when reports are wrong, long after the pipeline missed its window.
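One way to picture breach alerting and SLO tracking together is the sketch below. The `Delivery` record, the helper functions, and the four sample deliveries are all hypothetical; real SLAs usually cover more than arrival time, so treat this as an illustration of the mechanics only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Delivery:
    dataset: str
    deadline: datetime      # latest arrival time the SLA allows
    arrived_at: datetime    # when the data actually landed


def lateness(d: Delivery) -> timedelta:
    """How far past the deadline the data arrived (zero if on time)."""
    return max(d.arrived_at - d.deadline, timedelta(0))


def breach_alert(d: Delivery) -> Optional[str]:
    """Return an alert message for a breached SLA, or None if it was met."""
    late = lateness(d)
    if late == timedelta(0):
        return None
    hours = late.total_seconds() / 3600
    return f"{d.dataset} SLA breached: data is {hours:.1f} h late"


def slo_attainment(deliveries: List[Delivery]) -> float:
    """Fraction of deliveries that met the SLA (assumes a non-empty list)."""
    on_time = sum(1 for d in deliveries if breach_alert(d) is None)
    return on_time / len(deliveries)


def utc(day: int, hour: int) -> datetime:
    """Helper for the hypothetical sample dates below."""
    return datetime(2024, 1, day, hour, tzinfo=timezone.utc)


deliveries = [
    Delivery("orders", utc(15, 8), utc(15, 7)),
    Delivery("orders", utc(16, 8), utc(16, 8)),
    Delivery("orders", utc(17, 8), utc(17, 10)),  # 2 hours late
    Delivery("orders", utc(18, 8), utc(18, 6)),
]
alerts = [a for a in map(breach_alert, deliveries) if a is not None]
attainment = slo_attainment(deliveries)
```

With these sample values, one of the four deliveries breaches the SLA, so attainment is 75%, which can then be compared against whatever SLO target the team has agreed on.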
5 / 5
Data anomaly detection in the context of data quality means:
ML-based anomaly detection tools (such as Monte Carlo, Acceldata, and Anomalo) learn baseline distributions for metrics like row count, null rate, and column value ranges, then alert when actual values deviate significantly from that baseline. This catches quality regressions that static threshold rules miss, particularly for metrics with seasonal patterns.
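The baseline-and-deviation idea can be approximated with a simple z-score, sketched below. This is a statistical stand-in chosen for brevity, not the method any of the named vendors uses; the function names, the threshold of 3 standard deviations, and the row-count histories are assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def z_score(history: Sequence[float], value: float) -> float:
    """Standard deviations between value and the historical mean.

    Requires at least two historical points (statistics.stdev needs them).
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def is_anomalous(history: Sequence[float], value: float,
                 threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` standard deviations from baseline."""
    return abs(z_score(history, value)) > threshold


# Hypothetical daily row counts, kept as separate seasonal baselines.
weekday_history = [10_000, 10_100, 9_900, 10_050, 9_950]
weekend_history = [4_000, 4_100, 3_900, 4_050, 3_950]

# A static rule such as "row count >= 3,500" would accept 6,000 rows on a
# weekend, yet that count is far from the learned weekend baseline.
weekend_spike = is_anomalous(weekend_history, 6_000)

# A normal weekend count is fine against the weekend baseline but would be
# a false alarm if compared with weekday history.
normal_weekend = is_anomalous(weekend_history, 4_020)
wrong_baseline = is_anomalous(weekday_history, 4_020)
```

Keeping one baseline per season (here, weekday versus weekend) is the simplest way to respect seasonal patterns; learned models generalise the same idea to finer-grained cycles.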