5 collocation exercises on data quality vocabulary.
0 / 5 completed
1 / 5
Before loading, the job should ___ the data to catch missing or malformed fields.
We validate data when we verify it meets expected rules, types, and constraints. Validate is the standard data-quality verb, appearing in tools like Great Expectations and in phrases such as "validation rules." Control and prove do not collocate with data this way, and check up is informal and used for health. Engineers say "validate inputs" and "data validation checks," so validate the data is the precise, professional collocation for ensuring correctness before processing continues.
2 / 5
Monitoring should ___ drift when the statistical distribution of incoming data changes over time.
To detect drift means to identify when data or model behaviour shifts away from a known baseline. Detect is the precise collocation used in ML monitoring and data observability, as in "drift detection." Feel, smell, and guess are intuitive verbs with no place in measurable, automated monitoring. Distribution changes are flagged by detectors, so detect drift is the correct phrasing whenever you describe systems that watch for changing data characteristics over time.
3 / 5
The ingestion layer will ___ a schema so that any record with the wrong shape is rejected.
We enforce a schema when we require all data to conform to a defined structure and reject anything that does not. Enforce is the exact collocation, used in "schema enforcement" features of formats like Delta Lake and Avro. Force, press, and push lack the sense of consistently applying a rule. The phrase "schema-on-write enforcement" is common in data platforms, confirming enforce a schema as the precise, idiomatic choice for guaranteeing structural consistency.
4 / 5
To remove repeated rows that arrived twice, the pipeline must ___ the records.
To deduplicate records means to remove duplicate entries so each item appears only once. Deduplicate (often shortened to "dedupe") is the standard data-engineering verb, as in "we dedupe on the event ID." Unique is an adjective, not a verb here, and singlify and declone are not real words. Pipelines routinely "deduplicate before loading," so deduplicate the records is the precise, expected collocation for eliminating repeated data.
5 / 5
A nightly job will ___ the totals between two systems to confirm the numbers match.
We reconcile data when we compare two datasets and confirm they agree, resolving any differences. Reconcile borrows from accounting and is the standard term in data quality, as in "reconciliation report." Reunite and rejoin imply bringing things together physically, and replay means reprocessing events, a different concept. Engineers say "reconcile source and target counts," so reconcile the totals is the precise collocation for verifying consistency across systems.