Data quality is a shared responsibility across engineering, analytics, and product teams. Discussions about completeness checks, validation rules, and data contracts require precise vocabulary. This exercise covers the natural collocations data engineers and analysts use in quality reviews and pipeline documentation.
1 / 5
The data engineering team needs to ___ completeness checks across all pipeline outputs.
'Run completeness checks' is the natural data engineering collocation: data quality checks are 'run', like tests or jobs. 'Perform' is correct but more formal; 'execute' is a technical synonym that sounds stiff in conversation; 'do' is too informal.
2 / 5
The analyst flagged that some records ___ the validation rules for the date field.
'Fail the validation rules' is the standard data quality collocation: records 'fail' validation, just as tests fail. 'Violate' is more formal and belongs to constraint terminology; 'break' is informal; 'miss' implies absence rather than non-compliance.
3 / 5
We should ___ a data quality score for each source system in the monthly report.
'Calculate a data quality score' is the correct collocation: data quality scores are derived from measurements, not arbitrarily assigned. 'Assign' implies manual attribution; 'give' is informal; 'set' implies a target rather than a measured outcome.
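To make "derived from measurements" concrete, here is a minimal Python sketch of one way a score could be calculated. The scoring rule (share of records that are complete and carry a valid ISO date), the function names, and the sample records are all assumptions made for illustration; real teams define their own metrics.

```python
from datetime import datetime


def is_valid_date(value):
    """Return True if value parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        # TypeError covers None; ValueError covers malformed strings.
        return False


def quality_score(records, required_fields, date_field):
    """Hypothetical score: fraction of records that pass both checks."""
    if not records:
        return 0.0
    passed = 0
    for record in records:
        complete = all(record.get(f) not in (None, "") for f in required_fields)
        if complete and is_valid_date(record.get(date_field)):
            passed += 1
    return passed / len(records)


# Illustrative sample data, not taken from any real source system.
records = [
    {"id": 1, "order_date": "2024-01-15"},
    {"id": 2, "order_date": "15/01/2024"},  # fails the date validation rule
    {"id": 3, "order_date": None},          # fails the completeness check
    {"id": 4, "order_date": "2024-02-01"},
]
score = quality_score(records, ["id", "order_date"], "order_date")
print(f"{score:.0%}")
```

The score here is measured from the records themselves, which is why 'calculate' fits and 'assign' does not.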
4 / 5
The team uses Great Expectations to ___ data contracts between services.
'Enforce data contracts' is the precise collocation in modern data engineering: enforcement means automatically blocking or alerting when expectations are violated. 'Validate' and 'check' are also used but imply verification without follow-up action; 'test' names the mechanism behind enforcement rather than the enforcement itself.
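The difference between checking and enforcing can be sketched in plain Python. This is not the Great Expectations API; the contract format, the function names, and the exception class are hypothetical and only illustrate the vocabulary.

```python
class ContractViolation(Exception):
    """Raised when a record breaks the (hypothetical) data contract."""


# Assumed contract format: field name mapped to the expected Python type.
CONTRACT = {"user_id": int, "email": str}


def check_contract(record, contract):
    """Check only: report violations and take no action."""
    violations = []
    for field, expected in contract.items():
        if field not in record:
            violations.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            violations.append(f"{field}: expected {expected.__name__}, got {actual}")
    return violations


def enforce_contract(record, contract):
    """Enforce: block the record by raising if any violation is found."""
    violations = check_contract(record, contract)
    if violations:
        raise ContractViolation("; ".join(violations))
    return record


bad_record = {"user_id": "42", "email": "a@example.com"}
print(check_contract(bad_record, CONTRACT))  # reports, pipeline continues
```

Calling `enforce_contract(bad_record, CONTRACT)` would raise `ContractViolation` and stop the record from moving downstream, which is the sense in which a contract is 'enforced' rather than merely 'checked'.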
5 / 5
Before promoting data to the gold layer, pipelines must ___ deduplication on all incoming records.
'Apply deduplication' is the standard data pipeline collocation: deduplication logic is 'applied' as a transformation step. 'Run deduplication' is also common; 'perform' is overly formal for a pipeline step; 'execute' sounds too command-like for a data transformation.
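As a rough illustration of deduplication being 'applied' as a transformation step, the sketch below keeps the first record seen for each key. The function name, the keep-first rule, and the sample records are assumptions; production pipelines often use a different rule, such as keeping the latest record.

```python
def apply_deduplication(records, key_fields):
    """Keep the first record seen for each key; drop later duplicates."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Illustrative incoming batch with one duplicate order_id.
incoming = [
    {"order_id": "A1", "amount": 10},
    {"order_id": "A2", "amount": 25},
    {"order_id": "A1", "amount": 10},
]
deduplicated = apply_deduplication(incoming, ["order_id"])
print(len(deduplicated))
```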