Data engineers speak a specific dialect of technical English. Collocations like ingest data, orchestrate pipelines, and backfill historical data are the vocabulary of ETL and data platform work. These exercises will help you use them accurately in job interviews, code reviews, and design documents.
1 / 5
The Kafka connector is configured to ___ from the upstream API every five minutes.
Ingest data is the standard data engineering collocation for bringing data into a pipeline or storage system. 'Collect' and 'fetch' are used in broader contexts, but 'ingest' is the professional term in data platform and ETL documentation.
2 / 5
The dbt models ___ before loading them into the analytical layer.
Transform records is the correct data pipeline collocation. Data transformation (reshaping, cleaning, and enriching records) uses 'transform' as the standard verb. 'Convert' implies a format change only; 'change' and 'modify' are too generic.
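To make 'transform records' concrete, here is a minimal sketch in plain Python. The record schema and the function name transform_record are assumptions made up for illustration; a real dbt model would express the same steps in SQL.

```python
from datetime import datetime, timezone


def transform_record(raw: dict) -> dict:
    """Clean, reshape, and enrich one raw record (hypothetical schema)."""
    email = raw["email"].strip().lower()  # clean: trim whitespace, normalise case
    signup = datetime.fromtimestamp(raw["signup_ts"], tz=timezone.utc)
    return {
        "user_id": int(raw["id"]),             # reshape: rename the field and cast to int
        "email": email,
        "email_domain": email.split("@")[1],   # enrich: add a derived field
        "signup_date": signup.date().isoformat(),
    }


raw_records = [{"id": "42", "email": "  Ada@Example.COM ", "signup_ts": 0}]
transformed = [transform_record(r) for r in raw_records]
```

In a sentence: "This step transforms raw records into the shape the analytical layer expects."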
3 / 5
The final step of the ETL process is to ___ for analysis.
Load into the warehouse is the canonical ETL collocation: 'load' is literally the 'L' in ETL. Data engineers load transformed data into a data warehouse. 'Put', 'move', and 'send' are not idiomatic in data engineering documentation.
4 / 5
We use Apache Airflow to ___ and manage dependencies between tasks.
Orchestrate pipelines is the professional term for coordinating the sequence and dependencies of data pipeline tasks. 'Orchestrate' implies scheduling, dependency management, and monitoring, which are core functions of tools like Airflow and Prefect.
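The dependency-management part of 'orchestrate' can be sketched with Python's standard library. This is not Airflow code: the task names are hypothetical, and the sketch covers only ordering, not scheduling or monitoring.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical task names).
dependencies = {
    "ingest": set(),
    "transform": {"ingest"},
    "load": {"transform"},
}

# Resolve an order in which every task runs after its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())

executed = []
for task in run_order:
    # A real orchestrator would trigger the task here and track its status.
    executed.append(task)
```

In a sentence: "The orchestrator runs the load task only after the transform task has succeeded."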
5 / 5
After fixing the ingestion bug, the team needed to ___ from the past 90 days.
Backfill historical data is the standard data engineering collocation for re-processing past data to fill gaps or correct errors. 'Backfill' is a specific technical term in this domain; 'reload' and 'restore' imply different operations.
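As a rough illustration of what a backfill involves, the sketch below lists the daily partitions for a 90-day window and reprocesses each one. The function names and the fixed end date are assumptions chosen for the example, and reprocess is only a stand-in for re-running the real pipeline.

```python
from datetime import date, timedelta


def backfill_dates(end: date, days: int) -> list[date]:
    """Return the `days` daily partitions before `end`, oldest first."""
    return [end - timedelta(days=offset) for offset in range(days, 0, -1)]


def reprocess(partition: date) -> str:
    # Hypothetical stand-in for re-running the pipeline for one day of data.
    return f"reprocessed {partition.isoformat()}"


partitions = backfill_dates(date(2024, 4, 1), 90)
results = [reprocess(p) for p in partitions]
```

In a sentence: "We backfilled the last 90 days of data, one daily partition at a time."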