The first stage of the ETL job will ___ raw events from the Kafka topic into the warehouse.
To ingest data means to bring raw data into a system as the first step of a pipeline. Ingest is the standard data-engineering term, as in "data ingestion" and "ingestion layer." Eat up, absorb in, and pull across are either informal or not real collocations. Managed connectors such as Fivetran are widely classed as data ingestion tools, and Kafka commonly serves as the ingestion layer for event streams, so ingest events is the correct collocation for the data-acquisition step at the start of a pipeline.
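A minimal Python sketch of the ingestion step follows. It assumes each Kafka message value is a UTF-8 JSON payload. A plain list of bytes stands in for the topic so the example runs without a broker; in a real job, a Kafka consumer client would supply the payloads. The helper name ingest_events is hypothetical.

```python
import json
from typing import Iterable


def ingest_events(messages: Iterable[bytes]) -> list[dict]:
    """Decode raw message payloads into a staging list.

    Hypothetical helper: assumes every payload is UTF-8 encoded JSON.
    Records are stored unmodified; cleaning happens in later stages.
    """
    staging: list[dict] = []
    for payload in messages:
        staging.append(json.loads(payload.decode("utf-8")))
    return staging


# A plain list stands in for the Kafka topic so the sketch runs without a broker.
topic = [
    b'{"user_id": 1, "event": "click", "ts": "2024-01-05T10:00:00+00:00"}',
    b'{"user_id": 2, "event": "view", "ts": "2024-01-05T10:00:03+00:00"}',
]
raw_events = ingest_events(topic)
print(len(raw_events))  # 2
```

Leaving the records untouched at this stage is one common design choice: it keeps the raw data available for replay if a later stage changes.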
The transformation layer will ___ the raw logs, stripping sensitive fields and normalising timestamps.
To process data means to apply operations such as filtering, parsing, and enriching to turn it into a usable form. Process is the standard pipeline verb, as in "data processing" and "stream processing." Handle through, run over, and work on are informal and imprecise. Data engineers say "process the raw logs before loading," so process data is the correct collocation for the intermediate stage in which a pipeline applies these operations.
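The processing described above (stripping sensitive fields and normalising timestamps) can be sketched in Python as follows. The field names email, ip_address, and ts and the process_log helper are hypothetical, and the sketch assumes every timestamp is an ISO 8601 string that carries a UTC offset.

```python
from datetime import datetime, timezone

# Hypothetical set of sensitive fields; a real pipeline would likely read this from config.
SENSITIVE_FIELDS = {"email", "ip_address"}


def process_log(record: dict) -> dict:
    """Drop sensitive fields and normalise the timestamp to UTC ISO 8601."""
    cleaned = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    # fromisoformat parses offsets such as "+02:00"; astimezone converts to UTC.
    # Assumes the timestamp has an offset: a naive value would be treated as local time.
    ts = datetime.fromisoformat(cleaned["ts"])
    cleaned["ts"] = ts.astimezone(timezone.utc).isoformat()
    return cleaned


raw = {
    "user_id": 7,
    "email": "a@example.com",
    "ip_address": "10.0.0.1",
    "ts": "2024-03-01T12:30:00+02:00",
}
print(process_log(raw))  # {'user_id': 7, 'ts': '2024-03-01T10:30:00+00:00'}
```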
A dbt model will ___ the source data into a clean, analytics-ready format.
To transform data means to reshape, clean, and restructure it into the form that downstream consumers need. Transform is the T in ETL and ELT, and it is the step dbt (data build tool) is built around. Change over, flip around, and convert out are informal or non-technical. Because the whole stage is named for this verb, transform data is the correct collocation for applying structural and content changes to data in a pipeline.
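A dbt model is a SQL SELECT statement that dbt materialises as a table or view. The sketch below does not use dbt. Instead, it runs a comparable SELECT through Python's built-in sqlite3 module to illustrate the kind of reshaping a transform step performs: trimming and lower-casing text, converting units, and aggregating. The table raw_orders and its columns are made up for illustration.

```python
import sqlite3

# In-memory database standing in for the warehouse; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, status TEXT, amount_cents INTEGER);
    INSERT INTO raw_orders VALUES
        (1, ' Shipped', 1250),
        (2, 'shipped ', 800),
        (3, 'CANCELLED', 500);
""")

# Roughly what the body of a dbt-style model might contain:
# normalise inconsistent status strings, convert cents to currency units, aggregate.
TRANSFORM_SQL = """
    SELECT
        LOWER(TRIM(status))       AS order_status,
        COUNT(*)                  AS order_count,
        SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    GROUP BY LOWER(TRIM(status))
    ORDER BY order_status
"""

rows = conn.execute(TRANSFORM_SQL).fetchall()
print(rows)  # [('cancelled', 1, 5.0), ('shipped', 2, 20.5)]
```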
The quality step will ___ that each record contains the required fields before it is loaded.
To validate data means to verify that it meets schema, type, and business-rule constraints. Validate is the standard data-quality verb, as in "data validation rules," and it is the core job of tools like Great Expectations. Double-check up, prove out, and confirm across are not standard collocations for this step. Engineers write "validate data before loading to prevent bad records downstream," so validate data is the correct collocation for the quality-assurance step in a data pipeline.
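Tools such as Great Expectations declare checks like these as configuration. The sketch below hand-rolls a simplified version in plain Python instead, checking that each record has the required fields with the expected types. The REQUIRED_FIELDS schema and both helper names are hypothetical. Rejected records are kept alongside their errors rather than silently dropped, so they can be inspected or routed to a quarantine table.

```python
# Hypothetical schema: required field name -> expected Python type.
REQUIRED_FIELDS = {"user_id": int, "event": str, "ts": str}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record or record[field] is None:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return errors


def split_valid(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Separate loadable records from rejected ones, keeping each rejection's errors."""
    valid, rejected = [], []
    for rec in records:
        errs = validate_record(rec)
        if errs:
            rejected.append((rec, errs))
        else:
            valid.append(rec)
    return valid, rejected


batch = [
    {"user_id": 1, "event": "click", "ts": "2024-01-05T10:00:00+00:00"},
    {"user_id": "2", "event": "view", "ts": "2024-01-05T10:00:03+00:00"},  # wrong type
    {"user_id": 3, "ts": "2024-01-05T10:00:07+00:00"},  # missing "event"
]
valid, rejected = split_valid(batch)
print(len(valid), len(rejected))  # 1 2
```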
The final step will ___ the cleansed, validated data into the BigQuery table.
To load data means to write the processed output into its destination storage. Load is the L in ETL and ELT and the standard pipeline term for this step, which comes last in ETL. Put in, drop into, and send through are informal rather than standard terms. Engineers say "load the data into the warehouse," so load data is the correct collocation for writing the final, clean data into its target database or data lake.
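A minimal sketch of the load step follows. It uses Python's built-in sqlite3 module as a stand-in for BigQuery so the example runs locally. The events table and the load_records helper are hypothetical. Against BigQuery itself, the google-cloud-bigquery client library would typically do this through a load job (for example, Client.load_table_from_json) rather than row-by-row SQL inserts.

```python
import sqlite3


def load_records(conn: sqlite3.Connection, records: list[dict]) -> int:
    """Write validated records to the destination table in one transaction.

    Hypothetical table and columns; sqlite3 stands in for BigQuery.
    Returns the number of records passed in.
    """
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, event TEXT, ts TEXT)"
        )
        # Named placeholders are filled from each record dict.
        conn.executemany(
            "INSERT INTO events (user_id, event, ts) VALUES (:user_id, :event, :ts)",
            records,
        )
    return len(records)


conn = sqlite3.connect(":memory:")
n = load_records(conn, [{"user_id": 1, "event": "click", "ts": "2024-01-05T10:00:00+00:00"}])
print(n, conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 1 1
```

Wrapping the inserts in a single transaction means a failure partway through leaves the table unchanged, which keeps a retried load from duplicating a partial batch.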