Practise the vocabulary for ETL, ELT, Change Data Capture (CDC), and incremental vs full load patterns.
1 / 6
The key difference between ETL and ELT is:
ETL (legacy): extract → transform → load, with the transformation done in a dedicated layer outside the warehouse (Informatica, Talend). ELT (modern): extract and load raw data into a cloud warehouse (Snowflake, BigQuery), then transform it in-warehouse using SQL/dbt. ELT became feasible as cloud warehouses gained massive compute at low cost.
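A minimal sketch of the ELT order of operations, assuming Python's built-in sqlite3 as a stand-in for a cloud warehouse. The table names (raw_orders, paid_orders) and columns are hypothetical, chosen only to illustrate "load raw first, transform with SQL afterwards":

```python
import sqlite3

# sqlite3 stands in for a cloud warehouse; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# Load: land the source rows untouched in a raw table.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT)")
raw_rows = [(1, 1250, "paid"), (2, 400, "cancelled"), (3, 1000, "paid")]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: build the cleaned model with SQL inside the warehouse.
conn.execute(
    """
    CREATE TABLE paid_orders AS
    SELECT id, amount_cents / 100.0 AS amount
    FROM raw_orders
    WHERE status = 'paid'
    """
)

paid_orders = conn.execute("SELECT id, amount FROM paid_orders ORDER BY id").fetchall()
```

In an ETL pipeline the filtering and unit conversion would instead happen before the rows ever reach the warehouse.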
2 / 6
Change Data Capture (CDC) is used when:
CDC captures only the delta (rows that changed), which reduces data transfer volume and enables near-real-time replication. Common methods: log-based CDC (reads the database transaction log, e.g. with Debezium), trigger-based CDC, and timestamp-based incremental extraction.
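To make "captures the delta" concrete, here is a rough sketch of applying a stream of change events to a replica. The event shape (op, id, row) is a simplified, hypothetical format and not the wire format of any particular CDC tool:

```python
def apply_changes(replica, events):
    """Apply an ordered stream of change events to a replica keyed by primary key."""
    for event in events:
        key = event["id"]
        if event["op"] == "delete":
            replica.pop(key, None)
        else:
            # Inserts and updates both carry the full new row in this sketch.
            replica[key] = event["row"]
    return replica


replica = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
events = [
    {"op": "update", "id": 1, "row": {"name": "Ada L."}},
    {"op": "insert", "id": 3, "row": {"name": "Cy"}},
    {"op": "delete", "id": 2},
]
replica = apply_changes(replica, events)
```

Only three small events are transferred, yet the replica ends up in sync with the source, including the deletion of row 2.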
3 / 6
An incremental load in a data pipeline means:
Incremental loads are more efficient than full loads for large tables because only the delta is processed. They require a reliable changed-at timestamp or CDC. Challenge: hard-deleted records are invisible to timestamp-based incremental loads, so soft deletes or CDC are needed to handle them.
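A timestamp-based incremental load can be sketched with a high-water mark, assuming every source row has an updated_at column. The function name extract_incremental and the row layout are hypothetical:

```python
from datetime import datetime


def extract_incremental(rows, last_watermark):
    """Return rows changed after the watermark, plus the new watermark."""
    changed = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in changed), default=last_watermark)
    return changed, new_watermark


source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]

# First run picks up only rows changed since the stored watermark.
changed, watermark = extract_incremental(source, datetime(2024, 1, 3))

# A second run with no new changes returns nothing and keeps the watermark.
changed_again, watermark_again = extract_incremental(source, watermark)
```

A row that is hard-deleted at the source simply stops appearing in the input, so this filter never reports it. That is the gap soft deletes or CDC close.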
4 / 6
An upsert operation in a data warehouse refers to:
Upsert (MERGE, i.e. update-or-insert) is essential for incremental loads: it inserts new records without creating duplicates and keeps existing records current. SQL: MERGE INTO target USING source ON target.id = source.id WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT...
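The same update-or-insert behaviour can be tried locally. SQLite has no MERGE statement, so this sketch swaps in SQLite's INSERT ... ON CONFLICT DO UPDATE (available from SQLite 3.24) as an equivalent. The target table and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO target VALUES (?, ?)",
    [(1, "a@old.example"), (2, "b@example.com")],
)

# Incoming batch: id 1 already exists (update), id 3 is new (insert).
batch = [(1, "a@new.example"), (3, "c@example.com")]
conn.executemany(
    """
    INSERT INTO target (id, email) VALUES (?, ?)
    ON CONFLICT(id) DO UPDATE SET email = excluded.email
    """,
    batch,
)

rows = conn.execute("SELECT id, email FROM target ORDER BY id").fetchall()
```

Re-running the same batch leaves the table unchanged, which is what makes upserts safe to retry in an incremental pipeline.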
5 / 6
The phrase 'ELT became feasible as cloud warehouses gained compute power' means:
The ELT shift: cloud warehouse pricing and elasticity (pay-per-query billing, auto-scaling compute) made it cheaper to run transformations in the warehouse than to provision and maintain separate ETL transformation infrastructure. dbt democratised ELT by letting analysts write transformations in SQL.
6 / 6
Data lineage in a data engineering context means:
Data lineage answers: 'Where did this number come from?' If a revenue figure looks wrong, lineage lets you trace it back through transformations to the source tables and identify where the error was introduced. Tools: OpenLineage, Marquez, dbt lineage graph, modern data catalogs.
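Tracing a figure back to its sources amounts to walking a dependency graph upstream. A minimal sketch, assuming lineage is available as a mapping from each dataset to its direct inputs. The dataset names are hypothetical and the dictionary is not the data model of any of the tools listed:

```python
# Hypothetical lineage graph: each dataset maps to the datasets it is built from.
lineage = {
    "revenue_dashboard": ["fct_revenue"],
    "fct_revenue": ["stg_orders", "stg_refunds"],
    "stg_orders": ["raw.orders"],
    "stg_refunds": ["raw.refunds"],
}


def upstream(dataset, graph):
    """Return every upstream dataset of `dataset`, nearest first (breadth-first)."""
    seen, queue = [], list(graph.get(dataset, []))
    while queue:
        node = queue.pop(0)
        if node not in seen:
            seen.append(node)
            queue.extend(graph.get(node, []))
    return seen


sources = upstream("revenue_dashboard", lineage)
```

If the dashboard number looks wrong, the returned list gives the order in which to inspect the intermediate models before reaching the raw source tables.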