Delta Lake's Change Data Feed captures row-level changes for incremental CDC pipelines. Master enabling CDF, _change_type semantics (insert/update_preimage/update_postimage/delete), readChangeFeed queries, and DML history tracking for efficient streaming architectures.
1 / 5
A Delta Lake table is created with TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true'). What does enabling Change Data Feed (CDF) allow downstream consumers to do?
Enabling Change Data Feed makes Delta Lake record row-level changes (inserts, updates, and deletes) for every commit made after the property is set. Downstream consumers can then use readChangeFeed to read only the changes since a given version or timestamp instead of reprocessing the entire table, which is what makes CDF a good fit for efficient, incremental CDC pipelines.
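The consumer side of such a pipeline can be sketched in plain Python. This is a minimal model of the semantics, not the Delta Lake or Spark API: change rows are assumed to be dicts carrying the CDF metadata columns _change_type and _commit_version, and the function name apply_changes and the id key column are hypothetical. In a real pipeline the rows would come from a readChangeFeed query and the target would typically be another table updated with MERGE.

```python
from typing import Any

# Hypothetical in-memory model of CDF output: each change row is a dict of the
# table's data columns plus the CDF metadata columns listed here.
METADATA_COLUMNS = {"_change_type", "_commit_version", "_commit_timestamp"}


def apply_changes(replica: dict[Any, dict[str, Any]],
                  changes: list[dict[str, Any]],
                  key: str = "id") -> dict[Any, dict[str, Any]]:
    """Apply CDF-style change rows, oldest commit first, to a replica keyed by `key`."""
    # sorted() is stable, so rows within one commit keep their relative order.
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change_type = row["_change_type"]
        data = {k: v for k, v in row.items() if k not in METADATA_COLUMNS}
        if change_type in ("insert", "update_postimage"):
            replica[row[key]] = data  # upsert the row's new state
        elif change_type == "delete":
            replica.pop(row[key], None)  # drop the deleted row if present
        elif change_type == "update_preimage":
            continue  # old state; not needed to rebuild the current table
        else:
            raise ValueError(f"unknown _change_type: {change_type!r}")
    return replica


# Usage: a replica that already reflects version 5, then changes from versions 6 and 7.
replica = {1: {"id": 1, "name": "alice"}}
changes = [
    {"id": 1, "name": "alice", "_change_type": "update_preimage", "_commit_version": 6},
    {"id": 1, "name": "alicia", "_change_type": "update_postimage", "_commit_version": 6},
    {"id": 2, "name": "bob", "_change_type": "insert", "_commit_version": 6},
    {"id": 2, "name": "bob", "_change_type": "delete", "_commit_version": 7},
]
apply_changes(replica, changes)
print(replica)  # {1: {'id': 1, 'name': 'alicia'}}
```

Only the changed rows are touched; the rest of the replica is never re-read, which is the saving CDF provides over full reprocessing.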
2 / 5
When reading a Delta Lake CDF stream, rows include a _change_type column. A row has _change_type = 'update_preimage'. What does this value indicate?
It marks the row's values before an update. Delta Lake CDF emits two rows for each updated row: update_preimage (the row's values before the update) and update_postimage (its values after). Both rows carry the same _commit_version, so consumers can pair them to compute the exact change, for example auditing which column changed from which value to which value.
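Pairing the two images to audit column-level changes can be sketched as follows. It is a pure-Python model rather than a Spark query, and it assumes a hypothetical id key column that uniquely identifies a row, with each row updated at most once per commit; update_diffs is an illustrative name.

```python
from typing import Any

METADATA_COLUMNS = {"_change_type", "_commit_version", "_commit_timestamp"}


def update_diffs(changes: list[dict[str, Any]], key: str = "id") -> list[dict[str, Any]]:
    """Pair update_preimage/update_postimage rows and report the columns that changed."""
    pre: dict[tuple, dict[str, Any]] = {}
    post: dict[tuple, dict[str, Any]] = {}
    for row in changes:
        # A pre/post pair shares the same key value and the same commit version.
        pair_key = (row[key], row["_commit_version"])
        if row["_change_type"] == "update_preimage":
            pre[pair_key] = row
        elif row["_change_type"] == "update_postimage":
            post[pair_key] = row

    diffs = []
    # Walk complete pairs in commit order, then key order.
    for pair_key in sorted(pre.keys() & post.keys(), key=lambda k: (k[1], k[0])):
        before, after = pre[pair_key], post[pair_key]
        changed = {
            col: (before[col], after.get(col))
            for col in before
            if col not in METADATA_COLUMNS and before[col] != after.get(col)
        }
        diffs.append({"key": pair_key[0], "version": pair_key[1], "changed": changed})
    return diffs


changes = [
    {"id": 7, "email": "a@old.example", "tier": "gold",
     "_change_type": "update_preimage", "_commit_version": 12},
    {"id": 7, "email": "a@new.example", "tier": "gold",
     "_change_type": "update_postimage", "_commit_version": 12},
]
print(update_diffs(changes))
# [{'key': 7, 'version': 12, 'changed': {'email': ('a@old.example', 'a@new.example')}}]
```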
3 / 5
A Spark job reads Delta CDF changes with spark.read.format("delta").option("readChangeFeed", "true").option("startingVersion", 5).load(path). What does startingVersion specify?
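startingVersion sets the first Delta table version, inclusive, whose changes the CDF reader returns (startingTimestamp is the timestamp-based alternative). Because the transaction log keeps the table's version history, the reader returns the changes committed in versions 5 through the latest version, or through endingVersion (also inclusive) if one is set, and only for rows that changed in that range. This enables incremental processing from any earlier version that is still retained in the log and at which CDF was already enabled.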
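The inclusive version-range semantics can be modeled without Spark. In this sketch the table's history is a hypothetical dict mapping each commit version to the change rows it recorded; read_change_feed only imitates the startingVersion/endingVersion options of the real reader and is not the Delta Lake API.

```python
from typing import Any, Optional


def read_change_feed(commits: dict[int, list[dict[str, Any]]],
                     starting_version: int,
                     ending_version: Optional[int] = None) -> list[dict[str, Any]]:
    """Return change rows for versions in [starting_version, ending_version].

    Both bounds are inclusive; ending_version defaults to the latest version,
    mirroring the behavior described for startingVersion/endingVersion.
    """
    latest = max(commits, default=-1)
    end = latest if ending_version is None else ending_version
    rows = []
    for version in range(starting_version, end + 1):
        # Tag each row with the version it was committed in, like _commit_version.
        for row in commits.get(version, []):
            rows.append({**row, "_commit_version": version})
    return rows


commits = {
    4: [{"id": 1, "_change_type": "insert"}],
    5: [{"id": 2, "_change_type": "insert"}],
    6: [{"id": 1, "_change_type": "delete"}],
}
rows = read_change_feed(commits, starting_version=5)
print([(r["_commit_version"], r["id"], r["_change_type"]) for r in rows])
# [(5, 2, 'insert'), (6, 1, 'delete')]
```

Version 4 is skipped and version 5 is included, which is the behavior the startingVersion option in the question relies on.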
4 / 5
A team builds a CDC pipeline using Delta Lake CDF. A MERGE operation runs on the source table. How does CDF represent rows affected by a MERGE with matched and not-matched clauses?
Delta Lake CDF records the row-level effect of each MERGE clause: rows changed by a WHEN MATCHED ... UPDATE clause appear as update_preimage/update_postimage pairs, rows removed by a WHEN MATCHED ... DELETE clause appear as delete, and rows added by the WHEN NOT MATCHED ... INSERT clause appear as insert. Consumers see only these change types, not a MERGE event, so the same downstream logic handles MERGE, UPDATE, DELETE, and INSERT commits.
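A toy model makes the clause-to-change-type mapping concrete. This is a pure-Python sketch, not how Delta Lake executes MERGE: the target is a dict keyed by a hypothetical id column, source keys are assumed unique (Delta itself rejects a MERGE where multiple source rows match the same target row), the delete condition is an illustrative qty == 0 predicate, and the order of emitted rows here is only a convenience, since the real feed does not guarantee one.

```python
from typing import Any, Callable

Row = dict[str, Any]


def merge_with_cdf(target: dict[Any, Row], source: list[Row], key: str,
                   should_delete: Callable[[Row], bool]) -> list[Row]:
    """Model a three-clause MERGE and return the CDF rows it would emit.

    WHEN MATCHED AND should_delete(source) THEN DELETE
    WHEN MATCHED THEN UPDATE SET *   (source values overwrite target values)
    WHEN NOT MATCHED THEN INSERT *
    """
    keys = [s[key] for s in source]
    if len(keys) != len(set(keys)):
        raise ValueError("source keys must be unique in this model")

    changes: list[Row] = []
    for src in source:
        k = src[key]
        if k in target:
            old = target[k]
            if should_delete(src):
                # The delete row carries the target row's values before removal.
                changes.append({**old, "_change_type": "delete"})
                del target[k]
            else:
                new = {**old, **src}
                changes.append({**old, "_change_type": "update_preimage"})
                changes.append({**new, "_change_type": "update_postimage"})
                target[k] = new
        else:
            changes.append({**src, "_change_type": "insert"})
            target[k] = dict(src)
    return changes


target = {1: {"id": 1, "qty": 5}, 2: {"id": 2, "qty": 3}}
source = [{"id": 1, "qty": 8}, {"id": 2, "qty": 0}, {"id": 3, "qty": 4}]
changes = merge_with_cdf(target, source, key="id", should_delete=lambda s: s["qty"] == 0)
for row in changes:
    print(row)
# {'id': 1, 'qty': 5, '_change_type': 'update_preimage'}
# {'id': 1, 'qty': 8, '_change_type': 'update_postimage'}
# {'id': 2, 'qty': 3, '_change_type': 'delete'}
# {'id': 3, 'qty': 4, '_change_type': 'insert'}
```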
5 / 5
Delta Lake's DML history tracking stores commit metadata in the _delta_log/ directory. What information does each commit JSON file contain?
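Each commit is a JSON file in _delta_log/ named by its zero-padded table version, with one action per line. The actions record commit info (timestamp, the operation such as WRITE, MERGE, DELETE, or UPDATE, its parameters, and user details where the engine records them), add and remove actions for data files (add actions carry per-file column statistics), and, when the commit wrote separate change data files for CDF, cdc actions pointing to files under _change_data/. The row data itself stays in Parquet files; the log records which files changed and why.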
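Reading a commit file needs nothing beyond a JSON parser, because each line is an independent action. The sample below is hypothetical and heavily abbreviated (real commits contain more fields and more actions), and summarize_commit is an illustrative helper, not part of any Delta Lake library.

```python
import json

# Hypothetical, abbreviated contents of a commit file such as
# _delta_log/00000000000000000006.json. Each line is one JSON action.
SAMPLE_COMMIT = "\n".join([
    json.dumps({"commitInfo": {"timestamp": 1700000000000, "operation": "UPDATE",
                               "operationParameters": {"predicate": "id = 7"}}}),
    json.dumps({"remove": {"path": "part-00000-aaa.snappy.parquet",
                           "deletionTimestamp": 1700000000000, "dataChange": True}}),
    json.dumps({"add": {"path": "part-00000-bbb.snappy.parquet", "partitionValues": {},
                        "size": 1024, "modificationTime": 1700000000000, "dataChange": True,
                        # Per-file column statistics are stored as a JSON-encoded string.
                        "stats": json.dumps({"numRecords": 10, "minValues": {"id": 1},
                                             "maxValues": {"id": 20}, "nullCount": {"id": 0}})}}),
    json.dumps({"cdc": {"path": "_change_data/cdc-00000-ccc.snappy.parquet",
                        "partitionValues": {}, "size": 512, "dataChange": False}}),
])


def summarize_commit(commit_text: str) -> dict:
    """Collect the operation, file actions, and CDF files from one commit."""
    summary = {"operation": None, "timestamp": None, "added": [], "removed": [],
               "cdc_files": [], "num_records_added": 0}
    for line in commit_text.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        if "commitInfo" in action:
            info = action["commitInfo"]
            summary["operation"] = info.get("operation")
            summary["timestamp"] = info.get("timestamp")
        elif "add" in action:
            add = action["add"]
            summary["added"].append(add["path"])
            stats = json.loads(add.get("stats") or "{}")  # decode the nested JSON string
            summary["num_records_added"] += stats.get("numRecords", 0)
        elif "remove" in action:
            summary["removed"].append(action["remove"]["path"])
        elif "cdc" in action:
            summary["cdc_files"].append(action["cdc"]["path"])
    return summary


summary = summarize_commit(SAMPLE_COMMIT)
print(summary["operation"], summary["cdc_files"])
# UPDATE ['_change_data/cdc-00000-ccc.snappy.parquet']
```

The summary contains file paths and statistics but no row values, which reflects the split described above: the log tracks files, while Parquet holds the data.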