Delta Lake Change Data Feed: English for Data Lakehouse CDC Discussions
Learn the English vocabulary data engineers use when discussing Delta Lake Change Data Feed, CDC pipelines, and incremental processing in lakehouse architectures.
Change Data Capture is one of the most discussed patterns in modern data engineering, and Delta Lake’s Change Data Feed feature has made CDC conversations even more specific. If you work on a data platform team and need to discuss incremental pipelines, downstream consumers, and lakehouse architecture in English, this vocabulary guide is for you.
Core Vocabulary
Change Data Feed (CDF)
A Delta Lake feature that records row-level changes made to a Delta table (inserts, updates, and deletes) so that downstream consumers can process only what changed rather than re-reading the entire table.
“Now that we’ve enabled Change Data Feed on the transactions table, the downstream aggregation job runs in under a minute instead of forty.”
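As a rough sketch of what "enabling" means in practice, the snippet below builds the ALTER TABLE statement that sets the delta.enableChangeDataFeed table property. The table name transactions is taken from the example sentence; the helper names enable_cdf_sql and enable_cdf are hypothetical, and spark is assumed to be an existing SparkSession with Delta Lake configured.

```python
def enable_cdf_sql(table_name: str) -> str:
    """Build the SQL that turns on Change Data Feed for an existing Delta table."""
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def enable_cdf(spark, table_name: str) -> None:
    """Run the statement. `spark` is assumed to be a SparkSession with Delta Lake set up."""
    spark.sql(enable_cdf_sql(table_name))


# Only commits made after this statement runs are recorded in the feed.
print(enable_cdf_sql("transactions"))
```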
Change type
The classification of a row-level change, exposed in the _change_type column of the CDF output. Delta Lake produces four change type values: insert, update_preimage (the row before an update), update_postimage (the row after an update), and delete.
“When we filter by change type, we only process update_postimage and insert records — we don’t need the preimage for this pipeline.”
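The filter described in that sentence can be illustrated without a cluster. This is a minimal sketch that models CDF output rows as plain Python dicts; the sample rows, the id and amount columns, and the helper name current_values_only are invented for illustration.

```python
# Hypothetical CDF output rows, modelled as plain dicts for illustration.
changes = [
    {"id": 1, "amount": 50, "_change_type": "insert"},
    {"id": 2, "amount": 20, "_change_type": "update_preimage"},
    {"id": 2, "amount": 25, "_change_type": "update_postimage"},
    {"id": 3, "amount": 10, "_change_type": "delete"},
]

# Change types that carry the current value of a row.
WANTED = {"insert", "update_postimage"}


def current_values_only(rows):
    """Drop preimages and deletes, keeping only rows that hold a current value."""
    return [row for row in rows if row["_change_type"] in WANTED]


kept = current_values_only(changes)
print([(row["id"], row["_change_type"]) for row in kept])
```

In PySpark the same idea would typically be a DataFrame filter on the _change_type column, for example with col("_change_type").isin("insert", "update_postimage").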
_commit_version
A system column added to CDF output that identifies which Delta table version the change belongs to. It is used to track exactly how far a consumer has processed.
“We store the last processed _commit_version in our metadata table so the pipeline can resume from exactly where it left off after a failure.”
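The resume logic in that sentence might look roughly like the sketch below. A dict stands in for the metadata table, and the pipeline name silver_job and both helper names are hypothetical. It relies on one fact about CDF reads: the starting version is inclusive, so the next run should begin one version past the last one processed.

```python
# A dict stands in for the metadata table; a real pipeline would persist this.
checkpoints = {}


def starting_version(pipeline: str, default: int = 0) -> int:
    """Return the first version the next run should read.

    The starting version of a CDF read is inclusive, so resume one past
    the last version that was fully processed. The default of 0 is an
    assumption; in practice it should be a version at or after the one
    where CDF was enabled.
    """
    last = checkpoints.get(pipeline)
    return default if last is None else last + 1


def record_progress(pipeline: str, rows) -> None:
    """Store the highest _commit_version in a successfully processed batch."""
    if rows:
        checkpoints[pipeline] = max(row["_commit_version"] for row in rows)


batch = [
    {"id": 1, "_commit_version": 141},
    {"id": 2, "_commit_version": 142},
]
print(starting_version("silver_job"))  # no checkpoint stored yet
record_progress("silver_job", batch)
print(starting_version("silver_job"))  # one past the last processed version
```

Calling record_progress only after the batch has been written successfully is what lets the pipeline pick up from the same place after a failure.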
_commit_timestamp
A system column in the CDF output that records when the commit occurred. It allows consumers to filter changes by time window rather than by version number.
“For the daily reporting job, we filter by _commit_timestamp to pull only changes that happened between midnight and now.”
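A time-window filter like the one in that sentence can be sketched in plain Python. The rows and the helper name changes_since_midnight are made up for illustration, and now is passed in explicitly so the example stays deterministic.

```python
from datetime import datetime


def changes_since_midnight(rows, now: datetime):
    """Keep rows whose _commit_timestamp falls between today's midnight and `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return [row for row in rows if midnight <= row["_commit_timestamp"] <= now]


now = datetime(2024, 5, 2, 9, 30)
rows = [
    {"id": 1, "_commit_timestamp": datetime(2024, 5, 1, 23, 59)},  # yesterday
    {"id": 2, "_commit_timestamp": datetime(2024, 5, 2, 0, 5)},
    {"id": 3, "_commit_timestamp": datetime(2024, 5, 2, 9, 0)},
]
print([row["id"] for row in changes_since_midnight(rows, now)])
```

Delta Lake batch reads of the change feed also accept startingTimestamp and endingTimestamp options, which move the window into the read itself instead of filtering afterwards.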
Incremental processing
A data pipeline pattern where only new or changed records are read and transformed, as opposed to full table scans on every run. CDF is a primary enabler of incremental processing in Delta Lake.
“Switching to incremental processing cut our pipeline’s compute cost by 70% — we no longer re-process records that haven’t changed.”
Propagation
The movement of changes from a source table through a series of downstream tables or systems. In CDC contexts, propagation describes how a change in the bronze layer eventually reaches gold.
“The tricky part of our architecture is change propagation through the medallion layers — a delete in bronze has to propagate correctly through silver aggregations.”
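To make the "delete through an aggregation" problem concrete, here is a minimal sketch that keeps a per-customer running total in step with a batch of changes. The customer and amount columns, the sample data, and the helper name apply_changes are assumptions for illustration, and the approach only works for aggregates that can be reversed, such as sums and counts.

```python
from collections import defaultdict

# Sign applied to a row's amount for each change type: rows that
# disappear are subtracted, rows that appear are added.
SIGN = {
    "insert": 1,
    "update_postimage": 1,
    "update_preimage": -1,
    "delete": -1,
}


def apply_changes(totals, changes):
    """Adjust running totals in place so they reflect a batch of CDF rows."""
    for row in changes:
        totals[row["customer"]] += SIGN[row["_change_type"]] * row["amount"]
    return totals


# Hypothetical silver-layer totals computed before the changes arrived.
totals = defaultdict(int, {"alice": 100, "bob": 40})
changes = [
    {"customer": "alice", "amount": 30, "_change_type": "delete"},
    {"customer": "bob", "amount": 40, "_change_type": "update_preimage"},
    {"customer": "bob", "amount": 55, "_change_type": "update_postimage"},
    {"customer": "carol", "amount": 12, "_change_type": "insert"},
]
apply_changes(totals, changes)
print(dict(totals))
```

Aggregates that cannot be undone from a single row, such as a minimum or maximum, would instead need the affected groups to be recomputed.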
Downstream consumer
Any pipeline, job, or system that reads from a Delta table to derive its own output. Enabling CDF gives these consumers a second way to read the table: as a feed of row-level changes rather than as a full snapshot.
“Before we alter the schema, we need to audit all downstream consumers of this table — there are at least six pipelines that read from it.”
Medallion architecture in CDC context
The bronze-silver-gold layered architecture applied to incremental data flows. CDF enables changes to propagate through each medallion layer incrementally rather than triggering full refreshes.
“In our medallion architecture, CDF lets us keep the silver layer up to date within minutes of bronze receiving new events.”
Key Collocations
- enable change data feed — “We need to enable change data feed on the table before we can start reading incremental changes — it’s not retroactive.”
- read CDF changes — “The Spark job reads CDF changes using the startingVersion parameter to pick up from the last checkpoint.”
- propagate changes downstream — “One challenge is ensuring that deletes propagate downstream correctly when aggregations have already been computed.”
- filter by change type — “We filter by change type to separate inserts from updates, because the business logic differs for each case.”
- track commit history — “We track commit history using _commit_version to give us an audit trail for compliance reporting.”
- consume incrementally — “Each downstream job is designed to consume incrementally — it reads only the changes since its last successful run.”
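Several of these collocations come together in a single read. The sketch below shows one plausible shape for "reading CDF changes" in PySpark, using the readChangeFeed and startingVersion reader options. The helper names and the table name transactions are hypothetical, and spark is assumed to be a SparkSession with Delta Lake configured, so only the option-building part runs without a cluster.

```python
def cdf_read_options(starting_version: int) -> dict:
    """Reader options that request the change feed from a given version onward."""
    return {"readChangeFeed": "true", "startingVersion": str(starting_version)}


def read_changes(spark, table_name: str, starting_version: int):
    """Return the change feed as a DataFrame.

    `spark` is assumed to be a SparkSession with Delta Lake configured;
    the table must already have Change Data Feed enabled.
    """
    return (
        spark.read.format("delta")
        .options(**cdf_read_options(starting_version))
        .table(table_name)
    )


print(cdf_read_options(143))
```

A job would call read_changes(spark, "transactions", 143) and then filter or aggregate the result like any other DataFrame.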
Using This Vocabulary in Architecture Discussions
When talking through a CDF-based pipeline design, English speakers often use “as of version X” for the state of the table at a given version and “since version X” for the changes committed after it. For example: “The silver job picks up all CDF changes since version 142, which was the last version the previous run processed.”
The distinction between preimage and postimage is important to express clearly. A common way to phrase this is: “We need the preimage to see what the value was before the update, not just what it became.” If you only say “the old value,” engineers may understand you, but using preimage and postimage signals professional familiarity with CDC vocabulary.
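A small sketch can make the pairing concrete: it matches each update_preimage row with the update_postimage row from the same commit and reports the old and new values side by side. The id and amount columns and the helper name value_changes are invented for illustration, and the sketch assumes id is a unique key with at most one update per key in a commit.

```python
def value_changes(rows, key="id", field="amount"):
    """Pair preimage and postimage rows and return (key, old_value, new_value)."""
    # First pass: remember the value each updated row had before the update.
    before = {
        (row[key], row["_commit_version"]): row[field]
        for row in rows
        if row["_change_type"] == "update_preimage"
    }
    # Second pass: attach that old value to the matching postimage.
    return [
        (row[key], before[(row[key], row["_commit_version"])], row[field])
        for row in rows
        if row["_change_type"] == "update_postimage"
        and (row[key], row["_commit_version"]) in before
    ]


rows = [
    {"id": 7, "amount": 20, "_change_type": "update_preimage", "_commit_version": 143},
    {"id": 7, "amount": 25, "_change_type": "update_postimage", "_commit_version": 143},
    {"id": 8, "amount": 5, "_change_type": "insert", "_commit_version": 143},
]
print(value_changes(rows))
```

Reading the result aloud is good practice: "In version 143, the preimage for row 7 was 20 and the postimage is 25."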
When discussing propagation failures, the phrase “changes didn’t flow through” is natural in spoken conversation, even though the written documentation uses “propagation failure.” Being able to switch between formal and informal phrasing helps you communicate in both code reviews and quick Slack discussions.
Common Mistakes to Avoid
A common error is saying “CDC table” when you mean “a Delta table with CDF enabled.” CDC is the pattern; CDF is the Delta feature. Keep them distinct: “We’re using CDF to implement a CDC pattern on this table.”
Another frequent mistake is treating _commit_version as a timestamp. Versions are sequential integers tied to Delta commits, not wall-clock time. Use _commit_timestamp when you need time-based filtering, and _commit_version when you need positional tracking. Mixing them up causes confusion when the team discusses how a pipeline behaves.
Practice Tip
Find a Delta Lake CDF tutorial and describe the pipeline steps to a colleague or to yourself out loud. Focus on using “enable change data feed,” “filter by change type,” and “consume incrementally” in full sentences. Try to explain the difference between update_preimage and update_postimage without looking at your notes — this is the concept that most often causes confusion in real team discussions.