Practise vocabulary for impact analysis in data lineage: downstream dependencies, breaking change propagation, and communicating schema changes to pipeline stakeholders.
1 / 5
When a data engineer says "what downstream datasets depend on this table?", they are performing:
"What downstream datasets depend on this?" is the defining impact analysis question in data lineage work. The answer comes from traversing the lineage graph forward from the node in question. Modern data catalogs (DataHub, Atlan, OpenMetadata) surface this as a "downstream lineage" panel. Engineers use this before schema changes, deprecations, or SLA negotiations.
2 / 5
A "breaking change" in the context of a data pipeline means:
Breaking changes in data pipelines are analogous to breaking API changes in software. Removing a column, renaming it, or changing its data type can break the dbt models, SQL queries, and dashboards that reference it, sometimes silently. Impact analysis via lineage tells engineers in advance which consumers will break, so they can coordinate the change or maintain backward compatibility.
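One way to make "breaking" concrete is to diff two versions of a schema. The sketch below assumes schemas are plain `{column: type}` dictionaries; the `breaking_changes` helper and the column names are hypothetical. A rename shows up as a removal, because existing consumers still ask for the old name, while a newly added column is additive and is not reported.

```python
def breaking_changes(old_schema, new_schema):
    """List removals and type changes between two {column: type} mappings."""
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"removed: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type changed: {column} {old_type} -> {new_schema[column]}"
            )
    return problems

# Hypothetical before/after schemas: amount changes type, cust_id is renamed.
old = {"order_id": "int", "amount": "decimal", "cust_id": "int"}
new = {"order_id": "int", "amount": "string", "customer_id": "int"}

print(breaking_changes(old, new))
```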
3 / 5
The phrase "this schema change will affect N pipelines" is typically communicated by:
Modern lineage tools (DataHub's impact analysis, Atlan's impact analysis, dbt's --select model_name+ graph operator) can enumerate the pipelines, models, and dashboards downstream of a given table or, where column-level lineage is available, a given field. Communicating "this rename will affect 14 downstream models and 3 dashboards" allows teams to plan coordinated changes, or to choose an additive approach (new column plus deprecation notice) instead.
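Turning a list of affected assets into a message of that shape is a small counting exercise. The sketch below assumes the lineage tool has already returned `(name, kind)` pairs; the asset list and the `impact_summary` helper are hypothetical.

```python
from collections import Counter

# Hypothetical downstream assets, tagged with their kind.
affected = [
    ("staging.orders", "model"),
    ("mart.revenue", "model"),
    ("dashboard.finance", "dashboard"),
]

def impact_summary(change, assets):
    """Build a one-line impact statement from (name, kind) pairs."""
    counts = Counter(kind for _, kind in assets)
    parts = [
        f"{n} {kind}{'s' if n != 1 else ''}"
        for kind, n in sorted(counts.items())
    ]
    return f"{change} will affect {' and '.join(parts)} downstream"

print(impact_summary("Renaming cust_id", affected))
# Renaming cust_id will affect 1 dashboard and 2 models downstream
```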
4 / 5
Change propagation in data lineage refers to:
Change propagation describes how effects ripple through a lineage graph. If raw.orders loses a column, the effect propagates: staging.orders breaks → mart.revenue breaks → the finance dashboard shows nulls. Understanding propagation depth (how many hops downstream an effect travels) helps teams triage incidents and plan migrations. Tools like dbt's DAG visualisation make propagation paths explicit.
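Propagation depth can be computed by recording the hop count during a breadth-first walk. The sketch below encodes the raw.orders chain from the example as a hypothetical adjacency map; `propagation_depths` is an illustrative helper, not part of dbt or any catalog.

```python
from collections import deque

# The example chain, as a hypothetical adjacency map.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.revenue"],
    "mart.revenue": ["dashboard.finance"],
    "dashboard.finance": [],
}

def propagation_depths(graph, start):
    """Map each downstream node to its minimum hop count from start."""
    depths = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in graph.get(node, []):
            if child not in depths:  # first visit is the shortest path in BFS
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths

depths = propagation_depths(LINEAGE, "raw.orders")
print(depths)
print(max(depths.values()))  # 3 hops from raw.orders to the dashboard
```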
5 / 5
Which phrase best communicates a safe approach to a breaking schema change?
The recommended pattern for breaking schema changes is: (1) add the new column or table additively, (2) mark the old field as deprecated in the data catalog, (3) notify downstream owners identified through impact analysis, (4) agree on a migration window, (5) remove only after all consumers have updated. This avoids silent failures and respects SLAs of dependent teams.
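Step (5) is the one most easily skipped under time pressure, so it can help to gate the removal on an explicit check. The sketch below assumes each consumer's migration status is tracked somewhere as a boolean; the `can_remove` helper and the consumer names are hypothetical.

```python
def can_remove(consumers):
    """Return (ok, blockers): removal is safe only once every consumer has migrated."""
    blockers = sorted(name for name, migrated in consumers.items() if not migrated)
    return (not blockers, blockers)

# Hypothetical migration status for consumers found through impact analysis.
consumers = {
    "staging.orders": True,
    "mart.revenue": False,
    "dashboard.finance": True,
}

ok, blockers = can_remove(consumers)
print(ok, blockers)  # False ['mart.revenue']
```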