Practise vocabulary for GDPR right-to-erasure workflows using data lineage: deleting user records, propagating deletions downstream, and compliance documentation language.
1 / 5
The GDPR "right to erasure" (also called "right to be forgotten") requires organisations to:
GDPR Article 17 grants individuals the right to request deletion of their personal data, subject to the exemptions in Article 17(3). The obligation is comprehensive: it covers not just the primary database but also downstream derived tables, aggregations, backups (within reasonable timescales), data warehouse copies, ML training datasets, and data held by third-party processors. Data lineage is essential here: without knowing every system where user X's data exists, you cannot reliably fulfil the erasure request. Non-compliance can result in fines of up to €20 million or 4% of global annual turnover, whichever is higher.
2 / 5
When an engineer says "delete all records for user X", in a GDPR lineage context this means:
"Delete all records for user X" in a GDPR context is a multi-system operation. Using lineage, you traverse forward from the source (e.g., raw.users) to find every downstream table that joins on user_id. This may include: staging tables, mart tables, ML feature stores, event logs, BI dashboards, data exports, and third-party integrations. Each must be addressed. Many organisations maintain an "erasure service" that uses lineage metadata to orchestrate this automatically.
3 / 5
"Propagate deletion downstream" in a data lineage workflow means:
Deletion propagation mirrors ordinary data propagation: just as new rows flow downstream through the pipeline, a deletion has to flow downstream too. If user X's row is deleted from raw.users, the lineage graph shows that stg.users, dim.customers, fct.orders (joined on customer_id), and the churn prediction feature table all contain data derived from that row. Each must be updated, either by re-running the pipeline (if it is idempotent), running targeted DELETE/UPDATE statements, or applying a suppression flag. The lineage graph is the map that makes this tractable at scale.
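The "targeted DELETE/UPDATE" option can be sketched with an in-memory SQLite database. Everything here is an assumption for illustration: the table layout, the PLAN structure, and the propagate_deletion helper are hypothetical, and whether nulling a key is sufficient for a given table is a legal question rather than a technical one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_users (user_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE stg_users (user_id INTEGER, email TEXT);
    CREATE TABLE fct_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO raw_users VALUES (1, 'x@example.com'), (2, 'y@example.com');
    INSERT INTO stg_users VALUES (1, 'x@example.com'), (2, 'y@example.com');
    INSERT INTO fct_orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.0);
    """
)

# Hypothetical erasure plan derived from lineage: (table, key column, action).
# "delete" removes the rows; "null_key" keeps the row but clears the user key.
# Downstream tables come first, the source table last.
PLAN = [
    ("fct_orders", "customer_id", "null_key"),
    ("stg_users", "user_id", "delete"),
    ("raw_users", "user_id", "delete"),
]


def propagate_deletion(conn, user_id, plan):
    """Apply the plan for one user and return rows affected per table."""
    affected = {}
    with conn:  # one transaction: commit on success, roll back on error
        for table, column, action in plan:
            # Table and column names come from the trusted plan, never from
            # user input; only the user id is passed as a bound parameter.
            if action == "delete":
                sql = f"DELETE FROM {table} WHERE {column} = ?"
            elif action == "null_key":
                sql = f"UPDATE {table} SET {column} = NULL WHERE {column} = ?"
            else:
                raise ValueError(f"unknown action: {action}")
            affected[table] = conn.execute(sql, (user_id,)).rowcount
    return affected


counts = propagate_deletion(conn, 1, PLAN)
print(counts)
```

Running everything in one transaction means a failure in any table leaves all of them unchanged, so the request can be retried without a half-erased state.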
4 / 5
Compliance documentation language for a GDPR erasure request typically includes:
GDPR erasure compliance requires auditable documentation. A complete record should include: (1) the date and channel of the erasure request, (2) the identity verification steps taken, (3) the list of systems searched (derived from lineage), (4) confirmation of deletion in each system, with timestamps, (5) any data retained under a lawful basis, with the justification, (6) the date the requester was notified of completion. Somewhat paradoxically, this documentation must itself be retained to demonstrate GDPR compliance, so it must not contain the deleted personal data.
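The six elements of the record map naturally onto a small data structure. The sketch below is one possible shape, not a standard schema: the class names, field names, and sample values are all hypothetical. The record is keyed by an opaque case ID rather than by the user's name or email, so that the record itself holds no erased personal data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Optional


@dataclass
class SystemConfirmation:
    system: str
    deleted_at: datetime  # element (4): timestamped confirmation


@dataclass
class RetainedData:
    system: str
    lawful_basis: str
    justification: str  # element (5)


@dataclass
class ErasureRecord:
    case_id: str                    # opaque reference, no personal data
    requested_on: date              # element (1)
    channel: str                    # element (1)
    verification_steps: list[str]   # element (2)
    systems_searched: list[str]     # element (3), taken from lineage
    confirmations: list[SystemConfirmation] = field(default_factory=list)
    retained: list[RetainedData] = field(default_factory=list)
    requester_notified_on: Optional[date] = None  # element (6)

    def pending_systems(self) -> list[str]:
        """Systems searched but neither confirmed deleted nor justified."""
        handled = {c.system for c in self.confirmations}
        handled |= {r.system for r in self.retained}
        return [s for s in self.systems_searched if s not in handled]

    def is_complete(self) -> bool:
        return not self.pending_systems() and self.requester_notified_on is not None


# Hypothetical sample values for illustration only.
record = ErasureRecord(
    case_id="ER-0001",
    requested_on=date(2024, 3, 1),
    channel="web form",
    verification_steps=["confirmed control of the account email address"],
    systems_searched=["raw.users", "stg.users", "fct.orders"],
)
record.confirmations.append(
    SystemConfirmation("raw.users", datetime(2024, 3, 5, 9, 30, tzinfo=timezone.utc))
)
record.retained.append(
    RetainedData("fct.orders", "legal obligation", "invoices kept for the statutory period")
)
print(record.pending_systems())  # ['stg.users']
print(record.is_complete())      # False
```

The pending_systems check is what turns the record from a form into an audit aid: a request cannot be closed while any system found through lineage is unaccounted for.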
5 / 5
Which approach best uses data lineage to fulfil a GDPR erasure request at scale?
At scale, when a data platform has hundreds of datasets, manual erasure is impractical and error-prone. Lineage-driven erasure automation is the mature approach: (1) the lineage graph identifies all affected datasets, (2) each dataset's owner is notified or an automated job is triggered, (3) deletions are logged centrally, (4) a completion audit report is generated. Data platform vendors such as Databricks and Snowflake expose lineage metadata that can support this kind of workflow.
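Steps (2) to (4) of that workflow can be sketched as a small orchestrator. This is an illustration under stated assumptions, not any vendor's erasure API: the REGISTRY structure, the run_erasure and audit_report functions, the status labels, and the dataset and team names are all hypothetical, and the list of affected datasets is assumed to come from a lineage lookup done beforehand.

```python
from datetime import datetime, timezone

# Hypothetical registry: dataset -> owner and an optional automated job.
# A job takes the case ID and returns True when the deletion succeeded.
REGISTRY = {
    "stg.users": {"owner": "data-eng", "job": lambda case_id: True},
    "ml.user_features": {"owner": "ml-platform", "job": lambda case_id: True},
    "export.crm_sync": {"owner": "sales-ops", "job": None},  # manual process
}


def run_erasure(case_id, affected, registry):
    """Trigger a job or notify the owner for each dataset; return the log."""
    log = []
    for dataset in affected:
        entry = registry[dataset]
        job = entry["job"]
        if job is None:
            status = "owner_notified"  # step (2): a human has to act
        else:
            status = "deleted" if job(case_id) else "failed"
        log.append(  # step (3): one central log entry per dataset
            {
                "case_id": case_id,
                "dataset": dataset,
                "owner": entry["owner"],
                "status": status,
                "at": datetime.now(timezone.utc).isoformat(),
            }
        )
    return log


def audit_report(log):
    """Step (4): summarise the log into a completion report."""
    counts = {}
    for entry in log:
        counts[entry["status"]] = counts.get(entry["status"], 0) + 1
    open_items = [e["dataset"] for e in log if e["status"] != "deleted"]
    return {"counts": counts, "open_items": open_items, "complete": not open_items}


affected = ["stg.users", "ml.user_features", "export.crm_sync"]
log = run_erasure("ER-0001", affected, REGISTRY)
report = audit_report(log)
print(report)
```

The report stays incomplete until every dataset reaches the "deleted" status, which keeps manually handled systems visible instead of letting them drop out of the audit trail.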