Practice answering data governance engineering interview questions in professional English. Five exercises on data catalogs, lineage, quality, stewardship, and governance frameworks.
What separates good from great data governance answers
Ownership is everything: governance without clear data owners fails — name the stewardship model
Automation over process: data quality rules in code beat checklists in documents
Lineage answers "why": tracing the impact of a change is as important as tracing the data's origin
Business alignment: governance that slows teams down gets bypassed — embed it in workflows
1 / 5
The interviewer asks: "What is a data catalog and what makes one actually useful in practice?" Which answer demonstrates the most operational maturity?
Option B is the strongest: it defines what a catalog contains, immediately identifies the core operational failure mode (catalog decay), explains the mechanism (stale descriptions, inaccurate ownership), gives four specific practices that prevent decay (automated harvesting, federated ownership, workflow integration, quality scoring in search results), and redefines success in outcome terms (analysts find data without asking engineers). Option A only names vendors. Option C correctly identifies the hardest part but proposes "a governance process and people" without specifics. Option D describes a specific tool implementation (Atlas) that is relevant but tool-specific, and it misses the sustainability practices.
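To make "quality scoring in search results" and catalog decay concrete, here is a minimal Python sketch. It is an illustration only: the `CatalogEntry` fields, the 30-day staleness threshold, and the ranking rule are all assumptions, not the behaviour of any particular catalog product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CatalogEntry:
    # Hypothetical catalog record; field names are illustrative.
    name: str
    description: str
    owner: Optional[str]
    quality_score: float  # 0.0 (worst) to 1.0 (best)
    last_harvested: date  # last time metadata was refreshed automatically


def is_stale(entry, today, max_age_days=30):
    """An entry has decayed if it has no owner or was not harvested recently."""
    return entry.owner is None or (today - entry.last_harvested).days > max_age_days


def search(entries, term, today):
    """Return matching entries: fresh ones first, then by descending quality score."""
    term = term.lower()
    matches = [
        e for e in entries
        if term in e.name.lower() or term in e.description.lower()
    ]
    return sorted(matches, key=lambda e: (is_stale(e, today), -e.quality_score))


today = date(2024, 6, 1)
entries = [
    CatalogEntry("orders_v1", "Legacy orders table", None, 0.90, date(2024, 1, 1)),
    CatalogEntry("orders", "Orders fact table", "sales-data", 0.80, date(2024, 5, 30)),
    CatalogEntry("orders_staging", "Staging orders", "platform", 0.95, date(2024, 5, 31)),
    CatalogEntry("customers", "Customer dimension", "crm", 0.99, date(2024, 5, 31)),
]
ranked = [e.name for e in search(entries, "orders", today)]
```

In this sketch the ownerless legacy table sinks to the bottom of the results even though its quality score is high, which is one way a catalog can steer analysts towards maintained data.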
2 / 5
The interviewer asks: "What is data lineage and why is it critical for data governance?" Choose the strongest answer.
Option C is the strongest: defines lineage precisely (chain of custody and transformation history), gives three distinct use cases with specific mechanisms (impact analysis prevents risky schema changes, root cause analysis replaces pipeline scanning, GDPR erasure requires flow tracking), makes the critical distinction between column-level and table-level lineage with the practical reason each serves, and ends with the operational preference for automated over manual lineage capture. Option A is accurate but covers only two use cases superficially. Option B mentions compliance and Atlas but not impact analysis or column-level lineage. Option D names dbt but gives no depth on the three use cases or the column-vs-table-level distinction.
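Impact analysis and root cause analysis are both graph traversals over lineage, in opposite directions. The Python sketch below shows this on a hand-written column-level lineage map; the column names are hypothetical, and in practice the map would be captured automatically rather than typed by hand.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns in its list.
LINEAGE = {
    "raw.orders.amount": ["staging.orders.amount_usd"],
    "staging.orders.amount_usd": [
        "marts.revenue.daily_total",
        "marts.finance.invoice_total",
    ],
    "raw.orders.customer_id": ["staging.orders.customer_id"],
    "staging.orders.customer_id": ["marts.customers.order_count"],
}


def downstream(column, lineage):
    """Impact analysis: every column reachable from `column`."""
    seen = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(column, lineage):
    """Root cause analysis: every column that `column` is derived from."""
    reverse = {}
    for parent, children in lineage.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(column, reverse)


impacted = downstream("raw.orders.amount", LINEAGE)
sources = upstream("marts.revenue.daily_total", LINEAGE)
```

Because the graph is keyed by column, a change to `raw.orders.amount` does not flag `marts.customers.order_count`. Table-level lineage would flag everything downstream of `raw.orders`, which is the practical cost of the coarser granularity.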
3 / 5
The interviewer asks: "How do you define and enforce data quality rules in a data platform?" Which answer shows the most engineering depth?
Option B is the strongest: it defines five quality dimensions explicitly (giving interviewers a framework to evaluate completeness of coverage), names two specific tools with their appropriate use cases (dbt for the warehouse, Great Expectations for pipelines), explains the enforcement architecture (checks before promotion, quarantine over overwrite), and connects quality scores to the data catalog for consumer trust. Most distinctively, it addresses the governance decision of what to do when a check fails, with three tiered options; this last point shows operational maturity. Option A is minimal. Option C describes a dbt implementation correctly but has no quality dimension framework or enforcement architecture discussion. Option D mentions observability tools, which are useful for anomaly detection, but anomaly detection complements rather than replaces rule-based quality enforcement.
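The enforcement architecture (checks before promotion, quarantine over overwrite, tiered failure handling) can be sketched in plain Python. This is not dbt or Great Expectations code. The three tiers used here (block the batch, quarantine the row, warn and promote) are an assumed interpretation of "three tiered options", and the rules and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]
    severity: str  # assumed tiers: "block", "quarantine" or "warn"


RULES = [
    Rule("order_id present", lambda r: r.get("order_id") is not None, "block"),
    Rule(
        "amount non-negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
        "quarantine",
    ),
    Rule(
        "country is two letters",
        lambda r: isinstance(r.get("country"), str) and len(r["country"]) == 2,
        "warn",
    ),
]


def validate(rows, rules):
    """Run all rules before promotion and apply the most severe failed tier per row."""
    promoted, quarantined, warnings = [], [], []
    for row in rows:
        failed = [rule for rule in rules if not rule.check(row)]
        severities = {rule.severity for rule in failed}
        if "block" in severities:
            # Blocking failure: nothing from this batch is promoted.
            return {"status": "blocked", "promoted": [], "quarantined": [], "warnings": []}
        if "quarantine" in severities:
            # Bad rows are set aside for inspection instead of overwriting good data.
            quarantined.append(row)
            continue
        warnings.extend((row["order_id"], rule.name) for rule in failed)
        promoted.append(row)
    return {
        "status": "ok",
        "promoted": promoted,
        "quarantined": quarantined,
        "warnings": warnings,
    }


batch = [
    {"order_id": 1, "amount": 10.0, "country": "DE"},
    {"order_id": 2, "amount": -5.0, "country": "FR"},
    {"order_id": 3, "amount": 7.5, "country": "Germany"},
]
result = validate(batch, RULES)
```

Here the negative amount is quarantined, the malformed country code is promoted with a warning, and a row missing `order_id` would stop the whole batch.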
4 / 5
The interviewer asks: "What is a data steward and how does stewardship differ from data ownership?" Choose the answer that shows the clearest conceptual understanding.
Option A is the strongest: it defines both roles precisely using different accountability dimensions (operational expertise vs business authority), gives a concrete scenario showing which role acts in each situation (quality issue vs policy question), identifies the failure mode when either role is missing (stewards without escalation authority, owners without expertise), and maps the roles to a concrete system implementation (catalog fields with different permissions). Option B is too vague, and its observation that the same person often does both undercuts the governance purpose. Option C incorrectly frames stewardship as a subset of ownership; they are complementary roles, not a hierarchy. Option D is accurate but offers no mechanism or insight into failure modes.
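One way to picture "catalog fields with different permissions" is a small permission table. The following Python sketch assumes a hypothetical split of fields between the two roles; which fields belong to which role is an illustration, not something the exercise specifies.

```python
# Hypothetical mapping of catalog fields to the role allowed to edit them.
# The sets are disjoint to reflect complementary roles, not a hierarchy.
FIELD_EDITORS = {
    "description": {"steward"},      # operational expertise
    "quality_notes": {"steward"},
    "access_policy": {"owner"},      # business authority
    "retention_days": {"owner"},
}

# Which role acts first on each kind of issue.
ISSUE_ROUTING = {
    "quality": "steward",
    "policy": "owner",
}


def can_edit(role, field):
    """Return True if `role` may edit `field` in the catalog."""
    return role in FIELD_EDITORS.get(field, set())


def route(issue_type):
    """Return the role responsible for an issue; unknown types raise KeyError."""
    return ISSUE_ROUTING[issue_type]


steward_fields = sorted(f for f in FIELD_EDITORS if can_edit("steward", f))
owner_fields = sorted(f for f in FIELD_EDITORS if can_edit("owner", f))
```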
5 / 5
The interviewer asks: "How do you build a data governance programme that engineering teams actually follow?" Which answer is the most strategically mature?
Option B is the strongest: it identifies the core failure mode upfront (imposition vs embedding), articulates three named principles (govern at point of creation, compliance as path of least resistance, leading vs lagging indicators), gives concrete examples for each (CI/CD integration, auto-populated scaffold templates, a metric for whether new datasets have an owner on day one), and closes with a memorable governing principle. Option A lists organisational mechanisms (executive sponsorship, committee) that are necessary but not sufficient: without the workflow-embedding insight they are ineffective. Option C mentions buy-in and workflow integration but has no specifics. Option D mentions policies-as-code correctly but also falls back on a governance council for exceptions, whereas the strongest answer explains how to make exceptions rare.
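The three examples (a CI/CD gate, an auto-populating scaffold, and an owner-on-day-one metric) could look roughly like this in Python. Everything here is a hypothetical sketch: the manifest format, the required fields, and the function names are assumptions rather than any real tool's interface.

```python
# Metadata every new dataset manifest must carry (assumed policy).
REQUIRED_FIELDS = ("owner", "description", "classification")


def scaffold(name, team):
    """Template for a new dataset: owner and classification are pre-filled,
    so the compliant path is also the easiest one."""
    return {"name": name, "owner": team, "description": "", "classification": "internal"}


def missing_metadata(manifest):
    """Return the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not str(manifest.get(f) or "").strip()]


def ci_gate(manifests):
    """Governance at the point of creation: map each failing dataset
    to the fields it is missing. An empty result means the build passes."""
    failures = {}
    for manifest in manifests:
        missing = missing_metadata(manifest)
        if missing:
            failures[manifest.get("name", "<unnamed>")] = missing
    return failures


def owner_coverage(manifests):
    """Leading indicator: share of datasets that have an owner."""
    if not manifests:
        return 1.0
    owned = sum(1 for m in manifests if str(m.get("owner") or "").strip())
    return owned / len(manifests)


manifests = [
    scaffold("orders", "sales-data"),
    {"name": "clicks", "description": "Raw click events"},
]
failures = ci_gate(manifests)
coverage = owner_coverage(manifests)
```

The scaffolded dataset fails only on its empty description, while the hand-written manifest is also missing an owner and a classification. That gap is the argument for making the template the default way to create datasets.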