Practise vocabulary for navigating data catalogs: finding datasets, reading catalog entries, understanding data ownership, quality scores, and freshness metadata.
0 / 5 completed
1 / 5
In a data catalog, "the data owner is" refers to:
"Data owner" is a formal governance role. In a catalog entry (DataHub, Atlan, OpenMetadata), the owner field identifies who is accountable — often a domain team lead or product manager, not the engineer who built the pipeline. When you need to request access or understand business context, the data owner is your first point of contact. Distinct from "data steward" (who manages day-to-day quality) and "data custodian" (who manages technical storage).
2 / 5
When a catalog entry shows "last updated: 3 days ago", this metadata indicates:
"Last updated" (or "last ingested", "last refreshed") is an operational metadata field that tracks data freshness. If you need daily data and the catalog shows "last updated: 3 days ago", you know there is a potential freshness issue before even querying the table. Data observability tools (Monte Carlo, Soda) monitor this and alert when freshness SLAs are breached. This metadata is distinct from schema change history.
3 / 5
A "quality score" in a data catalog entry typically represents:
Quality scores in catalogs (e.g., Atlan's "trust score", DataHub's assertions, Great Expectations integrations) aggregate automated data quality checks into a single indicator. A high score means recent checks for null rates, referential integrity, value distribution, and freshness all passed. Consumers can filter catalog search results by quality score to find datasets that are safe to rely on — avoiding the common pitfall of building on poorly-understood data.
4 / 5
When searching a data catalog, which action best describes "finding a dataset"?
The key differentiator of a data catalog over a raw schema browser is business context. You can search for "customer revenue" and find the dataset because it has been tagged with a business glossary term — even if the underlying table is named "fct_rev_by_cust_mthly". The catalog entry then tells you: owner, quality score, freshness, upstream lineage, certified status, and usage popularity. This is "data discovery" in the governance sense.
5 / 5
A "certified" dataset badge in a data catalog communicates:
"Certified" (or "verified", "endorsed") status in a catalog (DataHub, Atlan, Alation) means the dataset is the single source of truth for a concept. It has been reviewed by the data owner, documented, and quality-tested. Teams building dashboards or ML models should prefer certified datasets. The inverse — "deprecated" — signals teams to migrate away. This certification lifecycle is central to data governance programs.