Practise vocabulary for data catalogs (DataHub, Atlan, OpenMetadata, Alation), business glossary, data domains, data stewards, dataset certification, and data product concepts.
1 / 5
A data catalog is most accurately described as:
Modern data catalogs (DataHub, Atlan, OpenMetadata, Alation) are "the Google for your data": they make datasets discoverable, contextualised, and trustworthy. Without a catalog, engineers spend hours asking "does a table for this exist?" and analysts build shadow spreadsheets instead of using official data.
2 / 5
A business glossary in a data catalog maps:
Business glossary example: "Active User = a user who logged in at least once in the last 30 days AND completed at least one core action, as defined in fact_user_activity.is_active." Without shared definitions like this, finance may calculate a term such as revenue one way and sales another; the glossary enforces a single definition across teams.
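To make the "single definition" idea concrete, here is a minimal sketch of the Active User rule written once as a shared function. The UserActivity class, its field names, and the is_active helper are hypothetical illustrations; only the 30-day window and the "at least one core action" condition come from the glossary example above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class UserActivity:
    """Hypothetical row shape; field names are assumptions for illustration."""
    user_id: str
    last_login: date
    core_actions_in_window: int  # core actions completed in the last 30 days


def is_active(activity: UserActivity, today: date, window_days: int = 30) -> bool:
    """Glossary rule: logged in within the window AND at least one core action."""
    logged_in_recently = (today - activity.last_login) <= timedelta(days=window_days)
    return logged_in_recently and activity.core_actions_in_window >= 1


if __name__ == "__main__":
    today = date(2024, 6, 30)
    users = [
        UserActivity("u1", date(2024, 6, 15), 2),  # recent login, core actions
        UserActivity("u2", date(2024, 6, 15), 0),  # recent login, no core action
        UserActivity("u3", date(2024, 4, 1), 5),   # core actions, stale login
    ]
    for u in users:
        print(u.user_id, is_active(u, today))
```

If every team calls the same function (or queries the same column it populates), changing the definition means changing it in one place.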
3 / 5
A data steward's primary responsibility is:
Data stewards bridge business and engineering: they answer "What does this field mean?", "Is this data reliable?", "When should it be deleted?", "Who can access it?". In this model, accountability for data quality issues sits with the steward, not with the engineer who built the pipeline.
4 / 5
A "certified dataset" in a data catalog means:
Certification signals trust: "This dataset is production-quality, owned, documented, and tested." Uncertified datasets may be experiments, work-in-progress, or deprecated assets. Catalogs display certification badges and can filter to certified-only datasets, guiding analysts toward trusted data.
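The certified-only filter described above can be sketched as follows. This is an illustration under stated assumptions, not the API of any particular catalog: CatalogEntry, its fields, and both helper functions are hypothetical, and in practice certification is usually granted by a person or a review workflow rather than computed automatically.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are assumptions for illustration."""
    name: str
    owner: Optional[str]
    has_docs: bool
    tests_passing: bool
    deprecated: bool = False


def meets_certification_criteria(entry: CatalogEntry) -> bool:
    """Owned, documented, tested, and not deprecated."""
    return (
        bool(entry.owner)
        and entry.has_docs
        and entry.tests_passing
        and not entry.deprecated
    )


def certified_only(entries: List[CatalogEntry]) -> List[CatalogEntry]:
    """Keep only the entries an analyst should be steered toward."""
    return [e for e in entries if meets_certification_criteria(e)]


if __name__ == "__main__":
    entries = [
        CatalogEntry("orders", "sales-data", True, True),
        CatalogEntry("orders_tmp_experiment", None, False, False),
        CatalogEntry("orders_v1", "sales-data", True, True, deprecated=True),
    ]
    print([e.name for e in certified_only(entries)])
```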
5 / 5
The "data product" concept in a data catalog context refers to:
Data products (core to data mesh) apply product thinking to data: a team owns "the orders data product", guarantees freshness SLAs, publishes a stable schema interface, handles versioning, and is accountable to consumers. This contrasts with the traditional model where data was a by-product of operational systems.
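As a rough sketch of what "owned, with a freshness SLA and a stable, versioned schema" might look like when written down, the descriptor below bundles those properties into one object. The DataProduct class, its field names, and the example values are all hypothetical; real data mesh platforms define their own contract formats.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class DataProduct:
    """Hypothetical data product contract; all names are illustrative assumptions."""
    name: str
    owner_team: str
    schema_version: str
    freshness_sla: timedelta
    schema: Dict[str, str] = field(default_factory=dict)  # column name -> type

    def meets_freshness_sla(self, last_updated: datetime, now: datetime) -> bool:
        """True if the data was refreshed within the promised window."""
        return (now - last_updated) <= self.freshness_sla


if __name__ == "__main__":
    orders = DataProduct(
        name="orders",
        owner_team="order-management",
        schema_version="2.1.0",
        freshness_sla=timedelta(hours=1),
        schema={"order_id": "string", "amount": "decimal", "placed_at": "timestamp"},
    )
    now = datetime(2024, 6, 30, 12, 0)
    print(orders.meets_freshness_sla(datetime(2024, 6, 30, 11, 30), now))
    print(orders.meets_freshness_sla(datetime(2024, 6, 30, 9, 0), now))
```

Consumers can then check the contract (owner, version, freshness) instead of reverse-engineering it from the operational system that produced the data.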