5 exercises — Practice data mesh vocabulary in English: domain ownership, data-as-a-product, federated governance, data contracts, output ports, and self-serve infrastructure.
Core data mesh vocabulary clusters
Four principles: domain ownership, data-as-a-product, self-serve infrastructure, federated computational governance
Data product: output port, SLA, data product spec, discovery, trust, addressability
Domain: source-aligned domain, aggregate domain, consumer-aligned domain, domain team
Governance: federated governance, global policies, interoperability, data catalog, standards
Data contract: schema, semantics, SLA, quality guarantees, producer-consumer agreement
1 / 5
A data architect introduces data mesh at an all-hands meeting: "Data mesh is an architectural approach to scaling analytical data. It addresses the centralised data platform bottleneck: one data team owning all pipelines becomes a constant constraint. The core idea: decentralise data ownership to domain teams. The team that produces the Orders service also owns the Orders data product. They're accountable for its quality, availability, and freshness. The central platform team shifts from data ownership to providing the infrastructure domains use to publish data products." What is the data-as-a-product principle in data mesh?
Data-as-a-product: the second principle of data mesh. Domain teams apply product thinking to data and treat data consumers as customers.
Quality attributes a data product must have:
Discoverable: findable in the data catalog. Consumers can search for it and find it.
Addressable: has a stable, unique address (URI). Consumers can access it programmatically.
Trustworthy: backed by quality guarantees, with SLAs for freshness, completeness, and accuracy. Consumers can rely on it.
Self-describing: ships with schema, semantics, sample data, and documentation. Consumers can understand it without asking the producer.
Interoperable: follows shared standards, so cross-domain joins and pipelines work without ad hoc transformation.
Governed: complies with global policies (access control, PII handling, retention).
Data mesh principles: 1) Domain ownership: domain teams own their data. 2) Data-as-a-product: product quality applied to data. 3) Self-serve infrastructure: a platform that provides tools to produce and consume data without help from a central team. 4) Federated computational governance: global standards, local enforcement.
In conversation: 'The "product" in data-as-a-product is the key shift. It's not just publishing a CSV somewhere; it means an SLA, a schema registry entry, documented semantics, and an on-call rotation for data quality incidents.'
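As a rough illustration of the attributes above, the sketch below encodes a data product descriptor and reports which attributes it does not yet satisfy. The `DataProductDescriptor` class, its field names, the `dp://` URI scheme, and the `REQUIRED_STANDARDS` set are all hypothetical; real platforms describe data products in their own spec formats.

```python
from dataclasses import dataclass, field
from typing import Optional

# Shared standards every domain is assumed to follow (hypothetical names).
REQUIRED_STANDARDS = {"iso8601_dates", "global_customer_id"}


@dataclass
class DataProductDescriptor:
    """Hypothetical descriptor; field names are illustrative, not a standard spec."""
    name: str
    owner: str
    uri: str = ""                                   # addressable
    in_catalog: bool = False                        # discoverable
    schema: dict = field(default_factory=dict)      # self-describing
    docs_url: str = ""                              # self-describing
    freshness_sla_hours: Optional[float] = None     # trustworthy
    standards: set = field(default_factory=set)     # interoperable
    pii_tagged: bool = False                        # governed


def missing_attributes(dp: DataProductDescriptor) -> list:
    """Return the names of quality attributes the descriptor does not yet satisfy."""
    checks = {
        "discoverable": dp.in_catalog,
        "addressable": dp.uri.startswith("dp://"),  # hypothetical URI scheme
        "trustworthy": dp.freshness_sla_hours is not None,
        "self-describing": bool(dp.schema) and bool(dp.docs_url),
        "interoperable": REQUIRED_STANDARDS <= dp.standards,
        "governed": dp.pii_tagged,
    }
    return [name for name, ok in checks.items() if not ok]


orders = DataProductDescriptor(
    name="orders_cleaned",
    owner="orders-team",
    uri="dp://orders/orders_cleaned/v1",
    in_catalog=True,
    schema={"order_id": "string", "amount": "decimal"},
    freshness_sla_hours=1.0,
)
print(missing_attributes(orders))  # ['self-describing', 'interoperable', 'governed']
```

The point of the design is that "product quality" becomes something a platform can check mechanically rather than a matter of opinion.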
2 / 5
A platform engineer explains the output port concept at a data mesh workshop: "In data mesh, a data product exposes its data through output ports. An output port is the interface through which consumers access the data — it could be a BigQuery table, a Kafka topic, a REST API, or a file in S3. Each output port has a declared schema, SLA, and access policy. The key principle: the data product team controls how the data is exposed. Consumers don't get direct access to operational databases — they go through the output port. This decouples the internal implementation from the external contract." What is a data contract in the context of data mesh?
Data contract: a formal agreement between a data producer and its consumers. Contents:
Schema: field names, types, nullability constraints.
Semantics: what each field means (the business definition, not just the technical type).
Quality SLA: freshness (e.g., updated every hour), completeness (e.g., fewer than 0.1% of order IDs missing), accuracy guarantees.
Breaking change policy: how and when the schema can change, including notice periods and which changes count as backward-compatible versus breaking.
Access method: which output port to use and how to authenticate.
Formats: YAML (most common), JSON Schema, Protobuf. Tools: Open Data Contract Standard (ODCS), Soda, Great Expectations, dbt model contracts.
Output port vocabulary:
Output port: the mechanism through which a data product exposes data. Types: batch (files, BigQuery tables), streaming (Kafka topics), API (REST, GraphQL).
Input port: how the data product ingests data from operational systems or other data products.
Transformation code: the logic that produces the outputs from the inputs. Owned by the domain team.
Data product spec: metadata describing the data product (owner, consumers, output ports, SLA, tags). Stored in a data catalog.
In conversation: 'Before data contracts, every consumer had a hidden dependency on our internal schema. When we changed a column name, three pipelines broke silently. Contracts make those dependencies explicit.'
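A minimal sketch of a data contract for an Orders output port, plus a check of one published batch against it. The contract is a plain Python dict rather than YAML, and its keys (`schema`, `sla`, `missing_ratio_limit`, and so on) are assumptions that do not follow the ODCS layout; `validate_batch` is a hypothetical helper, not the API of Soda, Great Expectations, or dbt.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an Orders output port. The keys are illustrative and
# do not follow the Open Data Contract Standard (ODCS) layout.
ORDERS_CONTRACT = {
    "output_port": "orders_cleaned_v1",                  # hypothetical port name
    "schema": {                                          # field -> type, nullability
        "order_id": {"type": str, "nullable": False},
        "amount": {"type": float, "nullable": False},
        "coupon_code": {"type": str, "nullable": True},
    },
    "sla": {
        "max_staleness": timedelta(hours=1),             # freshness: updated hourly
        "missing_ratio_limit": 0.001,                    # completeness: < 0.1% missing
    },
}


def validate_batch(records, expected_ids, last_updated, contract, now):
    """Check one published batch against the contract; return a list of violations."""
    violations = []

    # Schema: each declared field must have the declared type, or be null if allowed.
    for i, rec in enumerate(records):
        for name, rule in contract["schema"].items():
            value = rec.get(name)
            if value is None:
                if not rule["nullable"]:
                    violations.append(f"row {i}: {name} must not be null")
            elif not isinstance(value, rule["type"]):
                violations.append(f"row {i}: {name} must be {rule['type'].__name__}")

    # Completeness: share of order IDs known to the source system that are absent here.
    present = {rec.get("order_id") for rec in records}
    missing_ratio = len(expected_ids - present) / len(expected_ids)
    if missing_ratio >= contract["sla"]["missing_ratio_limit"]:
        violations.append(f"completeness: {missing_ratio:.2%} of order IDs missing")

    # Freshness: the batch must have been updated within the SLA window.
    if now - last_updated > contract["sla"]["max_staleness"]:
        violations.append("freshness: data is older than the SLA allows")

    return violations


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
batch = [
    {"order_id": "o-1", "amount": 19.99, "coupon_code": None},
    {"order_id": "o-2", "amount": "12.50"},  # wrong type; absent coupon_code is allowed
]
result = validate_batch(batch, expected_ids={"o-1", "o-2"},
                        last_updated=now - timedelta(minutes=30),
                        contract=ORDERS_CONTRACT, now=now)
print(result)  # ['row 1: amount must be float']
```

Because consumers read from the output port rather than the internal tables, a check like this is where a renamed column would surface as a contract violation instead of a silent downstream break.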
3 / 5
A data governance lead explains federated governance at a cross-domain meeting: "Federated governance is the fourth data mesh principle. The central platform team defines global policies — encryption standards, PII classification, retention policies, interoperability standards. But enforcement happens locally, within each domain. We can't have a central team auditing every data product — it doesn't scale. Instead, the platform encodes the policies as automated checks that run when a domain publishes a data product. Computational governance means the governance rules are code, not documents." What is computational governance in data mesh?
Computational governance: policies as code. Examples: a pipeline check that automatically rejects a data product registration if PII fields aren't tagged; an automated schema evolution check that fails if a breaking change wasn't versioned; a masking or row-level security policy applied automatically when a consumer queries a field tagged as GDPR-sensitive. Benefits: scales across many domains without manual central review, enforces policies consistently, and leaves an audit trail.
Federated governance vocabulary:
Global policy: set by the central governance body and applies to all domains. Examples: all fields containing email addresses must be tagged as PII; all data products must have a schema in the registry; data older than 7 years must be deleted.
Local policy: domain-specific governance, set by the domain team within global constraints.
Interoperability standard: shared conventions (common date formats, currency units, customer ID format) that allow data products to be joined across domains.
Data catalog: the searchable registry of all data products, covering discovery, lineage, and documentation. Examples: DataHub, Atlan, Collibra.
Data lineage: the provenance of data (where it came from and what transformed it). Helps with debugging and impact analysis.
Policy engine: software that evaluates and enforces governance rules (e.g., Apache Ranger, Open Policy Agent).
In conversation: 'If your governance is a PDF document with rules, you don't have governance; you have aspirational documentation. Computational governance makes it impossible to skip.'
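Policies as code can be sketched as plain functions that the platform runs whenever a domain registers a data product. The two checks below mirror the examples above (untagged PII fields, unversioned breaking changes). The registration format, the name-based PII heuristic, and the "breaking change needs a major version" rule are assumptions for illustration; this is not how Apache Ranger or OPA express policies.

```python
# Hypothetical global policies expressed as code. Each check takes a data product
# registration (a plain dict here) and returns a list of violations.

def check_pii_tagging(registration):
    """Global policy: any field whose name contains 'email' must carry the 'pii' tag."""
    return [
        f"{field['name']}: email field is not tagged as PII"
        for field in registration["schema"]
        if "email" in field["name"].lower() and "pii" not in field.get("tags", [])
    ]


def check_breaking_change(registration, previous):
    """Global policy: removing a field or changing its type requires a new major version."""
    if previous is None:
        return []
    old = {f["name"]: f["type"] for f in previous["schema"]}
    new = {f["name"]: f["type"] for f in registration["schema"]}
    breaking = [name for name, typ in old.items() if new.get(name) != typ]
    old_major = int(previous["version"].split(".")[0])
    new_major = int(registration["version"].split(".")[0])
    if breaking and new_major <= old_major:
        return [f"breaking change to {breaking} without a major version bump"]
    return []


def admit(registration, previous=None):
    """Run every global policy; the platform rejects the registration if any check fails."""
    violations = check_pii_tagging(registration) + check_breaking_change(registration, previous)
    return (not violations, violations)


v1 = {"version": "1.4.0", "schema": [
    {"name": "customer_id", "type": "string"},
    {"name": "email", "type": "string", "tags": ["pii"]},
]}
v2 = {"version": "1.5.0", "schema": [
    {"name": "customer_id", "type": "int"},        # type changed: breaking
    {"name": "contact_email", "type": "string"},   # renamed and left untagged
]}
ok, problems = admit(v2, previous=v1)
print(ok)  # False
for p in problems:
    print(p)
```

Because the checks run on every registration, no central reviewer has to remember to apply them, which is the scaling argument made in the quote above.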
4 / 5
A domain engineer explains domain types at a data mesh adoption workshop: "Data mesh domains come in three flavours. Source-aligned domains are closest to operational systems — the Orders domain publishes raw and cleaned orders data. Aggregate domains combine data from multiple source domains — Customer360 joins Orders, Payments, Support. Consumer-aligned domains optimise data for specific consumers — the Finance domain gets pre-aggregated revenue reports. Most domains start as source-aligned; aggregate domains add value on top. Consumer-aligned domains are controversial — they can create tight coupling." What distinguishes a source-aligned domain from an aggregate domain?
Source-aligned domain: owns data close to the operational system. Responsibility: expose operational data accurately, with minimal business transformation. Example: the Orders domain publishes raw orders, cleaned orders, and cancelled orders, all reflecting what happened in the Orders system.
Aggregate domain: consumes data from multiple source domains and adds value by combining it or deriving new insights. Example: Customer 360 joins Orders, Payments, Support, and Marketing data into a unified customer view. No single source domain could produce this.
Consumer-aligned domain: optimised for a specific consumer. Example: a Finance reporting domain whose data is pre-aggregated and formatted for the Finance BI tool. Debated because it is tightly coupled to one consumer and may duplicate other domains' data.
Data mesh organisational vocabulary:
Domain team: a cross-functional team (engineers, data engineers, domain experts) responsible for a business domain's data products.
Platform team: provides the self-serve infrastructure. Not responsible for domain data quality.
Data product owner: the person accountable for a data product's quality, SLA, and roadmap.
Data steward: responsible for the metadata, documentation, and governance compliance of data products.
In conversation: 'We started with all domains source-aligned. Aggregate domains emerged organically when multiple teams needed the same join; at that point, you either build an aggregate domain or accept duplicated logic.'
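To make the aggregate-domain idea concrete, here is a sketch of a Customer 360 transformation that combines the output ports of two source-aligned domains. Plain Python lists stand in for the output ports, all records are made up, and the shared `customer_id` key is assumed to come from an interoperability standard; in practice this would usually be a SQL join or pipeline running on the platform.

```python
from collections import defaultdict

# Output ports of two hypothetical source-aligned domains (made-up sample records).
orders = [  # Orders domain: reflects what happened in the Orders system
    {"order_id": "o-1", "customer_id": "c-1", "amount": 40.0},
    {"order_id": "o-2", "customer_id": "c-1", "amount": 10.0},
    {"order_id": "o-3", "customer_id": "c-2", "amount": 25.0},
]
support_tickets = [  # Support domain
    {"ticket_id": "t-1", "customer_id": "c-2", "status": "open"},
]


def build_customer_360(orders, tickets):
    """Aggregate-domain transformation: one row per customer, combined on the shared
    customer_id that both source domains are assumed to use."""
    view = defaultdict(lambda: {"order_count": 0, "total_spent": 0.0, "open_tickets": 0})
    for o in orders:
        row = view[o["customer_id"]]
        row["order_count"] += 1
        row["total_spent"] += o["amount"]
    for t in tickets:
        if t["status"] == "open":
            view[t["customer_id"]]["open_tickets"] += 1
    return dict(view)


customer_360 = build_customer_360(orders, support_tickets)
print(customer_360["c-2"])  # {'order_count': 1, 'total_spent': 25.0, 'open_tickets': 1}
```

Neither source domain can produce this view on its own, which is what justifies a separate aggregate domain rather than duplicating the join in every consumer.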
5 / 5
A senior data engineer contrasts data mesh with a central data platform: "The traditional approach: one data team, one data warehouse, all ETL pipelines owned centrally. Advantage: consistency. Problem: the central team becomes a bottleneck. Every new dataset request goes into a backlog. Three-month wait times for new pipelines. Data mesh solves this by distributing ownership — but it creates new challenges: how do you prevent 50 domain teams from defining 'customer' in 50 different ways? Interoperability standards and a shared catalog address this. The trade-off is organisational complexity for scalability." What is self-serve data infrastructure in data mesh?
Self-serve infrastructure: the third principle of data mesh. The platform team builds tools that make it easy for domain teams to:
Build: scaffolding for data pipelines, schema registration, data product templates.
Deploy: CI/CD for data products, automated quality checks, automated catalog registration.
Operate: monitoring dashboards, alerting, SLA tracking.
Discover and consume: data catalog, access request portal, query interfaces.
Goal: a domain team should be able to go from "we need to publish this data" to a production data product in days, not months. Without self-serve infrastructure, data mesh doesn't scale: domain teams are blocked waiting on platform team tickets.
Data mesh vs. alternatives:
Data warehouse: centralised storage. Good for BI. Becomes a bottleneck at scale.
Data lake: centralised raw storage. Schema-on-read flexibility. Governance challenges.
Data lakehouse: combines lake and warehouse capabilities on open table formats (Delta Lake, Apache Iceberg). Typically still centralised.
Data mesh: decentralised ownership with federated governance. Addresses ownership and quality bottlenecks. Requires strong platform tooling.
Semantic layer: a layer of consistent business metric definitions (dbt Semantic Layer, Cube) that can complement any of these architectures.
In conversation: 'Data mesh without self-serve infrastructure is just bureaucracy. Domains are accountable for quality, but if they need a ticket to publish anything, you haven't solved the bottleneck; you've just moved it.'
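The build, deploy, and discover steps can be sketched as a toy platform API. `Catalog`, `scaffold_spec`, and `publish` are hypothetical names, the 24-hour freshness default is an assumed platform default, the port addresses are made up, and the in-memory catalog stands in for a real one such as DataHub or Atlan.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy in-memory data catalog standing in for a real one."""
    entries: dict = field(default_factory=dict)

    def register(self, spec):
        self.entries[spec["name"]] = spec

    def search(self, tag):
        """Discovery: return the names of data products carrying the given tag."""
        return sorted(name for name, spec in self.entries.items() if tag in spec["tags"])


def scaffold_spec(name, owner, output_port, tags):
    """Build step: fill a data product spec template with platform defaults."""
    return {
        "name": name,
        "owner": owner,
        "output_ports": [output_port],
        "sla": {"freshness_hours": 24},  # hypothetical platform default
        "tags": sorted(tags),
    }


def publish(spec, catalog):
    """Deploy step: run automated checks, then register in the catalog (no ticket)."""
    required = ("name", "owner", "output_ports", "sla")
    missing = [key for key in required if not spec.get(key)]
    if missing:
        raise ValueError(f"cannot publish {spec.get('name')}: missing {missing}")
    catalog.register(spec)


catalog = Catalog()
publish(scaffold_spec("orders_cleaned", "orders-team",
                      "kafka:orders.cleaned", {"orders", "sales"}), catalog)
publish(scaffold_spec("revenue_daily", "finance-team",
                      "bigquery:finance.revenue_daily", {"sales"}), catalog)
print(catalog.search("sales"))  # ['orders_cleaned', 'revenue_daily']
```

The domain team calls two functions and gets a registered, discoverable product; the platform team owns the template and the checks, not the data.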