Advanced · 6 topic areas · 25+ exercises

Data Platform Engineer

Data Platform Engineers build the infrastructure that other data teams rely on: the pipelines, storage systems, quality frameworks, and tooling that make data reliable and accessible. In English, their work includes writing data contracts and pipeline SLAs, presenting data quality reports to stakeholders, documenting migration strategies, and running internal workshops on platform capabilities. This path covers the specialized vocabulary of data infrastructure and platform engineering.

Topics covered

  • Batch & streaming pipelines
  • Data quality
  • Data mesh & contracts
  • Change Data Capture
  • Data catalogs & lineage
  • Self-service data platform

Vocabulary spotlight

Four terms every Data Platform Engineer should know in English:

data contract n.

A formal agreement between a data producer and its consumers that specifies the schema, semantics, SLAs, and ownership of a dataset, making data reliability expectations explicit
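
To make the definition concrete, here is a minimal sketch of a data contract expressed in code. The `DataContract` class, its fields, and the `orders` dataset are hypothetical names chosen for illustration; real contracts are often written in YAML or a schema language and cover more than types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical, minimal contract: schema, ownership, and a freshness SLA."""
    dataset: str
    owner: str
    schema: dict                 # column name -> expected Python type
    max_staleness_minutes: int   # freshness SLA agreed with consumers


def schema_violations(contract: DataContract, record: dict) -> list:
    """Return readable violations for one record; an empty list means it conforms."""
    problems = []
    for column, expected_type in contract.schema.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            actual = type(record[column]).__name__
            problems.append(f"{column}: expected {expected_type.__name__}, got {actual}")
    for column in record:
        if column not in contract.schema:
            problems.append(f"unexpected column: {column}")
    return problems


# Illustrative contract for an assumed "orders" dataset.
orders_contract = DataContract(
    dataset="orders",
    owner="checkout-team",
    schema={"order_id": str, "amount_cents": int},
    max_staleness_minutes=15,
)
```

Calling `schema_violations(orders_contract, record)` on each incoming record is one way a producer could catch a breaking schema change before consumers see it.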

"We introduced data contracts for all critical datasets, which reduced breaking schema changes by 70%."
CDC n.

Change Data Capture: a pattern for tracking row-level changes (inserts, updates, deletes) in a database and streaming them to downstream consumers in near real time
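
As a rough illustration of the pattern, the sketch below replays a stream of row-level change events into an in-memory replica. The event shape (`op`, `key`, `row`) is an assumption made for this example and is not the format of Debezium or any other specific tool.

```python
def apply_change(replica: dict, event: dict) -> None:
    """Apply one row-level change event to a replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]   # upsert, so replaying an event is harmless
    elif op == "delete":
        replica.pop(key, None)        # deleting a missing key is a no-op
    else:
        raise ValueError(f"unknown operation: {op}")


# Hypothetical change stream for an orders table.
change_stream = [
    {"op": "insert", "key": 1, "row": {"status": "placed"}},
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "placed"}},
    {"op": "delete", "key": 2},
]

orders_replica = {}
for event in change_stream:
    apply_change(orders_replica, event)
```

Treating inserts and updates as the same upsert is a deliberate simplification: it keeps the consumer idempotent if the stream is replayed after a failure.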

"We replaced nightly batch exports with CDC to give the analytics team near real-time visibility into order changes."
data lineage n.

A map of how data flows from its source, through transformations and pipelines, to its final destinations — enabling impact analysis, debugging, and compliance auditing
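
Impact analysis over lineage can be sketched as a graph traversal. The asset names and the `LINEAGE` mapping below are invented for illustration; a real catalog would supply these edges from its metadata.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that read from it.
LINEAGE = {
    "orders_db": ["orders_raw"],
    "orders_raw": ["orders_clean"],
    "orders_clean": ["revenue_dashboard", "churn_model"],
    "customers_db": ["churn_model"],
}


def downstream_of(asset: str, lineage: dict) -> set:
    """Breadth-first walk collecting every asset reachable from `asset`."""
    affected, queue = set(), deque([asset])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected
```

For example, `downstream_of("orders_raw", LINEAGE)` returns the cleaned table plus the dashboard and model that depend on it, which is the list of owners to notify when that pipeline breaks.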

"Data lineage allowed us to identify in minutes which downstream dashboards were affected by a broken upstream pipeline."
data mesh n.

An architectural and organizational approach where data ownership is distributed to domain teams, who are responsible for their own data products — treating data as a product rather than a centrally managed asset

"Migrating to a data mesh decentralised ownership to the product teams, reducing the bottleneck on the central data engineering team."

📚 Vocabulary Reference

Key terms organised by category for Data Platform Engineers:

Pipeline Architecture

batch pipeline, streaming pipeline, micro-batch, ETL, ELT, data lakehouse, Lambda architecture, Kappa architecture, orchestration, DAG

Data Quality

data contract, data quality check, completeness, freshness, accuracy, consistency, anomaly detection, data SLA, expectation, data test

CDC & Ingestion

CDC, Change Data Capture, Debezium, log-based CDC, query-based CDC, outbox pattern, event sourcing ingestion, full load, incremental load, idempotent ingestion

Governance & Catalog

data catalog, data lineage, data mesh, data product, data domain, federated governance, data contract, schema registry, metadata management, data discovery

Recommended exercises

Real-world scenarios you'll practise

  • Writing a data contract for a new event stream: specifying schema, SLA (freshness, completeness, accuracy), and the escalation path for violations
  • Presenting a data quality scorecard to business stakeholders: explaining completeness, freshness, and accuracy metrics for critical datasets
  • Proposing a CDC migration: explaining the technical approach, risks, and expected improvement to data latency for real-time analytics
  • Running a data mesh readiness workshop: explaining the data product owner model, federated governance, and what each domain team needs to deliver
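
When practising the scorecard scenario above, it can help to know how completeness and freshness are typically computed. The following is a minimal sketch under simple assumptions: completeness is the share of rows with a non-null value in a column, and freshness is whether the last load falls within an agreed window. The function names, sample rows, and timestamps are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows: list, column: str) -> float:
    """Share of rows where `column` is present and not None (0.0 for no rows)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def is_fresh(last_loaded_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when the most recent load is within the agreed freshness window."""
    return now - last_loaded_at <= max_age


# Hypothetical sample data for a scorecard.
rows = [
    {"order_id": "A1", "email": "a@example.com"},
    {"order_id": "A2", "email": None},
    {"order_id": "A3", "email": "c@example.com"},
    {"order_id": "A4"},
]
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc)
```

Here half of the rows carry an email, so `completeness(rows, "email")` is 0.5, and a load from ten minutes ago satisfies a 15-minute freshness window but not a 5-minute one. Accuracy is left out because it usually requires comparison against a trusted reference source.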
