Data Platform Engineer
Data Platform Engineers build the infrastructure that other data teams rely on — the pipelines, storage systems, quality frameworks, and tooling that make data reliable and accessible. Their English work includes writing data contracts and pipeline SLAs, presenting data quality reports to stakeholders, documenting migration strategies, and running internal workshops on platform capabilities. This path covers the specialized vocabulary of data infrastructure and platform engineering.
Topics covered
- Batch & streaming pipelines
- Data quality
- Data mesh & contracts
- Change Data Capture
- Data catalogs & lineage
- Self-service data platform
Vocabulary spotlight
4 terms every Data Platform Engineer should know in English:
Data contract: a formal agreement between a data producer and its consumers that specifies the schema, semantics, SLAs, and ownership of a dataset, making data reliability expectations explicit
"We introduced data contracts for all critical datasets, which reduced breaking schema changes by 70%."
Change Data Capture (CDC): a pattern for tracking row-level changes (insert, update, delete) in a database and streaming them to downstream consumers in near real time
"We replaced nightly batch exports with CDC to give the analytics team near real-time visibility into order changes."
Data lineage: a map of how data flows from its source, through transformations and pipelines, to its final destinations, which enables impact analysis, debugging, and compliance auditing
"Data lineage allowed us to identify in minutes which downstream dashboards were affected by a broken upstream pipeline."
Data mesh: an architectural and organizational approach where data ownership is distributed to domain teams, who are responsible for their own data products, treating data as a product rather than a centrally managed asset
"Migrating to a data mesh decentralised ownership to the product teams, reducing the bottleneck on the central data engineering team."
📚 Vocabulary Reference
Key terms organised by category for Data Platform Engineers:
- Pipeline Architecture
- Data Quality
- CDC & Ingestion
- Governance & Catalog
Recommended exercises
Real-world scenarios you'll practise
- Writing a data contract for a new event stream: specifying schema, SLA (freshness, completeness, accuracy), and the escalation path for violations
- Presenting a data quality scorecard to business stakeholders: explaining completeness, freshness, and accuracy metrics for critical datasets
- Proposing a CDC migration: explaining the technical approach, risks, and expected improvement to data latency for real-time analytics
- Running a data mesh readiness workshop: explaining the data product owner model, federated governance, and what each domain team needs to deliver
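When you explain a data quality scorecard, it helps to know how the numbers might be produced. The sketch below shows one possible way to compute completeness and check freshness; the function names, the sample rows, and the 24-hour threshold are assumptions for this example, not a standard definition.

```python
from datetime import datetime, timedelta


def completeness(rows, column):
    """Share of rows where `column` is present and not None (0.0 to 1.0)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def is_fresh(latest_update, now, max_age):
    """True if the newest record is no older than the agreed maximum age."""
    return now - latest_update <= max_age


# Hypothetical sample of an orders dataset.
rows = [
    {"order_id": "A1", "amount": 10.0},
    {"order_id": "A2", "amount": None},
    {"order_id": "A3", "amount": 7.5},
    {"order_id": "A4"},
]

amount_completeness = completeness(rows, "amount")
fresh = is_fresh(
    latest_update=datetime(2024, 1, 1, 6, 0),
    now=datetime(2024, 1, 1, 12, 0),
    max_age=timedelta(hours=24),  # assumed freshness SLA
)

print(f"amount completeness: {amount_completeness:.0%}, fresh: {fresh}")
```

In a stakeholder presentation you would describe these as "the percentage of records with a valid amount" and "whether the data arrived within the agreed window".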