Intermediate Vocabulary #data-engineering #kafka #data-warehouse #pipelines

Data Engineering Vocabulary

5 exercises covering the English vocabulary every data engineer needs: medallion architecture, idempotency, Kafka streaming, data mesh, and slowly changing dimensions.

Core data engineering vocabulary clusters
  • Architecture patterns: medallion (Bronze/Silver/Gold), data lakehouse, data mesh, Lambda/Kappa architecture
  • Pipeline reliability: idempotency, upsert, deduplication, at-least-once, exactly-once, checkpoint
  • Streaming: Kafka, topic, partition, offset, consumer group, lag, retention, replay
  • Storage: Delta Lake, Iceberg, Hudi, Parquet, partitioning, compaction, ACID
  • Warehouse modelling: fact table, dimension, SCD Type 2, surrogate key, star schema, snowflake schema
  • Tooling: Spark, dbt, Airflow, Dagster, Fivetran, Great Expectations, data catalog
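The following minimal Python sketch makes the Bronze/Silver/Gold and pipeline-reliability terms concrete. It uses in-memory lists and dicts as stand-ins for real tables. In a lakehouse such as Delta Lake or Iceberg, the upsert would usually be a MERGE statement. The field names, sample rows, and validation rule are hypothetical. The code deduplicates and upserts rows into Silver so the step is idempotent, then builds a business-level aggregate in Gold.

```python
from collections import defaultdict

# Bronze: raw events exactly as they arrived, duplicates included.
# Hypothetical sample data; an at-least-once source may redeliver "e1".
bronze = [
    {"event_id": "e1", "customer": "acme", "amount": 120.0},
    {"event_id": "e2", "customer": "globex", "amount": 80.0},
    {"event_id": "e1", "customer": "acme", "amount": 120.0},  # redelivered duplicate
    {"event_id": "e3", "customer": "acme", "amount": -5.0},   # invalid amount
]


def to_silver(raw_rows, silver):
    """Clean and deduplicate Bronze rows into Silver.

    Silver is keyed by event_id, so each write is an upsert: replaying the
    same batch overwrites existing rows instead of appending copies.
    """
    for row in raw_rows:
        if row["amount"] < 0:  # hypothetical validation rule
            continue
        silver[row["event_id"]] = row  # upsert on the natural key
    return silver


def to_gold(silver):
    """Aggregate Silver into a business-level table: revenue per customer."""
    revenue = defaultdict(float)
    for row in silver.values():
        revenue[row["customer"]] += row["amount"]
    return dict(revenue)


silver = to_silver(bronze, {})
gold = to_gold(silver)
print(gold)  # {'acme': 120.0, 'globex': 80.0}

# Idempotency check: re-running the same Bronze batch changes nothing.
silver_again = to_silver(bronze, dict(silver))
assert silver_again == silver
assert to_gold(silver_again) == gold
```

Keying Silver on `event_id` is what makes the step idempotent. A duplicate delivery of `e1`, or a replay of the whole batch, cannot inflate the Gold totals.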
1 / 5
A data engineer explains their pipeline:
"We use a medallion architecture — raw data lands in the Bronze layer, gets cleaned and deduplicated in Silver, and the Gold layer contains business-level aggregates ready for dashboards."
What is the medallion architecture?