Why this hub exists: Data platforms now span pipelines, governance, privacy law, and executive reporting, and the English they require has split into distinct professional registers. A data engineer's vocabulary is not a compliance analyst's vocabulary, and neither overlaps much with the presentation language of a BI dashboard. Coders Lingo covers this breadth with nine focused exercise categories rather than one unfocused mega-category. This page is the map that ties them together and explains what makes each one different. It is separate from the AI & ML Vocabulary Hub, which covers model-building vocabulary rather than data platform vocabulary.

The data platform English landscape, in plain terms

Broadly, the nine categories below split into three groups. Building and organising data (Data Engineering Language, Data Mesh Architecture Language, Data Contracts & Schema Language, Real-Time & Streaming Data Language) covers the vocabulary of moving and structuring data, from centralised pipelines to decentralised, domain-owned architectures. Governing and protecting data (Data Lineage Vocabulary, Data Privacy Law Language, Synthetic Data Vocabulary) covers tracing where data came from, the legal vocabulary of handling personal data, and generating artificial data as a privacy-preserving alternative. Finally, communicating data (Data Visualization & Dashboard Language, Business Intelligence & Analytics Communication) covers presenting findings to a human audience once the data has been built, governed, and protected.

These categories are not duplicates of each other, even where the icons and topics look similar at a glance. Each card below states in one line exactly what makes that category distinct, so you can jump directly to the vocabulary you actually need rather than working through overlapping material.

The 9 data platform vocabulary categories

Frequently asked questions

Why are there so many separate data vocabulary categories on Coders Lingo?

Modern data platforms split into distinct professional registers: a data engineer building pipelines uses different vocabulary from a governance analyst tracing lineage, and both differ from an analyst narrating a dashboard to executives. Rather than force this into one oversized category, Coders Lingo splits it into nine focused categories so each stays specific and practical. This hub is the map that ties them together, distinct from the separate AI & ML Vocabulary Hub, which covers model-building vocabulary rather than data platform vocabulary.

Which data category should I start with?

If you are new to data vocabulary generally, start with Data Engineering Language — it covers the foundational pipeline vocabulary (ETL/ELT, Kafka, data warehouses) that the more specialised categories assume. From there, branch by concern: teams decentralising ownership should go to Data Mesh Architecture Language, anyone formalising producer/consumer agreements should go to Data Contracts & Schema Language, and anyone presenting findings should go to Data Visualization & Dashboard Language or Business Intelligence & Analytics Communication.

What is the difference between "Data Contracts" and "Data Mesh Architecture"?

Data Mesh Architecture Language covers the organisational vocabulary of decentralising data ownership to domain teams — domain ownership, data as a product, federated governance. Data Contracts & Schema Language covers the narrower, mechanical vocabulary of formalising the interface between a producer and consumer of data — schema evolution and contract testing. A data mesh implementation typically relies on data contracts between its domains, so the two are closely related but not the same topic.
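To make "schema evolution" and "contract testing" concrete, here is a minimal sketch in Python. It is not taken from any Coders Lingo exercise or from a real schema registry: the Field class, the breaking_changes function, and the compatibility policy (removed fields, changed types, and new required fields count as breaking) are all assumptions chosen for illustration. Real tools apply richer rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One field in a hypothetical data contract."""
    name: str
    type: str
    required: bool = True


def breaking_changes(old, new):
    """List the reasons `new` would break consumers relying on `old`.

    Assumed policy (illustrative only): removing a field, changing its type,
    or adding a new required field is breaking; adding an optional field is not.
    """
    new_by_name = {f.name: f for f in new}
    old_names = {f.name for f in old}
    problems = []
    for f in old:
        candidate = new_by_name.get(f.name)
        if candidate is None:
            problems.append(f"field removed: {f.name}")
        elif candidate.type != f.type:
            problems.append(f"type changed: {f.name} {f.type} -> {candidate.type}")
    for f in new:
        if f.name not in old_names and f.required:
            problems.append(f"new required field: {f.name}")
    return problems


# Hypothetical "orders" contract between a producer and its consumers.
v1 = [Field("order_id", "string"), Field("amount", "decimal")]
v2_safe = v1 + [Field("currency", "string", required=False)]
v2_breaking = [Field("order_id", "int")]

print(breaking_changes(v1, v2_safe))      # no breaking changes reported
print(breaking_changes(v1, v2_breaking))  # type change and removed field reported
```

A contract test in this sense is just a check like breaking_changes run before a producer ships a new schema version, so that a consumer in another domain learns about a breaking change before it reaches production.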

How is "Streaming Data" different from "Data Engineering"?

Data Engineering Language is broad and covers both batch and streaming pipeline vocabulary, including a general introduction to Kafka. Real-Time & Streaming Data Language goes deeper into the event-by-event vocabulary specific to streaming systems — windowing, stream processing patterns, and delivery guarantee semantics (at-least-once, exactly-once) — topics a batch-oriented data engineer may not need.
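As a rough illustration of two of the terms above, the sketch below counts events in tumbling (fixed, non-overlapping) windows and skips events that arrive twice, which is what an at-least-once delivery guarantee permits. The function name, the (event_id, timestamp) event shape, and the in-memory set used for deduplication are assumptions made for this example; a production stream processor would keep this state durably and handle late or out-of-order events.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count unique events per fixed-size window.

    `events` is an iterable of (event_id, timestamp_in_seconds) pairs.
    Duplicates (same event_id) are ignored, so redelivery under
    at-least-once semantics does not inflate the counts.
    """
    seen_ids = set()
    counts = defaultdict(int)
    for event_id, timestamp in events:
        if event_id in seen_ids:
            continue  # redelivered duplicate: already processed once
        seen_ids.add(event_id)
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


# Events "b" and "a" are each delivered twice.
events = [("a", 1), ("b", 4), ("b", 4), ("c", 11), ("d", 19), ("a", 1)]
print(tumbling_window_counts(events, 10))  # {0: 2, 10: 2}
```

Making the processing idempotent in this way is one common route to an exactly-once effect on top of at-least-once delivery: the message may arrive more than once, but it changes the result only once.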

Isn't "Data Visualization" the same as "Business Intelligence & Analytics"?

They are closely related but framed differently. Data Visualization & Dashboard Language focuses on describing charts and reading dashboards — the mechanics of visual communication. Business Intelligence & Analytics Communication focuses on the business framing around that data — KPIs, funnel analysis, and presenting insights to stakeholders in terms of decisions and outcomes. Many analysts need both.

Why is "Synthetic Data Vocabulary" grouped with data platform categories instead of the AI/ML hub?

Synthetic Data Vocabulary does involve ML techniques (GANs, VAEs) for generating data, but its purpose is data platform and privacy engineering: producing safe, realistic data as a substitute for real data that falls under the legal constraints covered in Data Privacy Law Language. It is grouped here because the vocabulary is about data management and privacy trade-offs, not about training or evaluating AI models for their own sake.

How many total exercises are covered across the data platform vocabulary cluster?

The nine categories in this hub cover 191 exercises in total, ranging from foundational pipeline vocabulary to specialised governance, privacy, and presentation terminology. Each category is self-contained, so you can start with whichever matches your current role.

Explore more

Browse the full exercise library or the AI & ML Vocabulary Hub for model-building vocabulary.
