🧭 Data Platform & Analytics Vocabulary Hub
9 categories, 191 exercises. One map for every data platform and analytics English topic on Coders Lingo.
The data platform English landscape, in plain terms
Broadly, the nine categories below split into three groups. Building and organising data (Data Engineering Language, Data Mesh Architecture Language, Data Contracts & Schema Language, Real-Time & Streaming Data Language) covers the vocabulary of moving and structuring data, from centralised pipelines to decentralised, domain-owned architectures. Governing and protecting data (Data Lineage Vocabulary, Data Privacy Law Language, Synthetic Data Vocabulary) covers tracing where data came from, the legal vocabulary of handling personal data, and generating artificial data as a privacy-preserving alternative. Finally, communicating data (Data Visualization & Dashboard Language, Business Intelligence & Analytics Communication) covers presenting findings to a human audience once the data has been built, governed, and protected.
These categories are not duplicates of each other, even where the icons and topics look similar at a glance. Each card below states in one line exactly what makes that category distinct, so you can jump directly to the vocabulary you actually need rather than working through overlapping material.
The 9 data platform vocabulary categories
- Advanced
Data Engineering Language
ETL vs ELT, Kafka streaming vocabulary, data warehouse terms, dbt, data quality dimensions, and incident communication for data teams.
Not a duplicate because: The broadest, most foundational category — building and running data pipelines end to end.
- Advanced
Data Mesh Architecture Language
Domain ownership, data as a product, federated governance, data product design, and migration from centralised architectures.
Not a duplicate because: Organisational and architectural vocabulary for decentralising data ownership across domain teams — not a pipeline technology.
- Intermediate – Advanced
Data Contracts & Schema Language
Data contracts, data agreements, schema evolution, and contract testing vocabulary.
Not a duplicate because: Formalising the interface between data producers and consumers — the agreement layer that data mesh and data engineering both depend on.
- Intermediate – Advanced
Data Lineage Vocabulary
Data lineage tracking, metadata management, data catalogs, data quality, and governance communication.
Not a duplicate because: Tracing where data came from and what depends on it downstream — a governance concern, not a pipeline-building one.
- Intermediate – Advanced
Real-Time & Streaming Data Language
Kafka concepts, stream processing patterns, windowing, and delivery guarantee semantics.
Not a duplicate because: The real-time, event-by-event vocabulary — windowing and delivery guarantees — distinct from batch-oriented data engineering terms.
- Intermediate – Advanced
Data Privacy Law Language
GDPR rights vocabulary, privacy by design language, and data breach notification communication.
Not a duplicate because: Legal and compliance vocabulary for personal data — rights, notifications, and law, not pipeline or architecture terms.
- Advanced
Synthetic Data Vocabulary
Synthetic data generation, data augmentation, differential privacy, and test data management — GANs, VAEs, epsilon-DP, and TSTR evaluation.
Not a duplicate because: Generating and evaluating artificial data as a privacy-preserving alternative to real data, rather than moving or governing real data.
- Intermediate – Advanced
Data Visualization & Dashboard Language
Describing charts, reading monitoring dashboards, and presenting data-driven insights in English.
Not a duplicate because: Communicating data visually to a human audience — the presentation layer, not the pipeline or governance layer beneath it.
- Intermediate – Advanced
Business Intelligence & Analytics Communication
KPIs, metrics, dashboard narration, funnel analysis, and presenting insights to stakeholders.
Not a duplicate because: Translating data into business decisions and stakeholder narratives — closely related to data visualization but framed around business metrics rather than chart mechanics.
Frequently asked questions
Why are there so many separate data vocabulary categories on Coders Lingo?
Modern data platforms split into distinct professional registers: a data engineer building pipelines uses different vocabulary from a governance analyst tracing lineage, and both differ from an analyst narrating a dashboard to executives. Rather than force this into one oversized category, Coders Lingo splits it into nine focused categories so each stays specific and practical. This hub is the map that ties them together, distinct from the separate AI & ML Vocabulary Hub, which covers model-building vocabulary rather than data platform vocabulary.
Which data category should I start with?
If you are new to data vocabulary generally, start with Data Engineering Language — it covers the foundational pipeline vocabulary (ETL/ELT, Kafka, data warehouses) that the more specialised categories assume. From there, branch by concern: teams decentralising ownership should go to Data Mesh Architecture Language, anyone formalising producer/consumer agreements should go to Data Contracts & Schema Language, and anyone presenting findings should go to Data Visualization & Dashboard Language or Business Intelligence & Analytics Communication.
What is the difference between "Data Contracts" and "Data Mesh Architecture"?
Data Mesh Architecture Language covers the organisational vocabulary of decentralising data ownership to domain teams — domain ownership, data as a product, federated governance. Data Contracts & Schema Language covers the narrower, mechanical vocabulary of formalising the interface between a producer and consumer of data — schema evolution and contract testing. A data mesh implementation typically relies on data contracts between its domains, so the two are closely related but not the same topic.
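To make "schema evolution" and "contract testing" concrete, here is a minimal Python sketch of a check a producer might run before publishing a new schema version. It is illustrative only: the `Field` class, the `breaking_changes` function, and the three rules it applies (removed field, changed type, new required field) are assumptions made for this example. They are not the API of any real contract tool, and real tools define compatibility in more than one direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One field in a hypothetical data contract."""
    name: str
    type: str
    required: bool = True


def breaking_changes(old: list[Field], new: list[Field]) -> list[str]:
    """Return the reasons the new schema would break consumers of the old one.

    Assumed rules for this sketch: removing a field, changing a field's
    type, or adding a new required field all count as breaking.
    """
    new_by_name = {f.name: f for f in new}
    old_names = {f.name for f in old}
    problems = []

    # Every field consumers already rely on must survive with the same type.
    for f in old:
        candidate = new_by_name.get(f.name)
        if candidate is None:
            problems.append(f"removed field: {f.name}")
        elif candidate.type != f.type:
            problems.append(f"type changed: {f.name} {f.type} -> {candidate.type}")

    # New fields are only safe under these rules if they are optional.
    for f in new:
        if f.name not in old_names and f.required:
            problems.append(f"new required field: {f.name}")

    return problems


v1 = [Field("order_id", "string"), Field("amount", "decimal")]
v2 = v1 + [Field("currency", "string", required=False)]
print(breaking_changes(v1, v2))  # an empty list means no breaking changes
```

In conversation, a team might describe `v2` as "a non-breaking, additive change", and a failed check as "a contract violation".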
How is "Streaming Data" different from "Data Engineering"?
Data Engineering Language is broad and covers both batch and streaming pipeline vocabulary, including a general introduction to Kafka. Real-Time & Streaming Data Language goes deeper into the event-by-event vocabulary specific to streaming systems — windowing, stream processing patterns, and delivery guarantee semantics (at-least-once, exactly-once) — topics a batch-oriented data engineer may not need.
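Two of these terms, "windowing" and "at-least-once", can be pictured with a short Python sketch. It groups events into fixed, non-overlapping (tumbling) windows and skips events it has already seen, because under at-least-once delivery the same event may arrive more than once. The function name `tumbling_window_counts`, the event format, and the in-memory `seen` set are assumptions made for illustration. A real stream processor would manage this state differently, and deduplicating in the consumer is not the same thing as exactly-once delivery.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count unique events per fixed, non-overlapping time window.

    events: iterable of (event_id, timestamp_in_seconds) pairs, possibly
    containing duplicates, as can happen with at-least-once delivery.
    Returns a dict mapping each window's start time to its event count.
    """
    seen = set()               # event ids that were already counted
    counts = defaultdict(int)  # window start time -> number of events

    for event_id, ts in events:
        if event_id in seen:
            continue           # redelivered duplicate: ignore it
        seen.add(event_id)
        # Round the timestamp down to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1

    return dict(counts)


# Event "a" is delivered twice but counted once.
events = [("a", 1), ("b", 4), ("a", 1), ("c", 12)]
print(tumbling_window_counts(events, 10))  # {0: 2, 10: 1}
```

A streaming engineer might describe this as "an idempotent consumer over ten-second tumbling windows".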
Isn't "Data Visualization" the same as "Business Intelligence & Analytics"?
They are closely related but framed differently. Data Visualization & Dashboard Language focuses on describing charts and reading dashboards — the mechanics of visual communication. Business Intelligence & Analytics Communication focuses on the business framing around that data — KPIs, funnel analysis, and presenting insights to stakeholders in terms of decisions and outcomes. Many analysts need both.
Why is "Synthetic Data Vocabulary" grouped with data platform categories instead of the AI/ML hub?
Synthetic Data Vocabulary does use ML techniques (GANs, VAEs) to generate data, but its purpose is data platform and privacy engineering: producing safe, realistic data that can stand in for real personal data, which falls under the rules covered in Data Privacy Law Language. It is grouped here because the vocabulary is about data management and privacy trade-offs, not about training or evaluating AI models for their own sake.
How many total exercises are covered across the data platform vocabulary cluster?
The nine categories in this hub cover 191 exercises in total, ranging from foundational pipeline vocabulary to specialised governance, privacy, and presentation terminology. Each category is self-contained, so you can start with whichever matches your current role.
Explore more
Browse the full exercise library or the AI & ML Vocabulary Hub for model-building vocabulary.