Analytics Engineering in English: dbt, Data Models, and Stakeholder Communication
Learn the English vocabulary analytics engineers use — dbt models, ref(), marts, staging layers, data lineage, and how to communicate data work to stakeholders.
Analytics engineering sits between data engineering and data analysis. Analytics engineers transform raw data into clean, reliable models that analysts and business stakeholders can use. The modern analytics engineering stack is dominated by dbt (data build tool), and the vocabulary has become standard across the industry. This guide covers the key English terms and communication patterns.
dbt Vocabulary
| Term | Definition |
|---|---|
| Model | A SQL file that defines a transformation — dbt compiles and runs it as a SELECT statement |
| ref() | A dbt function that creates a dependency reference between models (e.g. ref('stg_orders')) |
| Materialisation | How a model is stored: table, view, incremental, or ephemeral |
| Source | A raw table ingested from an external system, declared in dbt for documentation and freshness testing |
| Seed | A CSV file loaded into the data warehouse as a table, used for lookup data |
| Test | An automated check on a model’s data (e.g. uniqueness, not-null, referential integrity) |
| Lineage | The visual graph showing how data flows between sources and models |
| Schema YAML | A YAML file that documents models, columns, and tests |
The Three-Layer Architecture
Most dbt projects use a three-layer convention:
-
Staging (stg_) — Clean and rename raw source data; one staging model per source table. “Staging models should apply only renaming, casting, and basic cleaning — no business logic.”
-
Intermediate (int_) — Join and reshape staging models to create useful intermediate datasets. “Intermediate models are not exposed to end users — they’re building blocks for marts.”
-
Mart (mart_ or fct_ / dim_) — Final, business-facing models. These are what analysts query. “The
fct_ordersmart is the single source of truth for order-level metrics.”
ref() in Practice
The ref() function is central to dbt’s dependency management. When you write ref('stg_customers'), dbt knows to build stg_customers before the current model.
“We use ref() instead of hardcoding table names so that dbt can manage build order and correctly resolve environment-specific schemas.”
Data Contract Language
A data contract is an agreement between a data producer and a data consumer about the schema, freshness, and quality of a dataset.
| Term | Meaning |
|---|---|
| Data contract | A formal specification of what a dataset will contain and how it will behave |
| Schema | The column names, data types, and structure of a table |
| Freshness SLA | How up-to-date the data must be (e.g. updated within 6 hours) |
| Breaking change | A change to a data model that would break downstream consumers |
| Backward compatibility | A model change that doesn’t break existing consumers |
| Data catalogue | A searchable inventory of available datasets with documentation |
Contract language patterns:
- “The
fct_revenuemodel is covered by a data contract — any changes to column names or data types must go through a review process.” - “This is a breaking change: renaming
customer_idtouser_idwill require all downstream consumers to update their queries.”
Stakeholder Communication Phrases
Analytics engineers frequently translate between the technical world of SQL and the business world of KPIs and decisions. The language must bridge both.
Explaining a data model to a business stakeholder:
- “The
fct_orderstable is your single source of truth for order data — everything you see in the revenue dashboard is built from this model.” - “When you filter by
order_status = 'completed', you’re looking at orders that have been fully delivered and invoiced.”
Explaining a data quality issue:
- “We noticed some discrepancies between the dashboard and the finance report. After investigating, we found that the finance report was using a different definition of ‘active customer’ — we’re aligning the definitions now.”
Setting expectations on data freshness:
- “This dashboard refreshes every four hours. For real-time figures, you’ll need to query the source system directly — but for daily and weekly reporting, the mart data is sufficient.”
Example Sentences
- “The staging model for the orders source applies type casting and column renaming only — all business logic is deferred to the mart layer.”
- “We use
ref('stg_customers')in the intermediate model so that dbt can correctly infer the build order and prevent stale data from propagating downstream.” - “The data contract for
fct_revenuespecifies that therevenue_usdcolumn is always denominated in US dollars and is never null — any upstream change that violates this will trigger an alert.” - “Lineage analysis showed that the proposed schema change to
stg_productswould break 14 downstream models across three mart tables — we’ve communicated this to the consuming teams.” - “Following the stakeholder review, we agreed to add a
reporting_categorycolumn to the mart, which will allow the product team to filter revenue by the new business segment taxonomy.”