Analytics Engineering in English: dbt, Data Models, and Stakeholder Communication

Learn the English vocabulary analytics engineers use — dbt models, ref(), marts, staging layers, data lineage, and how to communicate data work to stakeholders.

Analytics engineering sits between data engineering and data analysis. Analytics engineers transform raw data into clean, reliable models that analysts and business stakeholders can use. dbt (data build tool) dominates the modern analytics engineering stack, and its vocabulary has become standard across the industry. This guide covers the key English terms and communication patterns.

dbt Vocabulary

Term             Definition
Model            A SQL file containing a SELECT statement that defines a transformation; dbt compiles it and materialises the result in the warehouse
ref()            A dbt function that creates a dependency reference between models (e.g. ref('stg_orders'))
Materialisation  How dbt builds a model in the warehouse: as a table, a view, an incremental table, or an ephemeral model (inlined into the models that use it rather than stored)
Source           A raw table ingested from an external system, declared in dbt for documentation and freshness testing
Seed             A CSV file loaded into the data warehouse as a table, used for lookup data
Test             An automated check on a model’s data (e.g. uniqueness, not-null, referential integrity)
Lineage          The graph showing how data flows between sources and models, usually viewed as a diagram
Schema YAML      A YAML file that documents models and columns and declares their tests
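
To make the word "test" concrete: a dbt data test is a query that returns the rows that break a rule, and the test passes when the query returns no rows. The sketch below imitates a uniqueness test and a not-null test in Python, with an in-memory SQLite database standing in for the warehouse. The table contents and the helper name failing_rows are assumptions made for illustration, and the SQL is a simplified imitation rather than the SQL dbt generates.

```python
import sqlite3

# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical staging table with one duplicate key and one missing value.
    CREATE TABLE stg_orders (order_id INTEGER, customer_id TEXT);
    INSERT INTO stg_orders VALUES (1, 'c1'), (2, 'c2'), (2, NULL);
""")


def failing_rows(sql: str) -> list:
    """Run a test query; an empty result means the test passes."""
    return conn.execute(sql).fetchall()


# Uniqueness test: any order_id that appears more than once is a failure.
unique_failures = failing_rows(
    "SELECT order_id FROM stg_orders GROUP BY order_id HAVING COUNT(*) > 1"
)

# Not-null test: any row without a customer_id is a failure.
not_null_failures = failing_rows(
    "SELECT order_id FROM stg_orders WHERE customer_id IS NULL"
)

print(unique_failures)    # [(2,)]
print(not_null_failures)  # [(2,)]
```

A useful phrase when reporting results like these: “The uniqueness test on order_id is failing because order 2 appears twice in staging.”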

The Three-Layer Architecture

Most dbt projects use a three-layer convention:

  1. Staging (stg_) — Clean and rename raw source data; one staging model per source table. “Staging models should apply only renaming, casting, and basic cleaning — no business logic.”

  2. Intermediate (int_) — Join and reshape staging models to create useful intermediate datasets. “Intermediate models are not exposed to end users — they’re building blocks for marts.”

  3. Mart (mart_ or fct_ / dim_) — Final, business-facing models. These are what analysts query. “The fct_orders mart is the single source of truth for order-level metrics.”
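
The three layers can be sketched end to end on a toy dataset. The example below uses Python with an in-memory SQLite database in place of a real warehouse, and plain views in place of dbt models. The table raw_orders, its columns, and the model names int_completed_orders and fct_customer_revenue are invented for illustration. In a real dbt project each SELECT would live in its own file and use ref() rather than a hardcoded name.

```python
import sqlite3

# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw source table, as loaded from an external system.
    CREATE TABLE raw_orders (ID TEXT, CUST TEXT, AMT TEXT, STATUS TEXT);
    INSERT INTO raw_orders VALUES
        ('1', 'c1', '10.50', 'completed'),
        ('2', 'c1', '4.50',  'completed'),
        ('3', 'c2', '20.00', 'cancelled');

    -- Staging: rename and cast only, no business logic.
    CREATE VIEW stg_orders AS
    SELECT CAST(ID AS INTEGER) AS order_id,
           CUST                AS customer_id,
           CAST(AMT AS REAL)   AS amount_usd,
           STATUS              AS order_status
    FROM raw_orders;

    -- Intermediate: reshape staging data into a building block for marts.
    CREATE VIEW int_completed_orders AS
    SELECT order_id, customer_id, amount_usd
    FROM stg_orders
    WHERE order_status = 'completed';

    -- Mart: the business-facing model that analysts query.
    CREATE VIEW fct_customer_revenue AS
    SELECT customer_id,
           COUNT(*)        AS completed_orders,
           SUM(amount_usd) AS revenue_usd
    FROM int_completed_orders
    GROUP BY customer_id;
""")

mart_rows = conn.execute(
    "SELECT customer_id, completed_orders, revenue_usd "
    "FROM fct_customer_revenue ORDER BY customer_id"
).fetchall()
print(mart_rows)  # [('c1', 2, 15.0)]
```

Customer c2 does not appear in the mart because their only order was cancelled, which is the kind of rule a stakeholder will ask you to explain in plain English.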

ref() in Practice

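The ref() function is central to dbt’s dependency management. When you write {{ ref('stg_customers') }} in a model’s SQL, dbt knows to build stg_customers before the current model, and at compile time it replaces the call with the table’s fully qualified name in the target environment.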

“We use ref() instead of hardcoding table names so that dbt can manage build order and correctly resolve environment-specific schemas.”
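
The idea behind “manage build order” and “resolve environment-specific schemas” can be illustrated with a small simulation. This is a sketch of the concept in Python, not dbt’s actual implementation: the model SQL, the schema name analytics_dev, and the helpers find_refs and resolve_refs are all assumptions made for the example.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model files, keyed by model name. The SQL is never executed here.
models = {
    "stg_customers": "select * from raw.customers",
    "stg_orders": "select * from raw.orders",
    "int_customer_orders": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c using (customer_id)"
    ),
    "fct_orders": "select * from {{ ref('int_customer_orders') }}",
}

# Matches calls written as {{ ref('model_name') }}.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the names of the models that this SQL depends on."""
    return set(REF_PATTERN.findall(sql))


def resolve_refs(sql: str, schema: str) -> str:
    """Swap each ref() call for a schema-qualified table name."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


# Map each model to its upstream models, then sort so parents come first.
graph = {name: find_refs(sql) for name, sql in models.items()}
build_order = list(TopologicalSorter(graph).static_order())

print(build_order)
print(resolve_refs(models["fct_orders"], "analytics_dev"))
# select * from analytics_dev.int_customer_orders
```

Because the table name is produced from the ref() call rather than typed by hand, the same model file can point at a development schema or a production schema without being edited.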

Data Contract Language

A data contract is an agreement between a data producer and a data consumer about the schema, freshness, and quality of a dataset.

Term                    Meaning
Data contract           A formal specification of what a dataset will contain and how it will behave
Schema                  The column names, data types, and structure of a table
Freshness SLA           How up-to-date the data must be (e.g. updated within 6 hours)
Breaking change         A change to a data model that would break downstream consumers
Backward compatibility  The property of a model change that doesn’t break existing consumers
Data catalogue          A searchable inventory of available datasets with documentation

Contract language patterns:

  • “The fct_revenue model is covered by a data contract — any changes to column names or data types must go through a review process.”
  • “This is a breaking change: renaming customer_id to user_id will require all downstream consumers to update their queries.”
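
The difference between a breaking change and a backward-compatible one can be shown with a simple schema comparison. The sketch below is a minimal illustration in Python, assuming a contract is just a mapping from column name to data type. The column list, the type names, and the function breaking_changes are hypothetical and do not represent a real contract-enforcement tool.

```python
# Hypothetical contract for fct_revenue: column name -> data type.
contract = {
    "customer_id": "integer",
    "revenue_usd": "numeric",
    "order_date": "date",
}


def breaking_changes(contract: dict[str, str], proposed: dict[str, str]) -> list[str]:
    """List every way the proposed schema violates the contract."""
    problems = []
    for column, expected_type in contract.items():
        if column not in proposed:
            problems.append(f"missing column: {column}")
        elif proposed[column] != expected_type:
            problems.append(
                f"type changed: {column} ({expected_type} -> {proposed[column]})"
            )
    return problems


# Renaming customer_id to user_id removes a column that consumers rely on.
renamed = {"user_id": "integer", "revenue_usd": "numeric", "order_date": "date"}

# Adding a column leaves every contracted column intact.
extended = {**contract, "reporting_category": "text"}

print(breaking_changes(contract, renamed))   # ['missing column: customer_id']
print(breaking_changes(contract, extended))  # []
```

In review language: the rename is a breaking change, while adding reporting_category is backward compatible.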

Stakeholder Communication Phrases

Analytics engineers frequently translate between the technical world of SQL and the business world of KPIs and decisions. The language must bridge both.

Explaining a data model to a business stakeholder:

  • “The fct_orders table is your single source of truth for order data — everything you see in the revenue dashboard is built from this model.”
  • “When you filter by order_status = 'completed', you’re looking at orders that have been fully delivered and invoiced.”

Explaining a data quality issue:

  • “We noticed some discrepancies between the dashboard and the finance report. After investigating, we found that the finance report was using a different definition of ‘active customer’ — we’re aligning the definitions now.”

Setting expectations on data freshness:

  • “This dashboard refreshes every four hours. For real-time figures, you’ll need to query the source system directly — but for daily and weekly reporting, the mart data is sufficient.”
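
Behind a freshness statement like this there is usually a simple comparison between the time of the last load and an agreed maximum age. A minimal sketch in Python follows, assuming a four-hour limit to match the example phrase. The function name is_fresh and the timestamps are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_loaded_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True when the last load is no older than the agreed limit."""
    return now - last_loaded_at <= max_age


# Hypothetical check time and a four-hour freshness limit.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
limit = timedelta(hours=4)

loaded_recently = datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)
loaded_long_ago = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)

print(is_fresh(loaded_recently, now, limit))  # True
print(is_fresh(loaded_long_ago, now, limit))  # False
```

When the check fails, a clear way to report it is: “The orders data is stale: the last successful load was six hours ago, which is outside our four-hour SLA.”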

Example Sentences

  1. “The staging model for the orders source applies type casting and column renaming only — all business logic is deferred to the mart layer.”
  2. “We use ref('stg_customers') in the intermediate model so that dbt can correctly infer the build order and prevent stale data from propagating downstream.”
  3. “The data contract for fct_revenue specifies that the revenue_usd column is always denominated in US dollars and is never null — any upstream change that violates this will trigger an alert.”
  4. “Lineage analysis showed that the proposed schema change to stg_products would break 14 downstream models across three mart tables — we’ve communicated this to the consuming teams.”
  5. “Following the stakeholder review, we agreed to add a reporting_category column to the mart, which will allow the product team to filter revenue by the new business segment taxonomy.”
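
The lineage analysis mentioned in sentence 4 amounts to walking the dependency graph downstream from the model you plan to change. The sketch below shows the idea in Python on a much smaller, invented graph. The model names and the function downstream are assumptions for illustration, not output from a real project.

```python
from collections import deque

# Hypothetical lineage: each model maps to the models that select from it.
children = {
    "stg_products": ["int_product_sales", "dim_products"],
    "int_product_sales": ["fct_revenue"],
    "dim_products": ["fct_revenue"],
    "fct_revenue": [],
    "stg_customers": ["dim_customers"],
    "dim_customers": [],
}


def downstream(model: str) -> set[str]:
    """Collect every model that depends on `model`, directly or indirectly."""
    seen = set()
    queue = deque([model])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


impacted = sorted(downstream("stg_products"))
print(impacted)  # ['dim_products', 'fct_revenue', 'int_product_sales']
```

With a list like this in hand you can say precisely which models and which teams are affected before the change is merged.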