Analytics Engineering Vocabulary

5 exercises — Practice key terms used by analytics engineers: dbt models, materializations, staging layers, ref(), Jinja templating, schema tests, lineage graphs, and the semantic layer.

Core analytics engineering vocabulary clusters
  • dbt fundamentals: model, source, ref(), seed, snapshot, Jinja template, schema.yml, documentation site
  • Layered architecture: staging layer, intermediate layer, mart, data contract, exposure
  • Materializations: table, view, incremental, ephemeral (these control how dbt writes results to the warehouse)
  • Quality & governance: schema test (generic test), singular test, lineage graph, data contract, semantic layer
  • Metrics & discovery: metrics layer, semantic layer, exposure, dbt docs, model documentation
An analytics engineer is reviewing a pull request. She explains the change to her colleague:
"I replaced the hard-coded table name in line 12 with ref('stg_orders'). This tells dbt about the dependency so it builds the models in the correct order and resolves the correct database and schema at run time."
What is the primary purpose of the ref() function in dbt?

Vocabulary Reference

Key analytics engineering terms and their definitions.

dbt (data build tool)
An open-source transformation framework that lets analytics engineers write SELECT-based SQL models, run tests, and generate documentation. dbt handles dependency resolution, DAG construction, and materialisation; it covers the T (transform) step of ELT.
model
A single .sql file in a dbt project. Each model is a SELECT statement; dbt compiles it and writes the result to the warehouse according to the configured materialisation (table, view, incremental, or ephemeral).
staging layer
The first transformation layer in a dbt project. Staging models map 1:1 to raw source tables, renaming and casting columns to house conventions. They contain no business logic and are typically materialised as views.
mart
A business-facing data model optimised for a specific domain or team (e.g., finance mart, marketing mart). Marts sit at the top of the transformation stack and are exposed to BI tools. Typically materialised as tables for query performance.
ref()
A dbt Jinja function used to reference another dbt model. It registers the dependency in the DAG and resolves the correct fully qualified table name for the active target environment, enabling environment-aware builds.
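The two jobs of ref() described above can be sketched in plain Python. This is a toy model, not dbt's implementation: make_ref, the targets dictionary, and the database and schema names are all hypothetical and exist only for illustration.

```python
# Toy illustration of what ref() does conceptually; not dbt's real code.

def make_ref(current_model, target, dependencies):
    """Return a ref() function bound to one model and one target environment."""
    def ref(model_name):
        # Job 1: record the dependency edge so the build order can be derived.
        dependencies.setdefault(current_model, set()).add(model_name)
        # Job 2: resolve a fully qualified name for the active target.
        return f"{target['database']}.{target['schema']}.{model_name}"
    return ref

# Hypothetical target environments (names are assumptions).
targets = {
    "dev": {"database": "analytics_dev", "schema": "dbt_alice"},
    "prod": {"database": "analytics", "schema": "core"},
}

dependencies = {}

# The same model text compiles to different table names per target.
ref_dev = make_ref("fct_orders", targets["dev"], dependencies)
dev_sql = f"select * from {ref_dev('stg_orders')}"

ref_prod = make_ref("fct_orders", targets["prod"], dependencies)
prod_sql = f"select * from {ref_prod('stg_orders')}"

print(dev_sql)
print(prod_sql)
print(dependencies)
```

The point of the sketch is that a hard-coded table name gives neither benefit: it hides the dependency from the graph and pins the query to one environment.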
materialization
How dbt persists a model's result in the warehouse. Options: table (full rebuild each run), view (no stored data, query runs live), incremental (appends or merges only new/changed rows), ephemeral (inlined as a CTE, never stored).
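To make the difference between table and incremental concrete, here is a minimal Python sketch that treats the warehouse table as a dictionary keyed by id. The function names, the updated_at column, and the sample rows are assumptions for illustration; real incremental models express this logic in SQL and depend on the configured strategy.

```python
# Toy model of the "table" and "incremental" materializations.

def build_table(source_rows):
    """'table': discard whatever exists and rebuild from all source rows."""
    return {row["id"]: row for row in source_rows}

def build_incremental(existing, source_rows, last_loaded_at):
    """'incremental': process only rows newer than the last load, merging on id."""
    merged = dict(existing)
    new_rows = [r for r in source_rows if r["updated_at"] > last_loaded_at]
    for row in new_rows:
        merged[row["id"]] = row  # insert new ids, overwrite changed ones
    return merged, len(new_rows)

source = [
    {"id": 1, "status": "placed", "updated_at": 1},
    {"id": 2, "status": "placed", "updated_at": 2},
]
warehouse = build_table(source)  # first run: full build

# Later, one order changes and one new order arrives.
source[1] = {"id": 2, "status": "shipped", "updated_at": 5}
source.append({"id": 3, "status": "placed", "updated_at": 6})

# Second run: only rows with updated_at > 2 are processed.
warehouse, processed = build_incremental(warehouse, source, last_loaded_at=2)
print(processed, sorted(warehouse))
```

Both approaches end with the same table contents here; the incremental run simply touches two rows instead of three, which is where the savings come from on large tables.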
snapshot
A dbt feature that captures historical states of a mutable source table by adding dbt_valid_from / dbt_valid_to columns. Used to implement Type 2 slowly changing dimensions (SCDs) without custom pipeline code.
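The Type 2 behaviour can be illustrated with a short Python sketch. It compares a single status column on each run, which is a simplification; the snapshot function and the sample data are hypothetical, and only the dbt_valid_from / dbt_valid_to column names come from the definition above.

```python
# Toy Type 2 SCD logic: close the old record and open a new one on change.

def snapshot(history, current_rows, now):
    """Apply one snapshot run against the current state of the source."""
    open_records = {r["id"]: r for r in history if r["dbt_valid_to"] is None}
    for row in current_rows:
        active = open_records.get(row["id"])
        if active is not None and active["status"] == row["status"]:
            continue  # unchanged since the last run, nothing to record
        if active is not None:
            active["dbt_valid_to"] = now  # close the superseded version
        history.append({**row, "dbt_valid_from": now, "dbt_valid_to": None})
    return history

history = []
snapshot(history, [{"id": 1, "status": "placed"}], now="2024-01-01")
snapshot(history, [{"id": 1, "status": "shipped"}], now="2024-01-02")
snapshot(history, [{"id": 1, "status": "shipped"}], now="2024-01-03")  # no change

for record in history:
    print(record)
```

After the three runs the history holds two versions of order 1: the closed "placed" record and the still-open "shipped" record, whose dbt_valid_to is empty.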
lineage graph
The Directed Acyclic Graph (DAG) dbt constructs from all ref() and source() calls in a project. It defines build order, enables impact analysis, and is visualised interactively in the dbt documentation site.
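Build order and impact analysis both fall out of the graph structure. The sketch below uses Python's standard-library graphlib on a hypothetical four-model project; the model and source names are invented for illustration and the downstream helper is not a dbt API.

```python
from graphlib import TopologicalSorter

# Each model maps to the set of upstream nodes it selects from.
lineage = {
    "stg_orders": {"source.raw.orders"},
    "stg_customers": {"source.raw.customers"},
    "int_orders_enriched": {"stg_orders", "stg_customers"},
    "fct_orders": {"int_orders_enriched"},
}

# Build order: every node appears after all of its parents.
build_order = list(TopologicalSorter(lineage).static_order())

def downstream(node):
    """Impact analysis: every model that depends on node, directly or indirectly."""
    affected = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for model, parents in lineage.items():
            if current in parents and model not in affected:
                affected.add(model)
                frontier.append(model)
    return affected

print(build_order)
print(downstream("stg_orders"))
```

A cycle in the dictionary would make TopologicalSorter raise CycleError, which mirrors why the graph has to be acyclic for a build order to exist.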
schema test (generic test)
A reusable dbt test declared in schema.yml and applied to a column or model. The four built-in generic tests are not_null, unique, accepted_values, and relationships. Tests run via dbt test or as part of dbt build and enforce data quality assertions automatically.
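The four assertions can be sketched as Python functions that return the failing rows, following the convention that a test passes when it returns nothing. The function bodies and sample data are illustrative assumptions, not dbt's SQL implementation; null values are skipped by every check except not_null.

```python
from collections import Counter

def not_null(rows, column):
    return [r for r in rows if r.get(column) is None]

def unique(rows, column):
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return [r for r in rows if counts.get(r.get(column), 0) > 1]

def accepted_values(rows, column, values):
    return [r for r in rows if r.get(column) is not None and r[column] not in values]

def relationships(rows, column, parent_rows, parent_column):
    parent_keys = {p[parent_column] for p in parent_rows}
    return [r for r in rows if r.get(column) is not None and r[column] not in parent_keys]

# Hypothetical data with one deliberate problem per test.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 1, "status": "placed", "customer_id": 1},
    {"order_id": 2, "status": "lost", "customer_id": 2},      # bad status
    {"order_id": 2, "status": "shipped", "customer_id": 9},   # duplicate id, unknown customer
    {"order_id": None, "status": "placed", "customer_id": 1}, # missing id
]

failures = {
    "not_null": len(not_null(orders, "order_id")),
    "unique": len(unique(orders, "order_id")),
    "accepted_values": len(accepted_values(orders, "status", {"placed", "shipped"})),
    "relationships": len(relationships(orders, "customer_id", customers, "customer_id")),
}
print(failures)
```

Returning the offending rows rather than a boolean makes a failure easier to debug, since the rows themselves show what broke.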
semantic layer
A centralised service where business metrics (revenue, churn rate, MAU) are defined once in code and consumed consistently by all downstream tools. Because every dashboard, notebook, and API reads the same definition, metric values no longer drift apart between them. Also called the metrics layer.