Analytics Engineering Vocabulary
5 exercises — Practice key terms used by analytics engineers: dbt models, materializations, staging layers, ref(), Jinja templating, schema tests, lineage graphs, and the semantic layer.
Core analytics engineering vocabulary clusters
- dbt fundamentals: model, source, ref(), seed, snapshot, Jinja template, schema.yml, documentation site
- Layered architecture: staging layer, intermediate layer, mart, data contract, exposure
- Materializations: table, view, incremental, ephemeral (the setting that controls how dbt writes a model's results to the warehouse)
- Quality & governance: schema test (generic test), singular test, lineage graph, data contract, semantic layer
- Metrics & discovery: metrics layer, semantic layer, exposure, dbt docs, model documentation
An analytics engineer is reviewing a pull request. She explains the change to her colleague:
"I replaced the hard-coded table name in line 12 with ref('stg_orders'). This tells dbt about the dependency so it builds the models in the correct order and resolves the correct database and schema at run time."
What is the primary purpose of the ref() function in dbt?
ref(): the central dbt function that creates inter-model dependencies. When you write FROM {{ ref('stg_orders') }}, dbt: (1) resolves the correct database.schema.table name for the current target environment (dev vs. prod); (2) registers the dependency in the DAG (Directed Acyclic Graph) so models build in the right order. Without ref(), models would be isolated: dbt could not build the lineage graph or enforce build order. Contrast with source(), which references raw upstream tables declared in a sources YAML file (for example sources.yml) rather than other dbt models.
In conversation: "Always use ref() between models. Hard-coding table names breaks the DAG and means your CI run can silently reference production data instead of the dev schema."
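The two jobs of ref() described above can be pictured with a small Python toy. This is only a sketch of the idea, not dbt's implementation: the ToyProject class, the TARGETS dictionary, and the database and schema names are all hypothetical, invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical target settings, invented for this sketch.
TARGETS = {
    "dev": {"database": "analytics", "schema": "dbt_dev"},
    "prod": {"database": "analytics", "schema": "marts"},
}


class ToyProject:
    """Toy stand-in for a dbt project: records ref() calls per model."""

    def __init__(self, target):
        self.target = TARGETS[target]
        # model name -> set of upstream model names it depends on
        self.dependencies = {}

    def ref(self, current_model, upstream_model):
        # Job 2: register the dependency edge so build order can be derived.
        self.dependencies.setdefault(upstream_model, set())
        self.dependencies.setdefault(current_model, set()).add(upstream_model)
        # Job 1: resolve the fully qualified name for the active target.
        return f"{self.target['database']}.{self.target['schema']}.{upstream_model}"

    def build_order(self):
        # Topological sort: every model comes after the models it refs.
        return list(TopologicalSorter(self.dependencies).static_order())


dev = ToyProject("dev")
dev_sql = f"select * from {dev.ref('fct_orders', 'stg_orders')}"

prod = ToyProject("prod")
prod_sql = f"select * from {prod.ref('fct_orders', 'stg_orders')}"

print(dev_sql)            # select * from analytics.dbt_dev.stg_orders
print(prod_sql)           # select * from analytics.marts.stg_orders
print(dev.build_order())  # ['stg_orders', 'fct_orders']
```

The same model text compiles to a different relation per target, which is why a hard-coded table name loses both the environment switch and the dependency edge.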
Vocabulary Reference
Key analytics engineering terms and their definitions.
- dbt (data build tool)
- An open-source transformation framework that lets analytics engineers write SELECT-based SQL models, run tests, and generate documentation. dbt handles dependency resolution, DAG construction, and materialisation; it covers the T (transform) in ELT.
- model
- A single .sql file in a dbt project. Each model is a SELECT statement; dbt compiles it and writes the result to the warehouse according to the configured materialisation (table, view, incremental, or ephemeral).
- staging layer
- The first transformation layer in a dbt project. Staging models map 1:1 to raw source tables, renaming and casting columns to house conventions. They contain no business logic and are typically materialised as views.
- mart
- A business-facing data model optimised for a specific domain or team (e.g., finance mart, marketing mart). Marts sit at the top of the transformation stack and are exposed to BI tools. Typically materialised as tables for query performance.
- ref()
- A dbt Jinja function used to reference another dbt model. It registers the dependency in the DAG and resolves the correct fully qualified table name for the active target environment, enabling environment-aware builds.
- materialization
- How dbt persists a model's result in the warehouse. Options: table (full rebuild each run), view (no stored data, query runs live), incremental (appends or merges only new/changed rows), ephemeral (inlined as a CTE, never stored).
- snapshot
- A dbt feature that captures historical states of a mutable source table by adding dbt_valid_from and dbt_valid_to columns. Used to implement Type 2 slowly changing dimensions (SCDs) without custom pipeline code.
- lineage graph
- The Directed Acyclic Graph (DAG) dbt constructs from all ref() and source() calls in a project. It defines build order, enables impact analysis, and is visualised interactively in the dbt documentation site.
- schema test (generic test)
- A reusable dbt test declared in schema.yml and applied to a column: not_null, unique, accepted_values, relationships. Runs when you invoke dbt test (or as part of dbt build) to enforce data quality assertions automatically.
- semantic layer
- A centralised service where business metrics (revenue, churn rate, MAU) are defined once in code and consumed consistently by all downstream tools. Eliminates metric inconsistency across dashboards, notebooks, and APIs. Also called the metrics layer.
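To make the four generic tests named in the glossary concrete, here is a rough Python sketch of what each one checks. It is a toy model rather than dbt's own code: the function signatures and the sample orders and customers rows are assumptions made for illustration. It follows the dbt convention that a test query returns the failing rows, so a test passes when it returns none.

```python
from collections import Counter


def not_null(rows, column):
    """Failing rows: the column has no value."""
    return [r for r in rows if r[column] is None]


def unique(rows, column):
    """Failing rows: the (non-null) value appears more than once."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return [r for r in rows if r[column] is not None and counts[r[column]] > 1]


def accepted_values(rows, column, values):
    """Failing rows: the (non-null) value is outside the allowed set."""
    return [r for r in rows if r[column] is not None and r[column] not in values]


def relationships(rows, column, parent_rows, parent_column):
    """Failing rows: the (non-null) key has no match in the parent table."""
    parent_keys = {p[parent_column] for p in parent_rows}
    return [r for r in rows if r[column] is not None and r[column] not in parent_keys]


# Hypothetical sample data with one deliberate problem per test.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 1, "status": "placed", "customer_id": 1},
    {"order_id": 2, "status": "shipped", "customer_id": 2},
    {"order_id": 2, "status": "lost", "customer_id": 3},
    {"order_id": None, "status": "placed", "customer_id": 1},
]

failures = {
    "not_null": not_null(orders, "order_id"),
    "unique": unique(orders, "order_id"),
    "accepted_values": accepted_values(orders, "status", {"placed", "shipped", "returned"}),
    "relationships": relationships(orders, "customer_id", customers, "customer_id"),
}

for name, failing in failures.items():
    print(name, "PASS" if not failing else f"FAIL ({len(failing)} rows)")
```

In a real project these checks are declared in YAML against a column rather than written by hand; the sketch only shows the assertion each name stands for.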