dbt enables analytics engineers to transform data using software engineering best practices. Mastering incremental models, snapshots, and the ref/source functions unlocks dbt's full power for production data pipelines.
0 / 5 completed
1 / 5
A dbt model uses {{ config(materialized='incremental') }} and includes an {% if is_incremental() %} block. What does this block accomplish?
The {% if is_incremental() %} block adds a filter that limits the query to new or changed records when the target table already exists. On the first run (or after --full-refresh), the block is skipped and all rows are processed. This dramatically reduces processing time for large tables.
2 / 5
What does {{ ref('stg_orders') }} in a dbt model do that a raw SQL table reference cannot?
ref() serves two purposes: it builds dbt's DAG (directed acyclic graph) of model dependencies, ensuring models run in the correct order, and it resolves to the correct schema/database name per environment (dev vs prod). Raw SQL references hardcode schema names and break when deploying to different environments.
3 / 5
A dbt developer creates a snapshot using strategy='timestamp'. What does this strategy track?
dbt snapshots with strategy='timestamp' implement Slowly Changing Dimension Type 2 (SCD2): they compare the updated_at column to detect changes and add dbt_valid_from / dbt_valid_to columns to track full row history. The check strategy alternatively hashes specified columns to detect changes.
4 / 5
What is the purpose of {{ source('raw', 'orders') }} in a dbt model?
source() references tables that dbt doesn't manage (e.g., tables loaded by EL tools). Benefits include: lineage tracking from source to final model in the DAG, freshness checks via dbt source freshness, and environment-aware schema resolution via sources.yml configuration.
5 / 5
A dbt project installs dbt-utils as a package. A developer uses {{ dbt_utils.generate_surrogate_key(['order_id', 'line_item_id']) }}. What does this macro produce?
dbt_utils.generate_surrogate_key() produces a deterministic hash (MD5 or SHA256 depending on adapter) of the concatenated input columns, creating a stable surrogate key. It handles NULL coalescing and separator logic, ensuring consistent key generation across databases without relying on database-specific functions.