Learn the vocabulary of writing a SQL-first data transformation that runs directly inside the warehouse.
0 / 5 completed
1 / 5
At standup, a dev mentions writing a data transformation as a SQL SELECT statement that a tool compiles and runs directly inside the warehouse, rather than extracting data out to a separate transformation engine. What tool is being described?
dbt, the data build tool, lets an analyst write a transformation as a SQL SELECT statement that dbt compiles and runs directly inside the warehouse, rather than extracting data out to a separate transformation engine. A traditional ETL tool that extracts data out to transform it externally adds an extra hop and often duplicates data outside the warehouse unnecessarily. This in-warehouse, SQL-first approach, sometimes called ELT, is what makes dbt a popular choice for a modern data transformation workflow.
2 / 5
During a design review, the team wants one model's SQL to reference another model by name rather than a hardcoded table name, so dbt can automatically figure out the correct build order between them. Which capability supports this?
The ref() function lets one model's SQL reference another model by name rather than a hardcoded table name, letting dbt automatically build a dependency graph and figure out the correct build order between them. Referencing another model's table by its hardcoded name loses that automatic dependency tracking, risking a model being built before an upstream model it actually depends on. This ref()-based dependency graph is one of dbt's most fundamental features for managing a set of interrelated transformations.
3 / 5
In a code review, a dev notices a dbt test is defined declaratively on a model's column, like asserting a column's values are always unique and never null, and dbt runs that test automatically as part of the build. What does this represent?
Declarative data testing built into the dbt workflow lets a test, like asserting a column's values are always unique and never null, be defined right alongside the model and run automatically as part of the build. Manually spot-checking a sample of rows by hand after every build is far less thorough and doesn't happen consistently every time the models rebuild. This built-in testing is what lets a team catch a data-quality regression automatically rather than discovering it downstream in a report.
4 / 5
An incident report shows a broken downstream model silently produced incorrect numbers for a week because an upstream model's output had started containing duplicate rows, and no test existed to catch that duplication automatically. What practice would prevent this?
Adding a declarative dbt test asserting the upstream model's key column stays unique makes the build fail loudly the moment duplication appears, rather than silently propagating bad data downstream for a week as this incident describes. Continuing to run the models with no such test relies entirely on someone eventually noticing the incorrect numbers by chance. This kind of declarative testing is a standard safeguard for any dbt model whose output feeds into further downstream transformations or reports.
5 / 5
During a PR review, a teammate asks why the team writes its data transformations in dbt instead of a traditional ETL tool that extracts data out of the warehouse to transform it in a separate engine. What is the reasoning?
A traditional ETL tool adds an extra hop and often duplicates data outside the warehouse, since it extracts data out to transform it in a separate engine before loading the result back in. dbt compiles and runs a SQL-based transformation directly inside the warehouse, with dependency tracking through ref() and declarative testing built directly into the same workflow. The tradeoff is that dbt's approach depends on the warehouse itself being powerful enough to run the transformation logic directly, rather than offloading that work to a separate external engine.