English for dbt Unit Tests and Model Contracts
Learn the English vocabulary for dbt testing: model contracts, column-level constraints, unit test YAML, given/expect patterns, and fixture data explained.
dbt has become the standard transformation layer in modern data stacks, and its testing ecosystem has matured significantly with the introduction of model contracts (dbt Core 1.5) and unit tests (dbt Core 1.8). Data engineers working in cross-functional teams need precise English vocabulary for these features to write effective documentation, conduct thorough code reviews, and communicate data quality standards to analytics and product stakeholders.
Key Vocabulary
Model contract — a dbt configuration that enforces the schema of a model’s output, guaranteeing that specified columns exist with defined data types and constraints, regardless of upstream changes. “Add a model contract to the orders model so that any upstream change removing the customer_id column fails the build immediately rather than silently producing null values downstream.”
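A minimal sketch of what such a contract might look like in a model's YAML properties file. The file path, column names, and data types are assumptions for illustration; exact type names depend on the warehouse.

```yaml
# models/schema.yml (hypothetical path; column names and types are illustrative)
models:
  - name: orders
    config:
      contract:
        enforced: true   # the build fails if the model's output does not match the columns declared below
    columns:
      - name: order_id
        data_type: int
      - name: customer_id
        data_type: int
      - name: order_total
        data_type: numeric
```

With the contract enforced, every column the model returns needs a name and a data_type entry here, so the YAML doubles as reviewable documentation of the model's shape.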
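Column-level constraint — a rule applied to a specific column in a model contract, such as not_null, unique, primary_key, foreign_key, or a custom check expression. How strictly each constraint is enforced depends on the data platform: not_null is enforced on most warehouses, while several platforms record unique and primary_key constraints without enforcing them. “Define a not_null constraint on order_total in the contract — if any upstream transformation introduces nulls, the materialisation will fail with a clear error.”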
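As a rough illustration, column-level constraints sit under each column of a contracted model. The column names and the check expression below are assumptions, and whether a given constraint is actually enforced depends on the warehouse.

```yaml
# Hypothetical columns; constraint enforcement varies by data platform
models:
  - name: orders
    config:
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: int
        constraints:
          - type: not_null
          - type: primary_key
      - name: order_total
        data_type: numeric
        constraints:
          - type: not_null
          - type: check
            expression: "order_total >= 0"   # illustrative custom check
```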
Unit test — a dbt test that validates the SQL logic of a model by providing mocked input data (fixtures) and asserting that the model produces the expected output, without reading the actual warehouse data. “Write a unit test for the revenue attribution model using fixture data that covers edge cases like split orders and refunds, which are not guaranteed to appear in raw production data.”
YAML test definition — the declarative YAML block that specifies the model under test, the mocked inputs, and the expected output rows. dbt expects unit test definitions in a YAML file inside the models directory, not under tests/. “Add the YAML test definition next to the model’s schema file in the models directory so reviewers can see the test coverage without navigating the warehouse.”
Given/expect pattern — the dbt unit test structure where given blocks define mocked input data for upstream models or sources, and the expect block defines the rows the tested model should produce. “Structure each unit test with a given block for every referenced source and an expect block for the output — this mirrors the arrange-act-assert pattern from software testing.”
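The given/expect structure described above can be sketched as follows. The model order_totals and its upstream stg_order_items are hypothetical, and the sketch assumes the model simply sums amount per order_id.

```yaml
# Hypothetical model and input names; assumes order_totals sums amount per order_id
unit_tests:
  - name: test_order_totals_sum_line_items
    model: order_totals
    given:                                 # arrange: mocked rows for each upstream ref
      - input: ref('stg_order_items')
        rows:
          - {order_id: 1, amount: 10.00}
          - {order_id: 1, amount: 5.50}
          - {order_id: 2, amount: 0.00}
    expect:                                # assert: rows the model should produce
      rows:
        - {order_id: 1, order_total: 15.50}
        - {order_id: 2, order_total: 0.00}
```

Unit tests can be run on their own with a selector such as dbt test --select "test_type:unit", which keeps them separate from tests that query real data.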
Fixture data — the small, hand-crafted row sets defined in the given and expect blocks of a dbt unit test, designed to isolate specific transformation logic. “Keep fixture data minimal — three to five rows per given block is usually sufficient to cover the happy path and the most important edge cases.”
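Fixture rows can also be written inline as CSV, which some teams find easier to scan in review. The sketch below is illustrative only: the model, the input, and the assumption that net revenue is the sum of amount per order are all hypothetical.

```yaml
# Hypothetical fixture set: one split order with a refund, plus a zero-amount order
unit_tests:
  - name: test_refund_reduces_revenue
    model: revenue_attribution
    given:
      - input: ref('stg_payments')
        format: csv
        rows: |
          payment_id,order_id,amount,status
          1,100,50.00,paid
          2,100,-20.00,refunded
          3,101,0.00,paid
    expect:
      format: csv
      rows: |
        order_id,net_revenue
        100,30.00
        101,0.00
```

Three rows are enough here to cover the happy path, a refund, and a zero amount, which matches the "keep it minimal" advice.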
Materialisation — the strategy dbt uses to persist a model’s output in the warehouse, such as table, view, incremental, or ephemeral. “Use the incremental materialisation for the events model so that each run appends new records rather than rebuilding the entire table.”
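A small configuration sketch, assuming a model named events. Note that the prose in this article uses the British spelling, but dbt's configuration key is spelled materialized.

```yaml
# Hypothetical model name; the dbt config key uses the American spelling
models:
  - name: events
    config:
      materialized: incremental
```

In practice the model's SQL would usually also contain an is_incremental() filter so that each run selects only new records; that part is not shown here.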
Schema test — the older dbt term for what is now called a generic test, one kind of data test: an assertion run against actual warehouse data, such as a uniqueness or referential integrity check. dbt Core 1.8 renamed the tests: key to data_tests: to distinguish these assertions from unit tests. “In addition to unit tests that validate logic with fixtures, add schema tests to the production table to catch data quality issues that only appear in real data distributions.”
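For comparison with unit tests, here is a sketch of data tests declared on a model's columns. The model and column names are assumptions; older dbt versions use the key tests: instead of data_tests:, and the layout of test arguments may differ across dbt versions.

```yaml
# Hypothetical model and columns; these tests run against real warehouse data
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: customer_id
        data_tests:
          - relationships:          # referential integrity check
              to: ref('customers')
              field: customer_id
```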
Common Phrases
- “The model contract will catch breaking schema changes before they reach the BI layer.”
- “Mock out the upstream models in the given block — we want to test our logic in isolation, not the source data.”
- “The expect block should reflect exactly what the model produces for the given inputs.”
- “Materialise this model as a table if downstream queries are slow; use a view if storage cost is a concern.”
- “Run dbt build to execute both the transformations and all associated tests in dependency order.”
Example Sentences
When explaining unit tests to a data analyst unfamiliar with software testing: “A dbt unit test lets us verify that our SQL logic is correct by feeding it small, carefully chosen rows and checking that the output matches what we expect. It’s like testing a calculator function: you give it specific inputs and confirm the output is right, without needing the real database to be populated.”
When writing a pull request description for a new model contract: “This PR adds a model contract to the customer_lifetime_value model, enforcing not_null and primary_key constraints on customer_id and a not_null constraint on ltv_90d. Any upstream change that violates these constraints will now fail the CI build rather than silently corrupting the BI dashboard.”
When conducting a code review: “The given block is using production table names rather than fixture values. Unit tests should reference mocked inputs so the test result is deterministic regardless of what is in the warehouse at the time the test runs.”
Professional Tips
- Distinguish unit tests (logic validation with fixtures) from data tests (quality assertions against real warehouse data) — conflating them causes confusion about what a test failure means.
- Use the phrase “contract-driven development” when pitching model contracts to engineering leadership — it signals alignment with software engineering best practices and makes the concept instantly recognisable.
- When writing fixture data, include at least one edge case row alongside the happy path — null values, zero amounts, and boundary dates are common sources of transformation bugs.
- Describe model contracts as a “schema guardrail at the materialisation layer” when explaining them to data consumers who care about dashboard reliability.
Practice Exercise
- A junior analyst asks why unit tests are necessary if you already have schema tests. Write two sentences explaining the difference and why both are valuable.
- You are writing a unit test for a model that calculates a 7-day rolling average. What fixture rows would you include in the given block to test the edge case at the start of a user’s history?
- Write a one-sentence description of a model contract suitable for a pull request comment directed at a non-engineering stakeholder.