Semantic layer: single definition of a metric, decoupled from SQL — MetricFlow, dbt metrics, or Looker LookML
Data contracts: schema + SLA + quality rules → enforced in CI → prevents producer from silently breaking consumers
0 / 5 completed
1 / 5
The interviewer asks: "How do you decide when to use dbt models vs. pushing logic to the BI layer?" Which answer best demonstrates analytics engineering maturity?
Option B is strongest: it states the decision principle clearly first, provides concrete examples for both sides of the boundary, explains the anti-pattern mechanism (how BI-layer duplication causes the specific failure of inconsistent metric definitions), and introduces the semantic layer as the architectural solution with specific use cases. The "trust in data collapses" framing is a real production outcome. Analytics engineering vocabulary:Semantic layer — a data access layer that enforces consistent metric definitions across BI tools. MetricFlow — dbt's metric definition framework. LookML — Looker's modelling language for semantic layer definitions. Date spine — a table of consecutive dates used for filling gaps in time-series data. Invisible duplication — logic that exists in multiple places without explicit references to each other. Options C and D are accurate but lack the anti-pattern explanation and the semantic layer use cases.
2 / 5
The interviewer asks: "Walk me through how you'd model a slowly changing dimension in dbt." Which answer is most complete?
Option B is strongest: it explains all three SCD types with "use when" guidance, introduces dbt snapshots as the implementation tool with configuration details (unique_key, strategy options and their differences), and — most importantly — explains the historically-accurate join pattern and names it as the most common SCD usage mistake. That specific mistake insight is what separates a practitioner from someone who only knows the theory. SCD vocabulary:SCD (Slowly Changing Dimension) — a dimension table attribute that changes over time. Surrogate key — a generated unique key for each row version in a Type 2 SCD. Natural key — the original business identifier (e.g., customer_id). dbt snapshot — dbt's built-in command for SCD Type 2 implementation. Timestamp strategy — snapshot change detection based on an updated_at column. Check strategy — snapshot change detection based on value comparison of specified columns. Options C and D are accurate but lack the join pattern anti-pattern explanation.
3 / 5
The interviewer asks: "How do you test data quality and reliability in a dbt project?" Which answer is most comprehensive?
Option B is strongest: it names three test levels, explains how dbt schema tests work internally (SELECT returning failing rows), introduces the severity configuration (error vs. warn), provides specific examples of custom tests including the anomaly detection example (which separates mature practitioners), explains source freshness with configuration details, and includes the meta-test (every model must have a test) — which is a CI best practice that shows production experience. dbt testing vocabulary:Schema test — a dbt built-in test that validates structural data properties. Singular test — a custom SQL test file in the tests/ directory. dbt_utils — a dbt package adding utility macros and tests. Source freshness — dbt command that checks whether source tables have been updated within their SLA. loaded_at_field — the timestamp column used to measure source freshness. Options C and D are accurate but lack the internal test mechanism explanation and the meta-test CI practice.
4 / 5
The interviewer asks: "What's the difference between a staging, intermediate, and mart model in dbt?" Which answer is most precise?
Option B is strongest: it names the prefix conventions (stg_, int_, mart_/fct_/dim_), provides the specific rule that staging has no cross-source joins (a constraint not just a description), explains the architectural reason for each layer (protection, reuse, consumption optimisation), distinguishes fact tables from dimension tables with the "one row per" rule, and adds practical guidance (file organisation, access controls). dbt model vocabulary:Staging model (stg_) — a source-faithful cleaning model. Intermediate model (int_) — a reusable business logic model not exposed to end users. Mart model — a business-domain-specific table for BI consumption. Fact table (fct_) — a table of events or transactions. Dimension table (dim_) — a table of entity attributes. Denormalised — attributes stored redundantly to avoid joins at query time. Options C and D are accurate but lack the prefix conventions and the no-cross-source-join constraint.
5 / 5
The interviewer asks: "How do you manage cross-team data contracts in a large analytics engineering team?" Which answer shows the most mature practice?
Option B is strongest: it explains why contracts are needed (formalising implicit dependencies), defines all four contract components, explains the CI enforcement mechanism and its cultural benefit ("we need to coordinate" vs. "you broke us"), provides the three-step migration process, integrates data catalog tooling, and makes the scaling argument explicit — the shift from Slack communication to self-serve catalog discovery is the key scalability insight. Data contract vocabulary:Producer — the team that owns and publishes the data model. Consumer — the team that depends on the data model. Consumer contract test — a test in the consumer's CI that validates the producer's schema still meets expectations. Breaking change — a schema modification that invalidates a registered consumer expectation. Data catalog — a searchable registry of data assets and their metadata. Options C and D are accurate but lack the cultural benefit framing and the scalability argument.