5 exercises — choose the best-structured answer to common Data Platform Architect interview questions. Focus on lakehouse medallion architecture, Delta Lake vs Apache Iceberg trade-offs, data contracts and schema evolution, platform governance, and migrating legacy warehouses.
Structure for Data Platform Architect interview answers
Name the architectural pattern: medallion layers, lakehouse vs warehouse, open table format
Explain the trade-off mechanism: ACID on object storage, compaction strategy, time travel cost
Cover governance dimension: ownership model, schema registry, data contracts as interfaces
Communicate to consumers: explain platform choices in terms of data quality and query performance, not implementation
0 / 5 completed
1 / 5
The interviewer asks: "Explain the medallion architecture and walk me through when you would choose it over a traditional star-schema data warehouse." Which answer best covers the architectural trade-offs?
Option B correctly defines all three layers with their specific characteristics, explains the replayability and schema flexibility advantages, gives a concrete cost comparison, and provides decision criteria for choosing star schema vs medallion based on team skills, data volume, and governance requirements. Option A vaguely describes three databases without the key properties of each layer and makes an unsupported quality claim. Option C incorrectly claims medallion is Databricks-specific — it is an architecture pattern implementable on any lakehouse stack. Option D invents two additional layers (raw and platinum) that are not part of the standard medallion definition.
2 / 5
The interviewer asks: "Compare Delta Lake and Apache Iceberg as open table formats. In what scenarios would you choose one over the other?" Which answer demonstrates the deepest technical understanding?
Option B accurately describes the architectural differences (transaction log structure vs three-tier metadata), names specific features (Z-order, hidden partitioning, partition evolution), and gives concrete multi-engine and use-case decision criteria. Option A incorrectly attributes provider origins — Delta Lake is open-source from Databricks, Iceberg is an Apache project from Netflix/Apple — and the cloud provider advice is wrong (both run on all major clouds). Option C is wrong on the storage format — both formats support Parquet (and Iceberg also supports ORC and Avro, but so can Delta). Option D is wrong — both formats store data in columnar files (Parquet by default).
3 / 5
The interviewer asks: "What are data contracts and how do you implement them in a lakehouse architecture?" Which answer best covers the producer-consumer model?
Option B correctly defines data contracts as producer-consumer agreements, covers all four implementation layers (schema, semantic, quality, ownership), names concrete tooling (Great Expectations, Soda, dbt contracts, Glue Schema Registry), explains the breaking change versioning requirement, and connects the concept to data mesh data products. Option A confuses data contracts with legal/GDPR data processing agreements. Option C conflates data contracts with API schema validation — data contracts are broader and include quality SLOs and ownership, not just schema. Option D conflates data contracts with encryption key management.
4 / 5
The interviewer asks: "How do you design a data platform governance model that scales across dozens of domain teams without becoming a bottleneck?" Which answer best describes the federated approach?
Option B correctly articulates the federated governance with centralised standards model, names specific tooling (DataHub, OpenMetadata, Unity Catalog, Apache Ranger, OpenLineage), describes the platform team's role shift from gateway to enabler, and gives four concrete mechanisms (CI/CD contract enforcement, automated lineage, quality scoring, tiered classification). Option A describes exactly the centralised bottleneck the question asks how to avoid. Option C is an oversimplification — a single Hive Metastore addresses catalog consolidation but not governance policies, quality enforcement, or ownership models. Option D defers governance until damage is done; technical debt in data platforms (undocumented schemas, unclear ownership) compounds rapidly.
5 / 5
The interviewer asks: "Walk me through how you would migrate a legacy on-premises data warehouse to a lakehouse architecture with minimal disruption to existing consumers." Which answer best describes the migration strategy?
Option B correctly describes the strangler fig migration pattern with dual-write, shadow read validation, phased consumer migration, SQL compatibility layer, semantic reconciliation opportunity, and decommission criteria. It also identifies the three key risks (performance, access control, ELT logic) and gives a realistic timeline. Option A's big-bang weekend migration is high-risk and ignores data validation, consumer readiness, and semantic drift. Option C is not a technical architecture answer — it offloads responsibility and overstates automation tool capabilities for complex legacy warehouses. Option D uses a mandate-and-deadline approach that ignores consumer readiness and creates business disruption.