5 exercises on data-warehouse modeling, pipelines, and storage.
0 / 5 completed
1 / 5
What is a star schema?
A star schema is the classic dimensional data-warehouse design. A central fact table holds quantitative measurements, and it links via foreign keys to several dimension tables describing the context — together resembling a star. It deliberately denormalizes dimensions for simpler, faster analytical queries with fewer joins, trading some storage and update complexity for read performance. A snowflake schema is a variant that further normalizes dimensions into sub-tables. Star schemas underpin most BI and reporting workloads.
2 / 5
How do a fact table and a dimension table differ?
In dimensional modeling, the fact table records measurable business events — sales amounts, quantities, durations — as numeric measures alongside foreign keys pointing to dimensions, and it is typically very large and growing. Dimension tables store the descriptive attributes you slice and filter by: customer, product, store, and especially time. They are smaller and relatively stable. The grain of the fact table — what one row represents (e.g. one line item) — is the first thing you define when designing the model.
3 / 5
What is the difference between ETL and ELT?
ETL (Extract, Transform, Load) pulls data from sources, transforms and cleans it in a separate processing layer, then loads the finished result into the warehouse. ELT (Extract, Load, Transform) reverses the last two steps: it loads raw data into a powerful modern warehouse first, then transforms it in place using the warehouse's own compute. ELT has grown popular because cloud warehouses (Snowflake, BigQuery) make in-warehouse transformation cheap and scalable, and keeping raw data enables flexible, repeatable modeling.
4 / 5
What is columnar storage and why does it help analytics?
Columnar storage physically groups all values of a single column together rather than storing complete rows. Analytical queries typically aggregate a few columns over millions of rows, so reading only those columns slashes I/O. Because a column holds uniform, similar data, it also compresses extremely well (run-length, dictionary encoding), further cutting scan costs. Formats like Parquet and ORC and warehouses like BigQuery use this layout. The trade-off is that single-row inserts and lookups are less efficient than in row stores.
5 / 5
What is partitioning in a data warehouse?
Partitioning breaks a huge table into smaller physical segments based on a column's value — most commonly a date, giving daily or monthly partitions. When a query filters on that column, the engine performs partition pruning, scanning only the relevant partitions and skipping the rest, which dramatically cuts cost and time. Partitioning also eases data lifecycle management, since old partitions can be dropped or archived cheaply. It pairs well with clustering, which sorts data within partitions for even finer skipping.