Practice the vocabulary of storing data by column instead of by row for analytical workloads.
1 / 5
At standup, a dev mentions switching an analytics database to store each column's values contiguously on disk instead of storing each full row together. What is this storage layout called?
Columnar storage keeps each column's values contiguous on disk, whereas row-oriented storage keeps every field of a row together. An analytical query that needs only a few columns from a wide table can then read just those columns instead of scanning every field of every row. This layout is a foundational design choice behind most modern analytics and data warehouse systems, which typically run this kind of column-selective aggregation query.
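To make the two layouts concrete, here is a minimal in-memory sketch. The table, its column names, and the `to_columnar` helper are hypothetical and only illustrate the idea; a real engine lays out bytes on disk rather than Python lists.

```python
# Hypothetical three-row table, used only for illustration.
rows = [
    {"id": 1, "status": "active", "amount": 10.0},
    {"id": 2, "status": "closed", "amount": 25.5},
    {"id": 3, "status": "active", "amount": 7.25},
]


def to_columnar(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Row layout: each record's fields sit together.
# Columnar layout: each column's values sit together.
columns = to_columnar(rows)

# An aggregation over one column touches only that column's list.
total_amount = sum(columns["amount"])
```

In this sketch, summing `amount` never looks at `id` or `status`, which is the property an analytical query relies on.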
2 / 5
During a design review, the team wants a query aggregating one column across billions of rows to scan far less data than reading every row's full record. Which capability supports this?
Column pruning, which columnar storage makes possible, lets a query that needs only one or a few columns read just those columns' contiguous data and skip every other column. A row-oriented scan has to read each row's complete record regardless of which fields the query uses, which wastes significant I/O when only a small subset of fields matters. Pruning is a major reason columnar storage speeds up aggregation queries over huge datasets so dramatically.
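A rough back-of-the-envelope model shows the scale of the savings. The function names and the table shape (one million rows, 50 columns) are assumptions chosen for illustration, and the model deliberately ignores pages, indexes, and compression.

```python
def fields_read_row_layout(num_rows, num_columns):
    """A full row scan reads every field of every record."""
    return num_rows * num_columns


def fields_read_columnar(num_rows, needed_columns):
    """A pruned columnar scan reads only the columns the query needs."""
    return num_rows * needed_columns


# Hypothetical table shape: 1,000,000 rows and 50 columns,
# with a query that aggregates a single column.
row_scan = fields_read_row_layout(1_000_000, 50)
pruned_scan = fields_read_columnar(1_000_000, 1)
savings_factor = row_scan // pruned_scan
```

Under these assumptions the pruned scan reads one fiftieth of the fields, and the gap widens as the table gets wider.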
3 / 5
In a code review, a dev notices a column of repetitive values, like a status field with only a few distinct values, is compressed far more efficiently in columnar storage than the same field would be in a row-oriented layout. What does this represent?
Column-level compression benefits from value homogeneity. Storing a column's similar or repetitive values contiguously, such as a status field with only a handful of distinct values, compresses far better than a row-oriented layout, where those same values are interleaved with unrelated fields. Leaving every value uncompressed would waste storage and I/O bandwidth that this pattern could save. Compression efficiency is another key reason columnar storage suits analytical workloads.
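Run-length encoding is one simple scheme that exploits this homogeneity; it is shown here as an illustration, not as the method any particular database uses. The `status_column` data and helper names are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of equal adjacent values into (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical status column: nine values, only two distinct.
status_column = ["active"] * 4 + ["closed"] * 3 + ["active"] * 2

encoded = run_length_encode(status_column)
# Nine stored values shrink to three (value, count) pairs.
```

The encoding only pays off because equal values sit next to each other. In a row-oriented layout each status value is separated from the next by the rest of its record, so there are no runs to collapse.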
4 / 5
An incident report shows a transactional workload that frequently inserts and updates a single full row at a time performed poorly after being migrated to a columnar storage layout. What practice would prevent this?
Choose a storage layout that matches the workload's actual access pattern. Columnar storage excels at reading a few columns across many rows, but it performs relatively poorly for a transactional workload that frequently reads or writes a single full row at a time, because that row's fields are spread across separate column regions. Using columnar storage for every workload ignores this fundamental tradeoff. Picking the layout deliberately, row-oriented for transactional work and columnar for analytical work, is a foundational database design decision.
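The write-side cost can be sketched by counting how many separate storage locations a single-row update touches. Both update functions and the sample data are hypothetical, and the count stands in for the scattered writes a real engine would perform.

```python
def update_row_store(rows, index, new_record):
    """Overwrite one contiguous record; returns locations touched."""
    rows[index] = new_record
    return 1


def update_column_store(columns, index, new_record):
    """Overwrite one slot in every column; returns locations touched."""
    touched = 0
    for name, value in new_record.items():
        columns[name][index] = value
        touched += 1
    return touched


# Hypothetical two-row table in both layouts.
rows = [
    {"id": 1, "status": "active", "amount": 10.0},
    {"id": 2, "status": "closed", "amount": 25.5},
]
columns = {
    "id": [1, 2],
    "status": ["active", "closed"],
    "amount": [10.0, 25.5],
}

new_record = {"id": 2, "status": "active", "amount": 30.0}
row_writes = update_row_store(rows, 1, new_record)
column_writes = update_column_store(columns, 1, new_record)
```

Here the row store touches one location while the column store touches one per column, so the cost of a full-row write grows with the width of the table.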
5 / 5
During a PR review, a teammate asks why the team stores this analytics table in a columnar format instead of the row-oriented format used by the main transactional database. What is the reasoning?
Row-oriented storage keeps every field of a row together, so any read pulls in the full record. That is efficient for a transactional lookup of a single record but wasteful for an analytical query that needs only a handful of columns across billions of rows. Columnar storage lets that kind of query scan only the needed columns, dramatically reducing I/O. The tradeoff is that columnar storage performs relatively poorly for a transactional workload that frequently reads or writes one full row at a time.