5 exercises — practice KB, MB, GB, TB in technical context: disk vs. memory, "fits in memory", working sets, and the vocabulary for reasoning about data sizes.
Key storage vocabulary for IT conversations
"Working set — the hot subset of data that's actively read; if it fits in RAM, disk I/O is avoided."
"Fits in memory means the working set can be cached in the buffer pool or page cache."
"64GB of what? Always clarify RAM vs. disk: they differ sharply in speed, cost, and persistence."
"2TB / 50MB per hour = 40,000 hours ≈ 4.6 years (assuming no log rotation)."
"OS overhead is typically 1-2GB RAM — factor this in before declaring you have 'enough' RAM."
1 / 5
A product manager asks: "Will 16GB RAM be enough for our application?" Your profiling shows the app uses about 2.4GB at peak. Which response is most accurate?
Option B gives a complete answer. It addresses: (1) the application's actual peak usage (2.4GB), (2) OS overhead — often forgotten; the OS itself requires 1-2GB, (3) co-located services — other processes on the same host consume memory, (4) headroom for spikes — peak profiling data is a snapshot, not the absolute worst case. This is the kind of complete memory analysis that prevents "works in dev, OOMKills in prod" surprises. Option A is correct in raw numbers but dismissive — "way more than enough" doesn't account for OS overhead or growth. Option C is overcautious and unhelpful; you have enough information to give a useful answer. Option D is technically accurate but misses all the practical context that matters.
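The four factors above can be turned into a rough memory budget. The sketch below is illustrative only: the function name, the 1.5x spike margin, and the 3GB figure for co-located services are assumptions, not values from the exercise; the 2GB OS overhead is the upper end of the 1-2GB range mentioned above.

```python
def memory_headroom_gb(total_gb, app_peak_gb, os_overhead_gb=2.0,
                       colocated_gb=0.0, spike_factor=1.5):
    """Return the RAM (in GB) left after budgeting for the app, OS, and neighbors.

    spike_factor pads the profiled peak, since profiling is a snapshot
    rather than a guaranteed worst case (1.5 is an assumed margin).
    """
    budgeted_gb = app_peak_gb * spike_factor + os_overhead_gb + colocated_gb
    return total_gb - budgeted_gb


# 16GB host, 2.4GB profiled peak, hypothetical 3GB of co-located services.
headroom = memory_headroom_gb(total_gb=16, app_peak_gb=2.4, colocated_gb=3.0)
print(f"Headroom: {headroom:.1f} GB")  # Headroom: 7.4 GB
```

A positive result supports "16GB is enough, with room to spare"; a result near zero or negative is the signal to ask what else runs on the host.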
2 / 5
A colleague says "our database is only 800MB — it fits in memory." What does "fits in memory" mean in database context, and is this claim meaningful?
Option B provides the technically accurate definition. Key concepts: (1) working set = the subset of data that is actively read or written in normal operation, typically a fraction of total data, (2) buffer pool or page cache: the database layer (the InnoDB buffer pool in MySQL, shared buffers in PostgreSQL) or the OS layer that caches disk pages in RAM, (3) if the working set fits in RAM, read queries can be served from memory, avoiding disk I/O (roughly 1,000x slower on NVMe SSD and about 100,000x slower on HDD for random access). The claim "800MB fits in memory" is meaningful IF the 800MB represents the entire hot working set. But it's only part of the story: write amplification, WAL files, and future growth also affect performance. Option A is partially right, but not all queries become "instant" (the CPU still has to process them). Option C is wrong. Option D conflates "cached in memory" with "stored only in memory."
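As a minimal sketch of the "fits in memory" check, the function below compares an estimated working set against the cache size. The function name, the 25% hot fraction, and the 512MB cache size are hypothetical; in practice the hot fraction has to be measured, not guessed.

```python
def working_set_fits(total_data_mb, hot_fraction, cache_mb):
    """Estimate the working set and report whether it fits in the cache.

    hot_fraction is the assumed share of the data that is actively accessed.
    Returns (working_set_mb, fits).
    """
    working_set_mb = total_data_mb * hot_fraction
    return working_set_mb, working_set_mb <= cache_mb


# 800MB database, assumed 25% hot, hypothetical 512MB cache.
size_mb, fits = working_set_fits(800, 0.25, 512)
print(size_mb, fits)  # 200.0 True

# If the whole 800MB is hot, the same 512MB cache is too small.
print(working_set_fits(800, 1.0, 512))  # (800.0, False)
```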
3 / 5
Which sentence correctly uses data size vocabulary in a technical context?
Option B is the correct and complete technical calculation. The math, in decimal units: 2TB = 2,000GB = 2,000,000MB. At 50MB/hour: 2,000,000 / 50 = 40,000 hours; 40,000 / 24 ≈ 1,667 days; 1,667 / 365 ≈ 4.57 years. Critically, Option B adds "assuming no log rotation". This qualifier matters because in practice log rotation would keep the disk from ever filling this way. The caveat transforms a potentially misleading calculation into a useful one. Option A has the same math as B but omits the log rotation qualifier, which is an important omission in a real system. Option C does the calculation vaguely ("lots of space") without quantifying. Option D has an arithmetic error: 2TB ÷ 50MB per hour = 40,000 hours, not 40TB; units must match before division.
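The same calculation can be checked in a few lines. This is a sketch assuming decimal units (1TB = 1,000,000MB) and a constant growth rate with no log rotation; the function name is illustrative.

```python
TB_IN_MB = 1_000_000  # decimal units: 1TB = 1,000GB = 1,000,000MB


def hours_until_full(capacity_tb, growth_mb_per_hour):
    """Hours until the disk fills, assuming constant growth and no rotation."""
    capacity_mb = capacity_tb * TB_IN_MB  # convert first so the units match
    return capacity_mb / growth_mb_per_hour


hours = hours_until_full(2, 50)
days = hours / 24
years = days / 365
print(f"{hours:,.0f} hours = {days:,.0f} days = {years:.2f} years")
# 40,000 hours = 1,667 days = 4.57 years
```

Converting the capacity to MB before dividing is the step that prevents the unit mismatch described for Option D.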
4 / 5
A developer says "the dataset is large — about 500GB." Which follow-up question is most useful for architecture decisions?
Option B is the complete set of questions that actually matter for architecture decisions. Breaking it down: (1) working set size — 500GB total data might have a 5GB hot working set that fits in RAM, changing the entire architecture, (2) growth rate — 500GB growing at 100GB/month requires a different approach than 500GB stable for years, (3) structured vs. binary — a 500GB relational database is architected completely differently from 500GB of user-uploaded images. Option A (compressed/uncompressed) is useful but secondary — it affects storage costs, not primary architecture. Option C is dismissive — "large" is relative to the problem, not to some absolute scale. Option D (format) is useful but narrower than Option B's complete picture. When someone mentions a data size, the follow-up questions should always address: working set, growth rate, and structure type.
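To see why growth rate changes the answer, here is a rough projection of how long a dataset takes to outgrow its storage. It assumes linear growth, and the 2,000GB capacity is a hypothetical figure chosen for illustration; only the 500GB size and 100GB/month rate come from the explanation above.

```python
def months_until_capacity(current_gb, growth_gb_per_month, capacity_gb):
    """Months until the data reaches capacity, assuming linear growth."""
    if growth_gb_per_month <= 0:
        return float("inf")  # stable or shrinking data never reaches capacity
    return (capacity_gb - current_gb) / growth_gb_per_month


# 500GB growing at 100GB/month on a hypothetical 2,000GB volume.
print(months_until_capacity(500, 100, 2000))  # 15.0

# The same 500GB, stable for years.
print(months_until_capacity(500, 0, 2000))  # inf
```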
5 / 5
What is the correct technical distinction between "disk storage" and "memory" when a colleague says "we have 64GB"?
Option B correctly explains why "64GB" needs qualification. The key differences: (1) Persistence — disk data survives power cycles; RAM data is lost on restart, (2) Access speed — RAM: ~100 nanoseconds random access; NVMe SSD: ~100 microseconds (1,000x slower); HDD: ~10 milliseconds (100,000x slower), (3) Capacity economics — 64GB RAM (2025: ~$150-200) vs. 64GB NVMe SSD (2025: ~$10) — completely different cost profiles, (4) Usage patterns — 64GB RAM can cache enormous working sets; 64GB disk is very limited for most production applications. The practical rule: whenever someone mentions a storage number without a type, ask "RAM or disk?" — the answer changes every architectural decision downstream.
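The speed ratios quoted above follow directly from the latency figures. The sketch below uses the same order-of-magnitude values (real hardware varies), and the dictionary and function names are illustrative.

```python
# Approximate random-access latencies in nanoseconds (order of magnitude only).
LATENCY_NS = {
    "RAM": 100,            # ~100 nanoseconds
    "NVMe SSD": 100_000,   # ~100 microseconds
    "HDD": 10_000_000,     # ~10 milliseconds
}


def slowdown_vs_ram(latency_ns):
    """Return how many times slower each device is than RAM."""
    ram_ns = latency_ns["RAM"]
    return {device: ns // ram_ns for device, ns in latency_ns.items()}


print(slowdown_vs_ram(LATENCY_NS))
# {'RAM': 1, 'NVMe SSD': 1000, 'HDD': 100000}
```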