Build fluency in the vocabulary of the fixed-size chunk a CPU always fetches into its cache.
1 / 5
At standup, a dev mentions that the CPU always fetches memory into its cache in fixed-size chunks, typically 64 bytes, rather than pulling in only the single byte or variable the program actually asked for. What is this fixed-size chunk called?
A cache line is the fixed-size, aligned chunk of memory (typically 64 bytes on modern CPUs) that the cache always fetches as a whole, even when the program asked for only a single byte or one small variable within it. A hash collision is an unrelated hash-table concept about two keys landing in the same bucket. Because the whole chunk is fetched, reading one value also brings its neighboring bytes into the cache, which is a large part of why sequential memory access tends to be much faster than scattered, unpredictable access.
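The whole-line fetch described above comes down to rounding an address down to its line boundary. This minimal sketch treats addresses as plain integers and assumes a 64-byte line; real line sizes vary by CPU, and `cache_line_span` is a hypothetical helper, not a system API.

```python
LINE_SIZE = 64  # assumed line size; 64 bytes is common, but some CPUs use other sizes such as 128

def cache_line_span(addr: int, line_size: int = LINE_SIZE) -> range:
    """Return the byte addresses that are fetched together with `addr`."""
    start = (addr // line_size) * line_size  # round down to the line's aligned start
    return range(start, start + line_size)

# Asking for the single byte at address 100 brings in the whole 64-byte line
span = cache_line_span(100)
print(f"fetched bytes {span.start}..{span.stop - 1}")  # fetched bytes 64..127

# A neighbor at address 120 is in the same line, so reading it next needs no new fetch
print(120 in span)  # True
```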
2 / 5
During a design review, the team lays out a hot array so elements accessed together are stored contiguously, specifically so a single cache-line fetch pulls in several of the elements a loop will need next. Which capability does this contiguous layout provide?
Contiguous layout results in fewer cache misses overall. A single cache-line fetch pulls in several neighboring elements at once, so by the time the loop reaches the second or third element in that line, the element is already in the cache from the first element's fetch and needs no additional trip to memory. Scattering the same elements across memory would instead require a separate cache-line fetch for nearly every element, since few of them would share a line with another element the loop needs. This is why the choice between array-of-structs and struct-of-arrays layouts matters so much for performance-sensitive code that iterates over data in a predictable pattern.
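One rough way to see this difference is to count the distinct cache lines a loop touches. The sketch below models addresses as plain integers and assumes 64-byte lines and 8-byte elements. A fixed 4 KiB stride serves as a deterministic stand-in for "scattered". It is a cold-cache model that ignores hardware prefetching and eviction, so it shows the shape of the effect rather than real timings.

```python
LINE_SIZE = 64   # assumed cache-line size in bytes
ELEM_SIZE = 8    # e.g. one 64-bit integer or double per element

def lines_touched(addresses, line_size=LINE_SIZE):
    """Count the distinct cache lines needed to read these addresses
    (cold-cache model: each line is fetched once and never evicted)."""
    return len({addr // line_size for addr in addresses})

n = 1000
contiguous = [i * ELEM_SIZE for i in range(n)]  # elements packed back to back
scattered = [i * 4096 for i in range(n)]        # stand-in for scattered: one element per 4 KiB region

print(lines_touched(contiguous))  # 125  (8 elements share each 64-byte line)
print(lines_touched(scattered))   # 1000 (every element needs its own fetch)
```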
3 / 5
In a code review, a dev notices that a hot loop iterates over a large array of objects but reads only one small field from each object, while every object also carries several large, unrelated fields packed alongside that field. What does this represent?
This is a cache-line-wasting layout. Each fetched cache line is mostly filled with unrelated fields the loop never reads, so far fewer of the values the loop needs fit into any single line, forcing many more cache-line fetches than the loop's real data footprint would require. A cache eviction policy is an unrelated concept about which entries a full cache discards to make room for new ones. This layout problem is what motivates restructuring data as a struct-of-arrays, so that the field a hot loop needs is packed contiguously with other values of its own kind rather than interleaved with unrelated data.
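To put numbers on this waste, the sketch below uses Python's `ctypes` to lay out a record the way a C compiler would on the host platform. The record has one small field that the loop reads, surrounded by larger unrelated fields. The `Particle` type, its field names and sizes, and the 64-byte line size are illustrative assumptions. The line count is a cold-cache model, not a measurement.

```python
import ctypes

LINE_SIZE = 64  # assumed cache-line size in bytes

class Particle(ctypes.Structure):
    """Hypothetical record: the hot loop reads only `x`."""
    _fields_ = [
        ("name", ctypes.c_char * 48),      # large, unrelated to the loop
        ("x", ctypes.c_float),             # the one small field the loop reads
        ("history", ctypes.c_double * 8),  # large, unrelated to the loop
    ]

def lines_touched(addresses, line_size=LINE_SIZE):
    """Count distinct cache lines containing the given byte addresses."""
    return len({addr // line_size for addr in addresses})

stride = ctypes.sizeof(Particle)  # bytes from one record to the next in an array
x_offset = Particle.x.offset      # where `x` sits inside each record
x_size = Particle.x.size          # bytes the loop actually uses per record

n = 1000
x_addresses = [i * stride + x_offset for i in range(n)]
print(f"useful bytes per record: {x_size} of {stride}")  # 4 of 120 on typical 64-bit platforms
print("lines fetched to read every x:", lines_touched(x_addresses))  # 1000
```

Because each record is wider than a line, no two `x` values share a line, so the loop pays for a full 64-byte fetch to obtain 4 useful bytes.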
4 / 5
An incident report shows that a hot data-processing loop ran far slower than its amount of useful data should have required. A profiler traced the cause to each object in the array packing several large, unrelated fields alongside the one small field the loop actually read, wasting most of every fetched cache line. What practice would prevent this?
Restructuring the data as a struct-of-arrays, with the field the loop needs stored contiguously on its own rather than interleaved with unrelated fields, means each cache-line fetch delivers far more of the data the loop actually uses. That is the fix for the slowdown described in this incident. Keeping the large, unrelated fields next to the needed field in an array-of-structs layout is exactly what wasted most of every fetched line on data the loop never touched. Struct-of-arrays restructuring is a standard technique for hot loops like this one, which read only part of each record.
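One way to express this restructuring in Python is to give each field its own contiguous buffer. The sketch below uses the standard `array` module for that purpose. `ParticlesSoA`, its fields, and the 64-byte line size are hypothetical. The line count uses the real start address reported by `array.buffer_info()`, so the exact number depends on where the buffer happens to start. Compare it with roughly one line per element when every `x` is interleaved with more than 64 bytes of unrelated fields.

```python
from array import array

LINE_SIZE = 64  # assumed cache-line size in bytes

class ParticlesSoA:
    """Hypothetical struct-of-arrays: one contiguous buffer per field."""

    def __init__(self, n: int):
        self.x = array("f", [0.0]) * n              # hot field: 4-byte floats, back to back
        self.history = array("d", [0.0]) * (8 * n)  # cold field: 8 doubles per particle, stored separately
        self.name = [b""] * n                       # cold field: kept out of the hot path entirely

    def sum_x(self) -> float:
        # The hot loop now streams through one dense array of floats
        return sum(self.x)

def lines_touched(addresses, line_size=LINE_SIZE):
    """Count distinct cache lines covering the given byte addresses."""
    return len({addr // line_size for addr in addresses})

n = 1000
soa = ParticlesSoA(n)
base, length = soa.x.buffer_info()  # real start address and element count of the x buffer
x_addresses = [base + i * soa.x.itemsize for i in range(length)]

# 1000 floats = 4000 bytes: 63 lines, or 64 if the buffer starts late in a line
print(lines_touched(x_addresses))
```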
5 / 5
During a PR review, a teammate asks why the team restructures data into a struct-of-arrays layout for a hot loop instead of keeping the more conventional array-of-structs layout and trusting the CPU's cache to handle it efficiently. What is the reasoning?
The CPU always fetches a whole fixed-size cache line regardless of how much of it a loop needs; there is no way to fetch only the field being read. Packing large, unrelated fields alongside the one the loop uses therefore wastes most of every fetch on data that is never touched. A struct-of-arrays layout instead packs the loop's field contiguously with other values of its own kind, so nearly every byte of every fetched line is data the loop actually uses, sharply reducing the total number of fetches. The tradeoff is that other access patterns, such as reading every field of a single object at once, can become less convenient or slightly less efficient, because that object's fields now live in separate arrays.
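The tradeoff in the last sentence can also be quantified. The sketch below counts the lines touched when code reads every field of one object. It assumes a hypothetical 120-byte record (x: 4 bytes, name: 48 bytes, history: 64 bytes, plus padding), 64-byte lines, and 64-byte-aligned base addresses chosen purely for illustration.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes

def lines_touched(spans, line_size=LINE_SIZE):
    """Count distinct cache lines covered by (start_address, size_in_bytes) spans."""
    lines = set()
    for start, size in spans:
        first, last = start // line_size, (start + size - 1) // line_size
        lines.update(range(first, last + 1))
    return len(lines)

RECORD_SIZE = 120  # hypothetical record: x (4 B) + padding + name (48 B) + history (64 B)
AOS_BASE = 0       # array-of-structs: one array of records at an assumed aligned address
# struct-of-arrays: one array per field, each at its own assumed aligned address
X_BASE, NAME_BASE, HISTORY_BASE = 0, 1 << 20, 2 << 20

def aos_whole_object(i):
    # All fields of record i sit together in one 120-byte block
    return lines_touched([(AOS_BASE + i * RECORD_SIZE, RECORD_SIZE)])

def soa_whole_object(i):
    # The same fields are spread across three separate arrays
    return lines_touched([
        (X_BASE + i * 4, 4),
        (NAME_BASE + i * 48, 48),
        (HISTORY_BASE + i * 64, 64),
    ])

print(aos_whole_object(10), soa_whole_object(10))  # 3 4
```

In this model, reading one whole object costs 3 lines in the array-of-structs layout but 4 lines from three separate regions in the struct-of-arrays layout. This is the "slightly less efficient" case. A loop that reads only `x` still touches far fewer lines with struct-of-arrays.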