Build fluency in the vocabulary of keeping a thread's memory local to the CPU socket that's actually using it.
0 / 5 completed
1 / 5
At standup, a dev mentions that on a multi-socket server, each CPU can access memory physically attached to its own socket faster than memory attached to a different socket, and writing code that accounts for this difference is called what?
NUMA awareness, for non-uniform memory access, is exactly this: on a multi-socket server, each CPU can reach memory physically attached to its own socket faster than memory attached to a different socket, and NUMA-aware code deliberately allocates memory and schedules work to keep a thread's data on the socket local to the CPU running it. A hash collision is an unrelated hash-table concept about two keys sharing a bucket. This local-versus-remote memory-latency difference is exactly why NUMA awareness matters for performance-sensitive workloads on multi-socket hardware.
2 / 5
During a design review, the team pins each worker thread to a specific CPU socket and allocates that thread's memory from the same socket, specifically to avoid the extra latency of reaching memory attached to a different socket. Which capability does this provide?
Pinning threads to a socket and allocating their memory locally provides lower and more predictable memory-access latency, since keeping a thread's data on the same socket as the CPU running it avoids the extra latency of crossing an interconnect to reach memory attached to a different, remote socket. Memory-access latency being the same regardless of which socket data lives on would contradict the entire premise of non-uniform memory access that gives NUMA its name. This local-versus-remote latency gap is exactly why NUMA-aware pinning and allocation improve performance on multi-socket hardware.
3 / 5
In a code review, a dev notices a performance-sensitive service on multi-socket hardware lets the operating system freely migrate threads between sockets and allocates memory with no regard for which socket will actually run the thread using it. What does this represent?
This is a missed opportunity for NUMA awareness, since letting threads freely migrate between sockets while allocating memory with no locality consideration risks a thread frequently ending up on one socket while its data lives on another, paying the extra cross-socket latency on every access, when pinning threads and allocating memory locally would avoid that cost. A cache eviction policy is an unrelated concept about discarded cache entries. This ignore-memory-locality pattern is exactly the kind of avoidable latency a reviewer flags on multi-socket hardware.
4 / 5
An incident report shows a performance-sensitive service's memory-access latency was inconsistent and sometimes much higher than expected on multi-socket hardware, because the operating system freely migrated threads between sockets and memory was allocated with no locality consideration. What practice would prevent this?
Adopting NUMA awareness, pinning each thread to a specific socket and allocating its memory locally, keeps a thread's data on the same socket as the CPU running it, which directly addresses the inconsistent, sometimes-high latency described in this incident. Continuing to let threads migrate freely with no locality consideration regardless of the resulting latency variance is exactly what caused the inconsistency. This pinning-and-local-allocation approach is the standard fix for latency variance traced back to cross-socket memory access on multi-socket hardware.
5 / 5
During a PR review, a teammate asks why the team pins threads to specific sockets instead of just letting the operating system's scheduler freely balance load across every socket however it sees fit. What is the reasoning?
Free load balancing can move a thread to whichever socket has spare capacity without regard for where that thread's data actually lives, incurring the extra latency of remote-memory access whenever the thread ends up away from its data, while pinning threads to a socket trades away some of that scheduling flexibility in exchange for consistently local, faster memory access. The tradeoff is that pinning can leave a socket underutilized if its pinned threads are idle while another socket is overloaded. This is exactly why NUMA-aware pinning is favored for latency-sensitive workloads, while free load balancing remains simpler for workloads where memory locality matters less.