Vocabulary for Race Conditions and Concurrency Bugs
Learn the essential English vocabulary for diagnosing and explaining race conditions, deadlocks, and other concurrency bugs to your team.
Concurrency bugs are notoriously hard to reproduce and even harder to explain clearly, which makes the vocabulary around them unusually important. Saying “it’s flaky” or “it happens sometimes” doesn’t help a teammate reason about the problem — naming the specific mechanism (race condition, deadlock, starvation) points directly at the class of fix needed.
Foundational Concepts
1. Race condition
A situation where the correctness of a program depends on the relative timing of two or more operations, producing different, sometimes incorrect, results depending on which operation happens first.
Usage: “This is a classic race condition — two requests can read the same balance before either writes back an update, so one update silently overwrites the other.”
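To make the lost update concrete, the sketch below replays the interleaving from the usage example step by step, without real threads, so the outcome is the same on every run. The `Account` class and the amounts are hypothetical and exist only for illustration.

```python
class Account:
    """Hypothetical account used only to illustrate a lost update."""

    def __init__(self, balance):
        self.balance = balance


def replay_lost_update(account, deposit_a, deposit_b):
    """Replay the bad interleaving: both requests read before either writes."""
    read_a = account.balance               # request A reads the balance
    read_b = account.balance               # request B reads the same balance
    account.balance = read_a + deposit_a   # A writes back its update
    account.balance = read_b + deposit_b   # B overwrites A's update
    return account.balance


account = Account(100)
final_balance = replay_lost_update(account, 10, 20)
print(final_balance)  # 120: A's deposit was lost (a serial run would give 130)
```

With real threads the same interleaving only happens occasionally, which is what makes the bug hard to catch.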
2. Critical section
A part of code that accesses shared state and must not be executed by more than one thread or process at the same time without proper coordination.
Usage: “This block reads and modifies the shared counter without any locking around it — it needs to be treated as a critical section.”
3. Mutual exclusion (mutex)
A synchronization mechanism that ensures only one thread can execute a critical section at a time, typically implemented through a lock object.
Usage: “We wrapped the critical section in a mutex so only one worker can update the shared cache entry at a time.”
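A minimal sketch of the idea, assuming Python's `threading.Lock` as the mutex. The `SafeCounter` class and the worker counts are hypothetical names chosen for the example.

```python
import threading


class SafeCounter:
    """Counter whose increment is a critical section guarded by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(counter, workers=8, increments=1000):
    """Start `workers` threads that each call increment() `increments` times."""
    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


total = run_workers(SafeCounter())
print(total)  # 8000
```

Because every increment runs under the lock, the total is the same on every run regardless of how the threads are scheduled.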
4. Deadlock
A state where two or more threads are each waiting on a resource held by another, such that none of them can ever proceed.
Usage: “We hit a deadlock because thread A was holding lock 1 and waiting on lock 2, while thread B held lock 2 and waited on lock 1.”
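One common way to rule out the lock-ordering deadlock described above is to make every thread acquire the locks in the same global order. The sketch below assumes a hypothetical `Account` type with a numeric `account_id` and always locks the lower id first; it illustrates the pattern and is not the only possible fix.

```python
import threading


class Account:
    """Hypothetical account with its own lock."""

    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(source, target, amount):
    """Move `amount` between two distinct accounts, locking the lower id first."""
    first, second = sorted((source, target), key=lambda acct: acct.account_id)
    with first.lock:
        with second.lock:
            source.balance -= amount
            target.balance += amount


def run_opposing_transfers(rounds=1000):
    """Two threads transfer in opposite directions; lock order prevents deadlock."""
    a = Account(1, 1000)
    b = Account(2, 1000)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return a.balance, b.balance


balances = run_opposing_transfers()
print(balances)  # (1000, 1000)
```

If `transfer` locked `source` first and `target` second instead, the two threads could each hold one lock and wait forever on the other.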
5. Livelock
A situation where threads keep changing state in response to each other without making actual progress, distinct from a deadlock because the threads aren’t technically blocked.
Usage: “Both retry loops kept backing off to yield to each other, so neither ever proceeded. That’s a livelock, not a deadlock, since neither thread was actually blocked.”
Detection and Diagnosis
6. Nondeterminism
The property of a program producing different results across runs given the same input, often the underlying cause of a bug being hard to reproduce reliably.
Usage: “This test’s nondeterminism comes from an unordered iteration over a set of concurrent workers — the output order isn’t guaranteed.”
7. Reproducibility (of a concurrency bug)
How reliably a bug can be triggered on demand, often inherently low for race conditions since the failure depends on precise, hard-to-control timing.
Usage: “Reproducibility on this bug is maybe one in twenty runs locally, but it happens almost every time under production load, which points to a timing-sensitive race.”
8. Data race
A specific kind of race condition where two threads access the same memory location concurrently, at least one of them writing, without proper synchronization.
Usage: “The thread sanitizer flagged a data race on this variable: two goroutines write to it with no synchronization between the accesses.”
9. Atomicity
The property of an operation completing entirely or not at all, with no observable intermediate state visible to other threads.
Usage: “This increment isn’t atomic — it’s actually a read, an add, and a write, and another thread can interleave in the middle of those three steps.”
10. Happens-before relationship
A formal ordering guarantee between two operations, possibly on different threads: if one operation happens-before another, the first operation’s effects are guaranteed to be visible to the second.
Usage: “Without an explicit happens-before relationship established by a lock or a channel operation, there’s no guarantee the other thread sees this write at all.”
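As a rough illustration, the sketch below uses Python's `queue.Queue` as a stand-in for the channel operation mentioned in the usage example. The queue is synchronized internally with a lock, so the consumer's `get()` cannot return until the producer's `put()` has run, and the producer's earlier write is ordered before the consumer's read. The `shared` dictionary and function names are hypothetical.

```python
import queue
import threading


def produce_then_consume():
    """Order a write before a read by handing off through a synchronized queue."""
    handoff = queue.Queue()
    shared = {}
    seen = []

    def producer():
        shared["config"] = "loaded"   # the write comes first...
        handoff.put("ready")          # ...then the synchronizing send

    def consumer():
        handoff.get()                 # blocks until the producer's put()
        seen.append(shared["config"]) # ordered after the write above

    # The consumer is started first on purpose: it still waits for the producer.
    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen


result = produce_then_consume()
print(result)  # ['loaded']
```

Without the queue handoff, the consumer could run before the producer and find the key missing.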
Common Fixes and Patterns
11. Locking (lock contention)
The practice of protecting shared state with a lock, and the performance cost — lock contention — that occurs when many threads compete for the same lock.
Usage: “We’re seeing high lock contention on this single global lock — splitting it into per-shard locks should reduce the number of threads waiting on each other.”
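The per-shard idea can be sketched as follows, assuming a hypothetical `ShardedCounters` class that maps each key to one of several locks. In CPython the global interpreter lock limits any real speedup, so treat this as an illustration of the structure rather than a benchmark.

```python
import threading


class ShardedCounters:
    """Per-key counters protected by one lock per shard instead of one global lock."""

    def __init__(self, shard_count=16):
        self._locks = [threading.Lock() for _ in range(shard_count)]
        self._shards = [{} for _ in range(shard_count)]

    def _shard_index(self, key):
        # Keys that hash to different shards never wait on each other.
        return hash(key) % len(self._locks)

    def increment(self, key):
        index = self._shard_index(key)
        with self._locks[index]:
            shard = self._shards[index]
            shard[key] = shard.get(key, 0) + 1

    def get(self, key):
        index = self._shard_index(key)
        with self._locks[index]:
            return self._shards[index].get(key, 0)


def run_sharded(keys=("a", "b", "c", "d"), workers=4, rounds=250):
    """Each worker increments every key `rounds` times."""
    counters = ShardedCounters()

    def work():
        for _ in range(rounds):
            for key in keys:
                counters.increment(key)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return {key: counters.get(key) for key in keys}


counts = run_sharded()
print(counts)  # every key ends at 1000
```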
12. Optimistic concurrency control
A strategy that allows operations to proceed without locking, then checks at commit time whether a conflicting change occurred, retrying if so.
Usage: “Instead of locking the row for the whole transaction, we’re using optimistic concurrency control with a version column, and retrying if the version has changed.”
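The version-column approach can be sketched with Python's built-in `sqlite3` module. The `accounts` table, its columns, and the `before_commit` hook (used here only to inject a conflicting write at a predictable moment) are assumptions made for the example.

```python
import sqlite3


def setup_accounts():
    """Create an in-memory table with a version column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")
    conn.commit()
    return conn


def try_deposit(conn, account_id, amount, before_commit=None):
    """One optimistic attempt: True if committed, False if the version changed."""
    balance, version = conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    if before_commit is not None:
        before_commit()  # demo hook: lets a conflicting write slip in here
    cursor = conn.execute(
        "UPDATE accounts SET balance = ?, version = ? WHERE id = ? AND version = ?",
        (balance + amount, version + 1, account_id, version),
    )
    conn.commit()
    return cursor.rowcount == 1  # zero rows updated means someone else won


def deposit_with_retry(conn, account_id, amount, max_attempts=5, before_first_commit=None):
    """Retry the optimistic attempt until it commits; return the attempt count."""
    for attempt in range(max_attempts):
        hook = before_first_commit if attempt == 0 else None
        if try_deposit(conn, account_id, amount, hook):
            return attempt + 1
    raise RuntimeError("too many conflicting updates")


conn = setup_accounts()
attempts = deposit_with_retry(
    conn, 1, 10, before_first_commit=lambda: try_deposit(conn, 1, 50)
)
balance, version = conn.execute(
    "SELECT balance, version FROM accounts WHERE id = 1"
).fetchone()
print(attempts, balance, version)  # 2 160 2
```

The first attempt fails because the injected deposit bumped the version; the retry re-reads the row and succeeds, so neither deposit is lost.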
13. Idempotency
The property of an operation producing the same result no matter how many times it’s applied, which makes retrying a failed or duplicated operation safe under concurrency.
Usage: “We made this endpoint idempotent using a request ID, so a retried request from a flaky network doesn’t double-charge the customer.”
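A minimal in-memory sketch of the request-ID approach follows. The `PaymentService` class, its `charge` method, and the receipt shape are hypothetical; a real service would persist the seen request IDs rather than keep them in a dictionary.

```python
import threading


class PaymentService:
    """Hypothetical service that deduplicates charges by request ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._receipts = {}      # request_id -> receipt from the first call
        self.total_charged = 0

    def charge(self, request_id, amount):
        # The lookup and the insert share one lock so that two concurrent
        # retries of the same request cannot both pass the check.
        with self._lock:
            if request_id in self._receipts:
                return self._receipts[request_id]  # replay: no second charge
            self.total_charged += amount
            receipt = {"request_id": request_id, "charged": amount}
            self._receipts[request_id] = receipt
            return receipt


service = PaymentService()
first = service.charge("req-123", 25)
retry = service.charge("req-123", 25)   # e.g. the client retried after a timeout
print(service.total_charged)  # 25
```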
14. Starvation
A condition where a thread is perpetually denied access to a resource it needs, even though it isn’t deadlocked, because other threads are repeatedly prioritized ahead of it.
Usage: “This low-priority worker is experiencing starvation — higher-priority tasks keep jumping the queue, so it never actually gets scheduled.”
15. Compare-and-swap (CAS)
An atomic, hardware-supported operation that updates a value only if it still matches an expected previous value, a common building block for lock-free concurrent code.
Usage: “We replaced the lock-based counter with a compare-and-swap loop, which never blocks and rarely has to retry under low contention.”
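The shape of a compare-and-swap retry loop can be sketched in Python, with one important caveat: the Python standard library does not expose a hardware CAS primitive, so the `EmulatedAtomicInt` class below emulates one with an internal lock. It is therefore not lock-free; it only shows how the calling code reads a value, computes a new one, and retries if the value changed in between.

```python
import threading


class EmulatedAtomicInt:
    """Emulated atomic integer; real CAS is a single hardware instruction."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        """Set the value to `new` only if it still equals `expected`."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def cas_increment(cell):
    """Retry until our update is applied on top of an unchanged value."""
    while True:
        current = cell.load()
        if cell.compare_and_swap(current, current + 1):
            return current + 1
        # Another thread changed the value first: re-read and try again.


def run_cas_workers(workers=4, increments=500):
    cell = EmulatedAtomicInt()

    def work():
        for _ in range(increments):
            cas_increment(cell)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return cell.load()


cas_total = run_cas_workers()
print(cas_total)  # 2000
```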
Communicating About Concurrency Bugs
16. Thread safety
The property of code that behaves correctly when called from multiple threads simultaneously, without needing external synchronization from the caller.
Usage: “This class isn’t thread-safe — the documentation doesn’t say so explicitly, but it clearly assumes single-threaded access.”
17. Reentrancy
The property of a function that can be safely called again before an earlier call to the same function has completed, for example from a signal handler or a recursive call. Reentrancy and thread safety are related but distinct properties.
Usage: “This handler isn’t reentrant — calling it again mid-execution corrupts its internal buffer, which is exactly what happened when the signal fired twice.”
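The buffer corruption in the usage example can be reproduced on a single thread. In the sketch below, a callback stands in for the signal handler and calls the function again mid-execution. The function names and the shared `_scratch` buffer are hypothetical.

```python
_scratch = []  # shared buffer: this is what makes the first function non-reentrant


def join_items_non_reentrant(items, on_item=None):
    """Builds the result in a shared buffer, so a nested call corrupts it."""
    _scratch.clear()
    for item in items:
        _scratch.append(str(item))
        if on_item is not None:
            on_item(item)  # may call this function again before we finish
    return ",".join(_scratch)


def join_items_reentrant(items, on_item=None):
    """Same logic, but each call owns its buffer, so nested calls are safe."""
    scratch = []
    for item in items:
        scratch.append(str(item))
        if on_item is not None:
            on_item(item)
    return ",".join(scratch)


def reenter_on_two(func):
    """Return a callback that re-enters `func` while item 2 is being processed."""
    def callback(item):
        if item == 2:
            func(["x"])
    return callback


broken = join_items_non_reentrant([1, 2, 3], reenter_on_two(join_items_non_reentrant))
fixed = join_items_reentrant([1, 2, 3], reenter_on_two(join_items_reentrant))
print(broken)  # x,3
print(fixed)   # 1,2,3
```

The nested call clears the shared buffer halfway through the outer call, so the outer call returns a mix of both results.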
18. Race window
The specific, narrow span of time during which a race condition can actually manifest, useful for explaining why a bug is rare but not impossible.
Usage: “The race window here is only a few milliseconds between the check and the write, which is why this bug shows up maybe once in ten thousand requests.”
19. Check-then-act
A common anti-pattern where code checks a condition and then acts on it in a separate step, leaving a window for another thread to change the condition in between.
Usage: “This is a classic check-then-act bug — we check if the file exists, then create it, but another process can create it in between those two steps.”
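The file example maps directly onto Python. The sketch below contrasts the two-step version with a single-step version that uses the exclusive-creation mode `"x"` of the built-in `open()`, which fails if the file already exists. The function names and file contents are hypothetical.

```python
import os
import tempfile


def create_if_absent_racy(path):
    """Check-then-act: another process can create the file between the two steps."""
    if not os.path.exists(path):        # check
        with open(path, "w") as handle: # act: may overwrite a file created meanwhile
            handle.write("owner\n")
        return True
    return False


def create_if_absent_atomic(path):
    """Single step: the check and the creation cannot be separated."""
    try:
        with open(path, "x") as handle:  # "x" raises if the file already exists
            handle.write("owner\n")
        return True
    except FileExistsError:
        return False


with tempfile.TemporaryDirectory() as directory:
    lock_path = os.path.join(directory, "job.lock")
    first_attempt = create_if_absent_atomic(lock_path)
    second_attempt = create_if_absent_atomic(lock_path)

print(first_attempt, second_attempt)  # True False
```

The racy version gives the same answers when run sequentially, which is why the bug tends to survive testing and only appears when two processes hit the window at once.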
20. Torn read/write
A situation where a read or write of a value that should be atomic is instead observed or performed in a partially completed state, exposing an inconsistent intermediate value.
Usage: “We saw a torn read on this 64-bit counter on the 32-bit platform, since the two halves of the value can be read at slightly different moments.”
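A torn read is hard to trigger on demand, so the sketch below models the 64-bit counter as two 32-bit words and replays one bad interleaving deterministically. The `SplitCounter64` class is a hypothetical model of how a 32-bit platform might store the value, not real platform code.

```python
MASK32 = 0xFFFFFFFF


class SplitCounter64:
    """Models a 64-bit value stored as two separately accessed 32-bit words."""

    def __init__(self, value):
        self.low = value & MASK32
        self.high = (value >> 32) & MASK32

    def write_low(self, value):
        self.low = value & MASK32

    def write_high(self, value):
        self.high = (value >> 32) & MASK32


def replay_torn_read(old, new):
    """The reader loads one half before the write and the other half after it."""
    counter = SplitCounter64(old)
    low_seen = counter.low       # reader: low word of the old value
    counter.write_low(new)       # writer: stores the new value in two steps
    counter.write_high(new)
    high_seen = counter.high     # reader: high word of the new value
    return (high_seen << 32) | low_seen


old_value = 0x00000000FFFFFFFF
new_value = 0x0000000100000000   # old_value + 1 carries into the high word
torn = replay_torn_read(old_value, new_value)
print(hex(torn))  # 0x1ffffffff: neither the old value nor the new one
```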
Key Takeaways
- Name the specific mechanism (race condition, deadlock, livelock, starvation) rather than saying a bug “happens sometimes” — precise naming points directly at the class of fix.
- Distinguish deadlock from livelock precisely — the former has threads genuinely stuck waiting, the latter has threads busy but making no progress.
- Reach for idempotency and optimistic concurrency control as concrete patterns, not just “add a lock,” since heavy locking introduces its own contention problems.
- Recognize check-then-act as a specific, common anti-pattern worth naming explicitly in code review rather than describing vaguely as “a timing issue.”
- Frame low reproducibility as evidence of a narrow race window, not as a reason to deprioritize the bug — timing-sensitive bugs often become far more frequent under production load.