Learn the vocabulary of a bug that changes behavior or disappears the moment someone tries to observe it.
1 / 5
At standup, a dev mentions a bug that occurs regularly in production but disappears the moment a developer attaches a debugger or adds logging to observe it, apparently because the act of observing it changes the program's timing enough to hide it. What is this kind of bug called?
A heisenbug is exactly this: a bug that changes its behavior or disappears when someone tries to observe or debug it, for example by attaching a debugger or adding logging. The usual cause is that the added instrumentation shifts the program's timing enough to hide the underlying race condition or other timing sensitivity. A hash collision is an unrelated hash-table concept in which two keys map to the same bucket. This disappears-when-observed pattern is why heisenbugs are notoriously difficult to reproduce and diagnose in a debugger.
2 / 5
During a design review, the team suspects a heisenbug caused by a race condition and switches to a lower-overhead tracing approach that barely alters thread scheduling, because attaching a full debugger had been changing the timing enough to make the bug vanish during every debugging attempt. Which capability does this provide?
Low-overhead tracing lets the team observe the program without significantly disturbing the timing that triggers the bug. Attaching a full debugger to investigate a suspected race condition often changes the program's timing enough that the race, and the heisenbug it causes, simply doesn't occur while being watched. Tracing that leaves thread scheduling largely intact avoids that masking effect, which is why low-overhead techniques are preferred when a heisenbug is suspected to be timing-sensitive.
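As a rough illustration of what low-overhead tracing can look like, here is a minimal Python sketch. The `TraceBuffer` class and `worker` function are hypothetical names made up for this example, not part of any library. The idea is to record events into a bounded in-memory buffer, doing no I/O or string formatting on the hot path, and to inspect the events only after the threads have finished. Even this adds some overhead, so it reduces the disturbance rather than eliminating it.

```python
import threading
import time
from collections import deque


class TraceBuffer:
    """Hypothetical in-memory trace recorder (illustrative, not a library API)."""

    def __init__(self, capacity=4096):
        # A bounded deque drops the oldest events once full, so memory stays fixed.
        self._events = deque(maxlen=capacity)

    def record(self, label):
        # Hot path: one clock read and one append. No formatting, no I/O.
        self._events.append((time.perf_counter_ns(), threading.get_ident(), label))

    def dump(self):
        # Cold path: call only after the traced threads have been joined.
        return sorted(self._events)


def worker(trace, name, steps):
    # Stand-in for real work; labels are plain tuples to avoid string formatting.
    for i in range(steps):
        trace.record((name, i))


trace = TraceBuffer(capacity=1000)
threads = [threading.Thread(target=worker, args=(trace, n, 3)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

events = trace.dump()  # list of (timestamp_ns, thread_id, label), ordered by time
```

The buffer is read only once the run is over, so the cost paid while the suspected race is live is limited to a clock read and an append per trace point.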
3 / 5
In a code review, a dev notices a team has been trying to reproduce an intermittent production failure exclusively by attaching a full debugger and stepping through the suspected code path, and the failure has never once occurred during any of those debugging sessions despite happening regularly in production. What does this represent?
This is a heisenbug being investigated with a technique likely to hide it, since attaching a full debugger changes the program's timing enough to mask the very race condition causing the intermittent failure. A cache eviction policy is an unrelated concept about which cache entries get discarded. This never-reproduces-under-a-debugger pattern is the signal a reviewer flags when a bug is likely timing-sensitive rather than a straightforward, deterministic logic error.
4 / 5
An incident report shows an intermittent production failure went unresolved for months, because the team repeatedly attached a full debugger to try to reproduce it, and the debugger's own timing changes kept masking the underlying race condition every single time, without anyone recognizing it as a heisenbug. What practice would prevent this?
Recognizing the failure as a likely heisenbug and switching to low-overhead tracing or lightweight logging that doesn't significantly disturb thread timing avoids relying on a full debugger that keeps masking the race condition. Continuing to attach a full debugger month after month, even though the failure never reproduces under observation, is exactly what left this incident unresolved. Switching to low-overhead observation is the standard diagnostic approach once a bug is suspected to be a timing-sensitive heisenbug that a full debugger keeps masking.
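Once events have been captured cheaply, the race can be looked for offline instead of by stepping through live code. The sketch below is one hedged way to do that in Python. It assumes a made-up record format of `(timestamp, thread_id, kind)`, where `kind` is `"read"` when a thread starts a read-modify-write on a shared value and `"write"` when it finishes. The `find_overlaps` name and the sample trace are illustrative only.

```python
def find_overlaps(events):
    """Return (earlier_thread, later_thread) pairs whose read..write windows interleave.

    `events` is an iterable of (timestamp, thread_id, kind) tuples, where kind
    is "read" (window opens) or "write" (window closes). This record format is
    an assumption made for the example.
    """
    open_windows = set()  # threads that have read but not yet written back
    overlaps = []
    for _ts, tid, kind in sorted(events):
        if kind == "read":
            # Every thread still mid-update now overlaps with this reader.
            for other in sorted(open_windows):
                overlaps.append((other, tid))
            open_windows.add(tid)
        elif kind == "write":
            open_windows.discard(tid)
    return overlaps


# Hand-written trace: thread 2 reads before thread 1 has written back.
trace = [
    (100, 1, "read"),
    (105, 2, "read"),
    (110, 1, "write"),
    (120, 2, "write"),
    (130, 1, "read"),
    (140, 1, "write"),
]
suspects = find_overlaps(trace)
print(suspects)
```

Each reported pair marks a point where one thread's update may have been lost, which gives the team a concrete interleaving to reason about without ever pausing the program.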
5 / 5
During a PR review, a teammate asks why the team switches to low-overhead tracing for a suspected heisenbug instead of simply attaching a full debugger for longer and stepping through more carefully. What is the reasoning?
Low-overhead tracing avoids significantly disturbing the program's timing, which gives a realistic chance of observing the race condition as it actually occurs. Attaching a full debugger for longer does not help, because the debugger still changes the underlying timing on every attempt. The race condition, and the heisenbug it causes, may simply never occur while a full debugger is attached, no matter how long or how carefully anyone steps through the code. This is why low-overhead tracing is preferred over a full debugger when a heisenbug is suspected to be timing-sensitive.