+9  Q: 

Unusual code bugs

I've been reading Wikipedia articles about unusual bugs. I'm particularly interested in Schrödinbugs and Heisenbugs:

Schrödinbug

A schrödinbug is a bug that manifests only after someone reading source code or using the program in an unusual way notices that it never should have worked in the first place, at which point the program promptly stops working for everybody until fixed. The Jargon File adds: "Though... this sounds impossible, it happens; some programs have harbored latent schrödinbugs for years."

Heisenbug

A heisenbug (named after the Heisenberg Uncertainty Principle) is a computer bug that disappears or alters its characteristics when an attempt is made to study it.

Has anyone encountered such a tricky bug in their code? Can you provide some real life examples?

+6  A: 

I've encountered many examples of the Heisenbug class. The most common occurrence is a multi-threaded / timing issue that stops reproducing as soon as a debugger is attached. Debuggers ever so slightly alter the timing of programs, and this causes a particular race condition to appear or disappear.

My particular favorite was a bug during the development of Visual Studio 2005. It involved a race condition between the C# code model, the native compiler, and the project system. A background thread accessed a dead code model object, which raised an exception that needed to be propagated back to the foreground thread via a Windows message. When the foreground thread's message queue was pumped determined whether the result was a crash, a dialog box, or a silently swallowed exception.

When run on its own, a particular scenario caused the IDE to fail, but attaching the debugger caused the scenario to work perfectly. This made it a veritable nightmare to track down. We eventually discovered that the bug would still repro as long as the debugger was used to start Visual Studio (and not just attached afterwards).

JaredPar
Seems we all had the same.
Benoit
Yes indeed - on Windows, heap behaviour changes in the debugger unless manually forced to be 'Release-like'. Post-startup debugger attach can also mitigate this. Insertion of diagnostics to track a bug, however carefully built, can also change timing to make it 'disappear'.
Steve Townsend
+1  A: 

I once had a Heisenbug. The OS was releasing a handle within a bounded time, but always too late when the program ran normally. When a breakpoint was set, the additional pause gave the resource enough time to be released, which led to working code.
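One way to stop depending on that kind of accidental pause is to wait explicitly for the resource, with a deadline. The following is only a sketch under assumptions: `acquire_with_deadline` and the `try_acquire` callback are hypothetical names, not an OS API, and nothing here is the fix that was actually used.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: keep calling try_acquire until it succeeds or the
// timeout expires, sleeping poll_interval between attempts.
// Returns true if the resource was acquired before the deadline.
bool acquire_with_deadline(const std::function<bool()>& try_acquire,
                           std::chrono::milliseconds timeout,
                           std::chrono::milliseconds poll_interval) {
    assert(poll_interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (try_acquire()) {
            return true;   // the resource has been released and is now ours
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // give up instead of hanging forever
        }
        std::this_thread::sleep_for(poll_interval);
    }
}
```

The caller passes whatever "try to open the handle" operation applies, so the wait no longer depends on whether a breakpoint happened to slow the program down.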

Benoit
+2  A: 

Timing bugs/race conditions are common Heisenbugs. As you add debugging or look at the program through a debugger, the timing of the software changes and the Heisenbugs disappear.
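A minimal sketch of such a timing-dependent bug, assuming a shared counter incremented from several threads. `run_counter` is a hypothetical helper written for this illustration. The racy variant splits the increment into a separate load and store, so its result depends on how the threads happen to interleave, which is exactly what a debugger or extra logging perturbs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: start `threads` workers that each add 1 to a shared
// counter `iterations` times, then return the final counter value.
// With atomic_increment == false the increment is a separate load and store,
// so two threads can read the same value and one update is lost. Each access
// is still atomic, so this is a race condition without undefined behaviour.
long run_counter(int threads, int iterations, bool atomic_increment) {
    assert(threads > 0 && iterations >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, iterations, atomic_increment] {
            for (int i = 0; i < iterations; ++i) {
                if (atomic_increment) {
                    counter.fetch_add(1);        // indivisible read-modify-write
                } else {
                    long seen = counter.load();  // another thread may run here
                    counter.store(seen + 1);     // and its update gets overwritten
                }
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.load();
}
```

Calling `run_counter(4, 100000, false)` can return anything from 1 up to 400000 depending on scheduling, while `run_counter(4, 100000, true)` always returns 400000.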

Starkey
+2  A: 

Last week at work, we had a threading-related bug that would appear only if we forced our program to run on one core outside of the debugger. If any attempt was made to start the program in the debugger, or attach the debugger to a running process (even with one core disabled), the bug would not appear.

We eventually solved the problem by extensive use of trace messages, but it was very mysterious, and a lot more painful than it should have been.
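Trace messages tend to perturb timing less when they are written to memory and printed only afterwards. The sketch below is one possible shape for that, not the tracing that was actually used here. `TraceBuffer` and `TraceEvent` are hypothetical names, and the buffer simply drops events once it is full.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

struct TraceEvent {
    int thread_id;  // which thread recorded the event
    int code;       // application-defined event code
};

// Hypothetical fixed-capacity trace buffer. Each record() claims a unique
// slot with one atomic increment, so no two threads write the same slot and
// no lock is taken. Events recorded after the buffer fills up are dropped.
class TraceBuffer {
public:
    explicit TraceBuffer(std::size_t capacity) : events_(capacity) {}

    void record(int thread_id, int code) {
        std::size_t slot = next_.fetch_add(1, std::memory_order_relaxed);
        if (slot < events_.size()) {
            events_[slot] = TraceEvent{thread_id, code};
        }
    }

    // Number of stored events. Call size(), at() and dump() only after the
    // recording threads have been joined.
    std::size_t size() const {
        std::size_t claimed = next_.load();
        return claimed < events_.size() ? claimed : events_.size();
    }

    const TraceEvent& at(std::size_t index) const {
        assert(index < size());
        return events_[index];
    }

    void dump() const {
        for (std::size_t i = 0; i < size(); ++i) {
            std::printf("#%zu thread=%d code=%d\n", i,
                        events_[i].thread_id, events_[i].code);
        }
    }

private:
    std::vector<TraceEvent> events_;
    std::atomic<std::size_t> next_{0};
};
```

The slow part (formatting and I/O) happens in `dump()`, after the interesting code has run, so the recording itself costs little more than an atomic increment and a store.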

Martin Törnwall
+3  A: 

Regarding the Schrödinbug, I think the final paragraph is more realistic and helpful:

Repairing an obviously defective piece of code is often more important than determining what arcane set of circumstances caused it to work at all (or appear to work) in the first place, and why it then stopped. Because of this, many of these bugs are never fully understood. When bugs of this type are examined in enough detail, they can usually be reclassified as a bohrbug, heisenbug, or mandelbug.

Any time you decide to re-write/redo instead of refactor/evolve because the existing codebase is old/unstable and fraught with bugs, you (hopefully) end up with a superior app that, due to its better design, will often fix a number of bugs that were never fully understood.

Kirk Woll
+1 for expanding the taxonomy
Steve Townsend
+4  A: 

Heisenbugs aren't actually that rare. Often they are caused by timing/race conditions that change when a debugger is attached, as others have mentioned. Sometimes they are caused by uninitialized local variables in C/C++ applications.

Someone forgets to init some variable or other memory (often used for a pointer), but the whole app usually happens to work just because the values found on the stack at its location are almost always "good enough" not to make the app crash.

Then someone goes looking for the bug with a debugger, and bang, the behavior changes: it crashes more often, or it never crashes, or something like that. This usually happens because, when running in debug mode, the CRT or the OS allocator[1] fills uninitialized memory with special patterns to make uninitialized variables easier to spot. This alters the bug's effects, sometimes making it easier to track down, and on other occasions just making you think "nasty magic is going on here", because merely attaching a debugger makes the behavior change.
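The effect can be simulated without relying on undefined behaviour. In this sketch the "uninitialized" slot is filled on purpose: with zeros to stand in for lucky leftovers, or with 0xCD, which is the byte the MSVC debug CRT uses for fresh heap memory as far as I recall. `read_uninitialised_slot` and `would_dereference` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simulates reading a pointer-sized slot that the program never initialised.
// debug_fill == false: the slot happens to hold zeros (the "lucky" case).
// debug_fill == true:  the slot holds a debug fill pattern (0xCD bytes).
std::uintptr_t read_uninitialised_slot(bool debug_fill) {
    unsigned char slot[sizeof(std::uintptr_t)];
    std::memset(slot, debug_fill ? 0xCD : 0x00, sizeof slot);
    std::uintptr_t bits;
    std::memcpy(&bits, slot, sizeof bits);  // copy the raw bytes out
    return bits;
}

// The buggy code's guard: "only follow the pointer if it is non-null".
// Returns true when the program would go on to dereference the garbage.
bool would_dereference(std::uintptr_t pointer_bits) {
    return pointer_bits != 0;
}
```

With the zeroed slot the guard skips the pointer and the program appears to work. With the 0xCD pattern the same guard lets the garbage through, so the crash shows up only under the debug configuration.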

In general, having to deal with heisenbugs is no fun at all.


1. The tragic thing is that, on Windows+VC++, when you use the debug CRT (which usually means that you're compiling in the Debug configuration), the CRT allocator fills memory with its magic pattern, and when you start an application with a debugger attached, the Windows debug heap is used. If you don't know this, things may get very creepy, since you can get several sets of "strange behavior" depending on the combination of CRT and OS debug heap.

When you find out that starting the application from the IDE or starting it separately and connecting the debugger later yields different results... well, you have quite a situation. :)

Matteo Italia
I can't count the number of times I've had a manager yell at me "What did you do?" when I fixed an uninitialized pointer which just *happened* to work for several months on someone else's computer.
wheaties
@wheaties - hear hear. I've also had situations where it was the responsible developer yelling at me for tracking down and fixing their bug.
Steve Townsend
A: 

Heisenbugs: I used to deal with those every day in several VB6 apps. Even Debug.Print statements would cause a bug to stop happening. Thankfully 98% of that has been ported to PHP.

Echo
+2  A: 

It's only a matter of time before someone undertaking parallel programming encounters a Heisenbug. In my experience they often arise when a developer tests as they code, finds that their program "works" (by luck), and forgets to go back and finish/check their locking.
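As a rough illustration of what "finished" locking looks like, here is a sketch in which every access to the shared state goes through the same mutex. `Accounts` and `run_transfers` are hypothetical names invented for this example. Without the lock the two-step update could interleave with another thread, yet still pass casual testing.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Two balances whose sum must stay constant. Every read and write takes the
// same mutex, so no thread can observe or interrupt a half-done transfer.
class Accounts {
public:
    Accounts(long a, long b) : a_(a), b_(b) {}

    void transfer_a_to_b(long amount) {
        std::lock_guard<std::mutex> guard(mutex_);
        a_ -= amount;  // both updates happen under one lock
        b_ += amount;
    }

    long balance_a() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return a_;
    }

    long balance_b() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return b_;
    }

    long total() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return a_ + b_;
    }

private:
    mutable std::mutex mutex_;
    long a_;
    long b_;
};

// Hypothetical driver: each worker moves 1 unit from a to b, repeatedly.
void run_transfers(Accounts& accounts, int threads, int transfers_per_thread) {
    assert(threads > 0 && transfers_per_thread >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&accounts, transfers_per_thread] {
            for (int i = 0; i < transfers_per_thread; ++i) {
                accounts.transfer_a_to_b(1);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
}
```

Checking that `total()` is unchanged after a heavily threaded run is a cheap sanity test, although passing it does not prove the locking is complete.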

I have never encountered a Schrödinbug in the wild. The closest would be similar behaviour resulting from a difference between production and development environments, i.e. the code seems to be working OK on the production system, someone uncovers the code in question, and is only able to reproduce the bug on their development system.

As someone else has mentioned, the worst thing you can do is try to replicate the production environment/glitch to fix the bug -- best to just make it work as it is supposed to.

John Carter
Efforts to find how a piece of code "could have possibly worked" frequently turn out not to be very useful, because the ultimate result is to find out that a bad way of doing things (which one should never actually use) will sometimes work. On the other hand, sometimes such investigations reap great rewards, as they may reveal (and facilitate the elimination of) other bugs that had previously escaped notice. Unfortunately, it's often hard to know whether such investigations will prove useful without actually carrying them out.
supercat