I've been fiddling around with CHESS, which seems like an incredibly useful tool. However, ironically, I seem to be dealing with a Heisenbug in one of my test methods. The results reported by CHESS when I run this test are unpredictable:

  • Sometimes the test will pass
  • Sometimes the test will fail, with no further description (simply: "Test failed")
  • Sometimes the test will fail with instructions to duplicate*
  • Sometimes the test will indicate "CHESS detected deadlock"

Initially, I thought this inconsistency must be due to the fact that the test involves the use of Random objects. It must have been that different seed values were yielding different outcomes, right?

So I updated the test to simply run for a predefined set of seed values (0 to 10). Thread-local Random objects get seeded by a (pseudo-)random value produced by a shared Random within a lock. The code looks basically exactly like this:

[Screenshot: CHESS reporting a deadlock]

(Update: I am running this on .NET 3.5, as CHESS only supports VS 2008. I wonder if the problem could have something to do with this?)

As I understand it, the above code should actually be pretty deterministic. Since sharedRandom is initialized with a known seed (between 0 and 10), the values produced by the localRandom object belonging to each thread running the code inside the Parallel.For call should be consistent from one test run to the next (which thread gets which seed from sharedRandom may differ between runs, but among the 5 iterations within Parallel.For, the same 5 seeds should be used for localRandom).
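The setup described above can be sketched roughly as follows. This is not the original test code; it is a minimal illustration that assumes the `System.Threading.Tasks.Parallel` API as shipped in .NET 4 and later (not the .NET 3.5 setup mentioned above). The names `SeedingSketch`, `CollectLocalSeeds`, and `gate` are hypothetical, and "0 to 10" is assumed to be inclusive.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class SeedingSketch
{
    // Returns, in sorted order, the seeds handed to the 5 iterations
    // for one given seed of the shared Random.
    public static int[] CollectLocalSeeds(int sharedSeed)
    {
        var sharedRandom = new Random(sharedSeed);
        var gate = new object();   // hypothetical lock object
        var seeds = new int[5];

        Parallel.For(0, 5, i =>
        {
            int localSeed;
            lock (gate)            // System.Random is not thread-safe, so guard the shared instance
            {
                localSeed = sharedRandom.Next();
            }

            var localRandom = new Random(localSeed);
            localRandom.Next();    // stand-in for the real work under test

            seeds[i] = localSeed;  // each iteration writes only its own slot
        });

        // Which iteration got which seed may vary between runs,
        // so compare the seeds as a sorted collection.
        Array.Sort(seeds);
        return seeds;
    }

    public static void Main()
    {
        for (int seed = 0; seed <= 10; seed++)
        {
            int[] first = CollectLocalSeeds(seed);
            int[] second = CollectLocalSeeds(seed);
            Console.WriteLine("shared seed {0}: same local seeds = {1}",
                              seed, first.SequenceEqual(second));
        }
    }
}
```

Under these assumptions, every shared seed should yield the same five local seeds from run to run, even though their assignment to iterations can differ. That is the determinism argument made above.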

That's how I understand it. But from the CHESS results, I'm inclined to believe I must be missing something.

  1. Is there a deadlock in the above code that I'm too dumb to see?
  2. Should I not be using the Random class in concurrency-related tests?
  3. For those who have experience using CHESS: is it a reliable tool? Does it sometimes give false positives? This is actually a big one: if it turns out that this scenario (inconsistent test results) is common, then perhaps I should hold off on using CHESS at all for the time being.

*...which I haven't been able to figure out how to use -- but that's a separate issue.

+1  A: 

No answers yet, so I'll give it a shot. It isn't obvious to me how the posted snippet could fail; I suspect the real problem is in the code that the comment stands in for.

I don't have hands-on experience with CHESS, but I studied it well enough to know that you cannot rely on it to ever give you reproducible test results. Its approach to uncovering threading problems is very much statistical, injecting random delays in the threads. It is designed to recreate the kind of threading problems that are so heavily influenced by timing, especially race conditions.

A race condition can go undetected for a very long time if code execution timing is predictable. And when it strikes, it is incredibly hard to diagnose. A good example of this is a large government project I heard of that shipped with the logging kept turned on, because with it turned off the code would no longer work, and there was no good way to diagnose the problem without the logging info.

Treat CHESS as a diagnostic tool. If it raises a flag, you can be fairly sure that you have a real, but still hard to solve, threading problem.

Hans Passant
I figured the same as you (that surely the problematic code must be in the comment), but even when I get rid of everything except for a single call to `localRandom.Next();` CHESS still (occasionally) reports a deadlock. I will update the question to indicate this. Anyway, unless somebody comes along later today who can explain how this deadlock could occur, I'll probably end up accepting this answer because it does help me out quite a bit by clarifying how CHESS as a tool should be viewed.
Dan Tao
OK, I'm accepting this answer, as it is highly informative; but the question left lingering in my mind is: given that I have reproduced this problem with absolutely *nothing* besides the excerpted code (I removed all the code represented by the comment in the original question), how can there be a deadlock? Could it be simply that using `lock` is *inherently* vulnerable to deadlocks prior to .NET 4.0 (when `Monitor.Enter(object, ref bool)` was added)? That is, does CHESS maybe try aborting a thread, just to see what happens?
Dan Tao
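For reference, the difference alluded to in the comment above can be sketched as follows. This is a hedged illustration of how a `lock` statement is commonly described as expanding before and after C# 4 / .NET 4, not output taken from any compiler; the class and member names (`LockExpansionSketch`, `Gate`, `counter`) are hypothetical. In the older shape, an asynchronous exception such as a thread abort landing between `Monitor.Enter` and the start of the `try` block could in principle leave the lock held. Whether CHESS ever provokes that is not established here.

```csharp
using System;
using System.Threading;

public static class LockExpansionSketch
{
    private static readonly object Gate = new object();
    private static int counter;

    // Roughly the pre-.NET 4 shape of: lock (Gate) { return ++counter; }
    public static int IncrementClassic()
    {
        Monitor.Enter(Gate);
        // If the thread is aborted right here, the finally block below
        // never runs and the lock stays held.
        try
        {
            return ++counter;
        }
        finally
        {
            Monitor.Exit(Gate);
        }
    }

    // Roughly the .NET 4+ shape, using Monitor.Enter(object, ref bool).
    public static int IncrementWithLockTaken()
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(Gate, ref lockTaken);
            return ++counter;
        }
        finally
        {
            // Release only if the lock was actually acquired.
            if (lockTaken) Monitor.Exit(Gate);
        }
    }
}
```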
Well, no, but I don't really see how you give CHESS a chance at detecting a problem with the given code minus the commented section either. It executes in a few microseconds, at best. It is the kind of tool you whip out when you have *real* code. Synthetic tests beget synthetic results, the tool was designed to deal with real world problems. And there's not a heckofalot of time skewing it can do when the code takes usecs.
Hans Passant
@Hans: But the problem is that there *was* "real" code in there, and CHESS reported a deadlock, and I couldn't figure out where the heck it could be coming from. When it reports a deadlock and I take out potential causes one by one until all that's left is the excerpt as it currently stands, I'm left scratching my head. As you put it yourself: "If it raises a flag, you can be fairly sure that you have a real ... threading problem" -- my concern is that, if it's reporting a deadlock even in this seemingly harmless case, how can I be sure there's a deadlock in the "real" code?
Dan Tao
+1  A: 

I certainly don't see a deadlock there. It's likely that Random has internal locking, but that should be fine.

You might want to try Jinx (www.corensic.com). Rather than producing a report, Jinx just alters the effective performance of various CPUs. So it can't really produce false positives.

If the tiny sample deadlocks under Jinx, it's definitely capable of deadlocking during normal use. Assuming it does deadlock, you should be able to break into the deadlock with Visual Studio and see where the threads are.

Disclaimer: I work for Corensic. And I don't think the small snippet you posted has a real deadlock. But I'm curious, so let us know what you find.

Dave Dunn
The `System.Random` class does **not** have any internal locking and is not guaranteed to be thread-safe.
Chris Shouts