I'm working on a client-server application. At certain times, on certain machines, when more than 5 clients are requesting data, it seems to reach a deadlock. If I step in to debug the problem, the program appears to be processing. Simply setting a breakpoint where I know the program is executing, and letting it hit the breakpoint a few times, causes it to finish. If I insert Thread.Sleep(0) at certain points in the code, mostly around some CPU-intensive loops, it seems to pretty much resolve the problem. The one problem I have now is that if I call Thread.Sleep(0) too much, it slows down the code, and if I don't call it enough, the code seems to be in deadlock. I can verify that it isn't actually deadlocked, though, because stepping into the code makes the problem disappear, simply because I'm pausing a thread.

Is there a good way to track down exactly what is causing this? It seems to happen only on my laptop, which is running Vista, and not on my desktop, which is running Windows XP. However, it's impossible to debug, because simply stepping into the code makes the problem go away. I've read comments that calling Thread.Sleep(0) is a bad practice and shouldn't be necessary, and I don't like putting witchcraft-type code into my applications when I don't understand why it has to be there. Any pointers would be greatly appreciated.

[EDIT] I can also verify that the code is still running when it is "deadlocked", because if I leave it long enough it will finish; it just takes far longer. It's at least 100 times slower when it's in this "deadlocked" mode. The CPU is pegged at 80-95%, so it is working, although what it is doing is beyond me, because it's taking forever to complete the task.

[More Info] Because everybody here insists that it's a deadlock, I removed all the code that did any locking. There were only a couple of lines that did any locking whatsoever. The threads work completely independently for the most part, so it wasn't much work to remove the locking completely. Still, the problem persists. There are no more SyncLocks in my code, no more mutexes, nothing that I can see that would cause a deadlock, but the problem is still there. And it's not deadlocked. It runs, albeit very slowly, even though it's eating up all the processor resources.

A: 

Try printing out a line every time you lock a mutex and another line when you unlock it. Remember to include the function name and the thread ID when you print it. That should give you an idea of when it's locking. It seems like a race condition: calling sleep(0) still makes the CPU spend cycles processing the call, which introduces an inherent delay.

Suroot
+5  A: 

Thread.Sleep(0) is a yield. I am guessing this rearranges the timing of your calls enough to avoid some issues. I think if you released code with the yield and ran it on 1000 machines, you would get a lot of bug reports. I am guessing you need some type of lock or critical section to avoid your deadlock, because somewhere your code is not thread safe. That could be in a library you are calling.

  1. Add logging and see if the problem still happens. Hopefully you can figure out which functions are causing the deadlock.
  2. Add some critical sections. Using a divide-and-conquer approach, you should be able to narrow down where the problem is happening.
hacken
In other words, the `Sleep(0)` is probably masking a race condition.
SamB
+1  A: 

This is impossible to answer without looking at the entire codebase.

Calling Thread.Sleep is a HORRIBLE practice, and you shouldn't do it. You are basically shifting around the timing of your code which is leading to the deadlock, as opposed to actually addressing the deadlock condition to begin with.

The reason that debugging doesn't show the problem is that when the debugger stops, all of the threads in your code stop, so that is skewing the timing of the execution of your program as well.

What you want to do here is insert logging code to trace the path of execution of your code on the different threads and, based on that, determine where the deadlock is.
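A minimal sketch of that kind of tracing, assuming a hypothetical `Log` helper and a placeholder `HandleClient` routine (neither comes from the question). Entries are buffered in memory and printed at the end, on the assumption that writing to the console from every thread would itself shift the timing:

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

// Hypothetical trace buffer: each entry records elapsed time, thread id, and a label.
var clock = Stopwatch.StartNew();
var entries = new ConcurrentQueue<string>();

void Log(string label) =>
    entries.Enqueue($"{clock.ElapsedMilliseconds,6} ms  thread {Environment.CurrentManagedThreadId,3}  {label}");

// Placeholder for the real per-client work.
void HandleClient(int clientId)
{
    Log($"client {clientId}: start");
    long sum = 0;
    for (int i = 0; i < 1_000_000; i++) sum += i;   // stand-in for a CPU-bound loop
    Log($"client {clientId}: done (sum={sum})");
}

var threads = new Thread[5];
for (int i = 0; i < threads.Length; i++)
{
    int id = i;                                     // capture a copy, not the loop variable
    threads[i] = new Thread(() => HandleClient(id));
    threads[i].Start();
}
foreach (var t in threads) t.Join();

// Dump the trace only after all threads have finished.
foreach (var line in entries) Console.WriteLine(line);
```

Comparing the timestamps of a fast run against a slow run should show which labelled section the extra time goes into.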

casperOne
A: 

I think it is time for you to step away from the computer and up to the whiteboard. Carefully analyse how each element locks and under what conditions it releases the lock. Five threads might be a difficult problem, so maybe see if just two threads can cause the same condition. Something is not locking properly and you need to find where.

Unless your code is not worth this much effort; in that case, leave the Thread.Sleep() in there, because it doesn't really hurt your performance very much in the grand scheme of things.

Karl
+1  A: 

What is inside those loops? Similar problems can appear if you check some field in a loop to synchronize multiple threads:

while (_field); // waiting for _field change in another thread

This approach runs very slowly, and calling Thread.Sleep(0) is not a real solution, although it can serve as a hack in some cases. It can be fixed properly if you replace that synchronization loop with a call to the WaitHandle.WaitOne() method of some synchronization object (a ManualResetEvent, for example) and signal that handle from the other thread. Maybe your problem is something like that? Please provide some part of your code.
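A minimal sketch of that replacement, assuming a single waiting thread and a single signalling thread; the names `ready`, `result`, and `producer` are hypothetical:

```csharp
using System;
using System.Threading;

var ready = new ManualResetEvent(false);   // starts non-signaled
long result = 0;

var producer = new Thread(() =>
{
    long sum = 0;
    for (int i = 1; i <= 1000; i++) sum += i;   // stand-in for the real work
    result = sum;
    ready.Set();                                // signal: releases any thread blocked in WaitOne()
});
producer.Start();

// Blocks without consuming CPU, unlike spinning on `while (_field);`.
ready.WaitOne();
Console.WriteLine($"result = {result}");

producer.Join();
ready.Dispose();
```

The waiting thread is descheduled until `Set()` is called, so it does not compete with the CPU-bound threads for processor time the way a spin loop does.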

Dmitriy Matveev
A: 

You definitely have threading issues that need to be addressed. Calling Thread.Sleep(0) causes the scheduler to kick in. This probably gives each thread an opportunity to run enough to make things work. I wouldn't just leave the sleep in there and leave it at that, because those are the types of things that work for a while until a totally unrelated change ends up breaking them.

Dunk
A: 

Unless hardware forces you to, you should never use sleep().

Try thinking about the problem in a different way. Consider what data needs to be shared between threads and think about ways to send the data (i.e., copy it) to the interested parties instead of sharing access. If you do this right, you may not actually need any mutexes...

Remember, local variables exist on different stacks, but statics within functions are essentially globals (and of course, you need to look closely at your globals).
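One way to sketch this copy-and-send idea, assuming a .NET version that includes `BlockingCollection<T>` (4.0 or later); the queue, the worker, and the sample data are hypothetical. The worker owns its own state and only ever sees copies of the sender's array:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Bounded queue of messages; the capacity of 16 is an arbitrary choice for this sketch.
var requests = new BlockingCollection<int[]>(boundedCapacity: 16);
long total = 0;

var worker = new Thread(() =>
{
    long localTotal = 0;   // state owned by the worker alone
    // Blocks while the queue is empty and ends once CompleteAdding() has been called.
    foreach (int[] batch in requests.GetConsumingEnumerable())
        foreach (int value in batch) localTotal += value;
    total = localTotal;
});
worker.Start();

int[] scratch = { 1, 2, 3 };
for (int i = 0; i < 4; i++)
{
    requests.Add((int[])scratch.Clone());   // send a copy, not a shared reference
    scratch[0] += 10;                       // safe: the worker never sees this array
}
requests.CompleteAdding();
worker.Join();                              // after Join, the worker's write to total is visible

Console.WriteLine($"total = {total}");
```

The queue still synchronizes internally, but the application code no longer takes any locks of its own, and no array is ever touched by two threads at once.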

dicroce
You should never use sleep? Sleep(0) is useful for things like game loops where you want to use as much of the processor as you can, without blocking other system processes.
FryGuy
A: 

Maybe try using a tool like Typemock Racer?

Disclaimer: I've never used that tool before.

FryGuy
+1  A: 

How many simultaneous threads are running when it bogs down? If you have too many threads, the CPU could be spending all of its time doing context switching. Plus, you would be chewing through 1+ MB of memory per thread.

mbeckish
A: 

Yeah, you may have encountered a scheduler overload.

I sincerely hope not.

Joshua