views:

138

answers:

7

Hi,

I was hoping to get some good ideas as to what might be causing a really nasty bug.

This is a program which is transmitting data over a socket, and also receives messages back. I could explain lots more, but I don't think this will help here.

I'm just searching for hypothetical problems which can cause the following behaviour:

  • program runs
  • processor time slowly accumulates (till around 60%)
  • all of a sudden (could be after 30 but also after 60 seconds) the processor time shoots to 100% and the program halts completely
  • My log always ends at the same line, a memory allocation (something like: myArray = new byte[16384]), in the same thread.

Now here is the weird part: if I set a breakpoint anywhere, it is hit immediately. So just the act of setting a breakpoint made the thread continue (it hadn't been running, since I saw no more log output).

I was thinking 'deadlock', but that would not cause 100% processor usage. If anything, the opposite. Also, setting a breakpoint would not cause a deadlock to end.

Does anyone have a theoretical suggestion as to what kind of 'construct' might cause this effect? (apart from 'bad programming') ;^)

thanks

EDIT: I just noticed that when I lower the send speed, the problem shows up much later than expected. I would have thought it would appear after roughly the same number of packets sent, but no: far more packets get sent this way before it runs into the same problem.

+3  A: 

I can only guess, but the opposite of a deadlock would be a livelock. This means two threads that keep reacting to each other in an infinite loop, so both stay busy but neither makes progress. Setting a breakpoint could well interrupt this, as livelocks generally depend on the right timing.

Other than this, I once had a similar issue with the Java NIO classes: because they are non-blocking, the main thread ended up busy-waiting for input. In that case, though, the CPU usage rose instantaneously, not after a few seconds.

Maybe if you can provide a bit more information like the programming language or even a code sample there might be more ideas.

frenetisch applaudierend
The program is quite big... so it has to be a purely hypothetical speculative session. The programming language is C# under windows. I'm going to read up on livelocks... thanks for the tip
Toad
+1, it probably is thread locks
bguiz
+2  A: 

Anything that involves repetitive processing (looping, recursion, etc) can cause this.

What's interesting is that if the program is doing anything that normally slows down performance (such as disk IO or network access), then the processor is less likely to peg. The processor pegs at 100% only if the program is actually using the processor. If the code has to wait for disk or network IO, then the thread has to wait and uses no processor time while it does.

So in the code, I'd check for loops where a lot of work is going on, but little IO.
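To illustrate the difference between waiting and spinning, here is a minimal sketch. It is written in Python only as a language-neutral illustration (the program in the question is C#), and the 0.2 second duration and the function names are assumptions made up for this example. It compares wall-clock time with CPU time for a blocking wait and for a busy loop:

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent while running fn()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

def wait_like_io():
    # Stand-in for a blocking disk or network call: the thread sleeps
    # and the scheduler can give the core to other work.
    time.sleep(0.2)

def spin():
    # Busy loop: keeps polling the clock without ever blocking.
    deadline = time.perf_counter() + 0.2
    while time.perf_counter() < deadline:
        pass

io_wall, io_cpu = measure(wait_like_io)
spin_wall, spin_cpu = measure(spin)
print(f"blocking wait: wall={io_wall:.2f}s cpu={io_cpu:.2f}s")
print(f"busy loop:     wall={spin_wall:.2f}s cpu={spin_cpu:.2f}s")
```

Both calls take about the same wall-clock time, but only the busy loop should accumulate a noticeable amount of CPU time. A loop of the second kind is the sort of thing to look for.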

Also, if you're debugging in Visual Studio, you can hit the pause button to stop the app at the current point and see what your code is doing when it locks.

David Stratton
unfortunately it always breaks in the gui which is doing nothing but showing some stats in a timer.... which is weird now I think about it, since the timer only fires every 5 seconds... yet when I do break-all it always is in the timer code which only prints a few values.... I'm going to look at that more closely
Toad
ok..disabled stats timer.... still hangs.
Toad
Could you manage to run the network code separately from the GUI? That way you could see whether it is the code-behind or the UI that causes the bug. You should also watch out for code in the event-handling thread that blocks. This can cause all sorts of weird behaviour too.
frenetisch applaudierend
A: 

Without seeing the code, I can only say that your program is probably looping infinitely, and that the call that should block is not blocking the way you expect.

rossoft
+1  A: 

I'm guessing an infinite loop in the socket receiving end. It keeps trying to allocate a buffer to receive the data that is coming in, but the buffer is never big enough so it keeps allocating. But it is really hard to say without code. I'd advise you to add more logging and/or single step the code if you don't want to share it.

DaMacc
the weird part is that the moment I break...I can single step through it no problem. I can even continue the program. And depending on the amount of singlesteps I do, it runs longer before hanging again. This sounds a bit like thread starvation
Toad
A: 

You can also try profiling (the free EQUATEC profiler, for example). It will show you how much of your processor time was spent in each method.

Groo
A: 

I found the answer... quite silly actually (it always is). The thread which is sending/receiving messages does this via asynchronous methods. However, the asynchronous callbacks never seem to be able to come through while that same thread is also pumping messages into the send queue. I noticed that when I add a Thread.Sleep every second, all the asynchronous callbacks get pumped through. So the solution, it turns out, is to have one thread for sending/receiving, done purely with async calls, and a separate one for filling the send queue.

Why this would have resulted in 100% processor usage is beyond me. But it does explain why setting a breakpoint allowed the async callbacks to catch up.
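As a rough sketch of that split (not the actual code, which is C# and uses asynchronous socket calls), the following Python example hands messages from a producer thread to a separate I/O thread through a thread-safe queue. The names `send_queue`, `fill_send_queue`, `io_loop` and `STOP` are hypothetical, and the list `sent` merely stands in for writing to the socket:

```python
import queue
import threading

send_queue = queue.Queue()   # thread-safe hand-off between the two threads
sent = []                    # stand-in for "bytes written to the socket"
STOP = object()              # sentinel telling the I/O thread to finish

def fill_send_queue(count):
    # Producer: only builds messages; it never touches the socket.
    for i in range(count):
        send_queue.put(f"msg-{i}".encode())
    send_queue.put(STOP)

def io_loop():
    # I/O thread: get() blocks while the queue is empty, so this thread
    # uses no CPU when idle and is never held up by the producer's loop.
    while True:
        item = send_queue.get()
        if item is STOP:
            break
        sent.append(item)    # a real program would send on the socket here

producer = threading.Thread(target=fill_send_queue, args=(1000,))
io_thread = threading.Thread(target=io_loop)
io_thread.start()
producer.start()
producer.join()
io_thread.join()
print(len(sent), "messages handed to the I/O thread")
```

The point of the design is that the loop producing messages and the code servicing the connection no longer compete for the same thread, so neither can starve the other.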

Toad
I've +1'd some answers which (although not applicable in this particular case) are helpful and were good advice
Toad
A: 

Because the program fails while allocating memory, I would guess that the incoming message rate is too high for it to handle.

I imagine that your program has a thread whose only job is to listen on the socket and pass the incoming messages to other threads for handling (maybe you have a thread pool there). Imagine a situation where the incoming message rate is so high that all the worker threads are busy handling previous messages, and the thread that listens on the socket has to put new messages into some kind of queue until one of the worker threads is free to handle them. This queue will grow and grow until you run out of memory, so that could be the reason for your program's termination.

Now, about the 100% CPU. I guess that the thread that uses the CPU must be one of the worker threads. This would explain why the listening thread is queuing the messages. The cause could be a corrupted message or something else that sends the worker into an infinite loop. "frenetisch applaudierend" suggested in his answer that two or more of the worker threads could "livelock" each other, which could also be the reason for your problem.
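One common way to keep such a queue from growing without limit is to give it a maximum size, so the listening thread blocks when the workers fall behind. The sketch below is in Python purely for illustration and is not taken from the program in question; `MAX_PENDING`, the 1 ms handling delay and all other names are assumptions made up for the example:

```python
import queue
import threading
import time

MAX_PENDING = 8                            # hypothetical limit on queued messages
pending = queue.Queue(maxsize=MAX_PENDING)
stats = {"peak": 0, "handled": 0}

def listener(count):
    # Stand-in for the socket-reading thread.
    for i in range(count):
        pending.put(i)                     # blocks while the queue is full
        stats["peak"] = max(stats["peak"], pending.qsize())
    pending.put(None)                      # sentinel: no more messages

def worker():
    while True:
        msg = pending.get()
        if msg is None:
            break
        time.sleep(0.001)                  # pretend handling a message is slow
        stats["handled"] += 1

t_listener = threading.Thread(target=listener, args=(50,))
t_worker = threading.Thread(target=worker)
t_worker.start()
t_listener.start()
t_listener.join()
t_worker.join()
print("handled:", stats["handled"], "peak queue length:", stats["peak"])
```

With a bounded queue the memory use stays flat, at the price of the listener stalling when the workers are slow. Whether stalling is acceptable depends on the protocol; dropping or rejecting messages is the other usual choice.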

Moshe Levi