I have to run a legacy Zope2 website and have some grievances with it. The biggest issue is that, occasionally, it just locks up, running at 100% CPU load and no longer answering requests. While the problem isn't reliably reproducible, one page containing 3 dynamic graphs triggers it sometimes, so I suspect some kind of race condition that leads to an endless loop or a stuck busy-wait.

The problem is, I have not yet found a way to debug this thing. There's nothing in the Zope logs and nothing in the system logs. I tried the suggestions from this question to get a stacktrace, but the only signal that has any effect is SIGKILL.

Is there another possibility to find out where exactly the process is when it gets stuck?

+1  A: 

You could try to attach a debugger to the running process. See also this question.

Thomas
+1  A: 

If the process is stuck in a way that no other signal gets through, you might want to consider running it from a debugger, instead of trying to attach to it at runtime.

Also, it might be useful to try other debugging tactics, like turning off certain parts of the code to find the minimal case in which the problem is still reproducible, so you can better see what causes it.

abyx
+2  A: 

See my answer to this SO question: use Products.signalstack. It registers the same handler as the answer you already found, at Product registration time. Perhaps it works better for you.

If not, you probably have an OS-level I/O problem on your hands, and your only hope is attaching gdb to the process. Search Stack Overflow for gdb answers; there is a wealth of information here!
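For reference, the kind of stack-dumping signal handler discussed above can be sketched roughly as follows. This is not the actual Products.signalstack code; the function names `format_all_stacks`, `dump_stacks` and `install` are made up for illustration, and the choice of SIGUSR1 is an assumption (it only exists on Unix-like systems).

```python
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return one string holding the current stack of every thread."""
    # Map thread ids to names so the dump is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "unknown")
        chunks.append("Thread %s (%s):\n" % (thread_id, name))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def dump_stacks(signum, frame):
    """Signal handler: write all thread stacks to stderr."""
    sys.stderr.write(format_all_stacks())
    sys.stderr.flush()


def install(signum=None):
    """Register the handler; must be called from the main thread."""
    if signum is None:
        # Assumption for this sketch: SIGUSR1 is free to use (Unix only).
        signum = signal.SIGUSR1
    signal.signal(signum, dump_stacks)


if __name__ == "__main__":
    install()
```

With the handler installed, `kill -USR1 <pid>` should print the stacks to the process's stderr. Note that Python only runs signal handlers in the main thread, between bytecode instructions, so if the main thread is stuck inside C code the handler never fires. That would be consistent with only SIGKILL having any effect.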

Martijn Pieters
+1 Also **pstack** and **lsstack** might be of some use.
Mike Dunlavey