How to determine exactly what a piece of software is doing when it is stuck, unresponsive to user input and not updating its display?

I have tried oprofile, which records what function is executing, but it's not giving me enough clues. It counts everything that happens during the time it's running, when I need to see what's happening only when the specimen program is stuck.

The problem might involve interrupts, waiting on network sockets, timers, a GUI event handler, or who knows what. How to find out as much as possible about what's going on, not just the execution points of each thread?

The software of interest runs on Linux and is built with gcc. It is mostly C++ but may involve other languages, including interpreted ones such as Python.

The particular case of concern now is Firefox, for which I have checked out the source. Firefox frequently pauses all input and screen output at random times, for about 5-10 seconds each time. If someone handed me the solution to this particular problem on a silver platter, sure, I'd take it, but I'd still be asking. If possible, I'd like to learn general techniques that apply to any software, especially software I'm responsible for.

+2  A: 

strace will trace out the system calls. This might give some indication of what is blocking on network sockets and so on.
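
As a rough sketch of how such a trace might be mined afterwards: strace's -T option appends the time spent in each completed call in angle brackets, so calls that blocked for seconds stand out. The script below assumes a trace was saved with something like `strace -f -tt -T -p 12345 -o trace.txt` (the PID and file name are placeholders). The function name, the threshold and the sample lines are invented for illustration only. A call that is still blocked when tracing stops is logged as unfinished and has no duration, so this sketch will not report it.

```python
import re

# Matches the "<seconds>" suffix that strace -T appends to each completed call.
DURATION_RE = re.compile(r"<(\d+\.\d+)>\s*$")

def slow_syscalls(lines, threshold=1.0):
    """Return (seconds, line) pairs for calls that took at least `threshold` seconds."""
    slow = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match and float(match.group(1)) >= threshold:
            slow.append((float(match.group(1)), line.rstrip()))
    # Longest calls first
    slow.sort(key=lambda pair: pair[0], reverse=True)
    return slow

# Invented sample lines in the general shape produced by strace -f -tt -T -o FILE
SAMPLE = [
    '12345 10:15:02.000100 read(7, "x", 1) = 1 <0.000021>',
    '12345 10:15:02.000300 poll([{fd=9, events=POLLIN}], 1, 8000) = 0 (Timeout) <8.004310>',
    '12346 10:15:03.100000 futex(0x7f00, FUTEX_WAIT_PRIVATE, 2, NULL) = 0 <6.250000>',
]

if __name__ == "__main__":
    for seconds, line in slow_syscalls(SAMPLE, threshold=5.0):
        print(f"{seconds:9.3f}s  {line}")
```

For a real trace, the lines would come from `open("trace.txt")` instead of `SAMPLE`.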

Jeff Foster
I've used strace for programs that crash. Is it useful for a running one which I don't want to kill?
DarenW
Use strace to start the application; it'll intercept and record the system calls and write them to stderr (or, with -o, to a file). It'll run until you terminate it (assuming the program you're running doesn't crash). For a program that is already running and that you don't want to kill, attach with strace -p <pid>; pressing ctrl-C in strace detaches and leaves the program running.
Jeff Foster
+2  A: 

This technique should find it. Basically, while it's spending time like that, there's almost always a hierarchy of function calls on the stack waiting for their work to be completed. Just sample the stack a few times and you'll see them.

ADDED: As Don Wakefield pointed out, the pstack utility could be perfect for this job.
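
For the interpreted parts mentioned in the question, the same sampling idea can be tried from inside a Python process. This is only a minimal sketch of the idea, not the debugger-based method itself: it assumes a helper thread can run inside the target program, and `sample_stacks`, `handle_event` and `slow_work` are hypothetical names. Functions that show up in most samples are the ones the program is waiting on.

```python
import collections
import sys
import threading
import time
import traceback

def sample_stacks(target_thread, samples=5, interval=0.05):
    """Take a few stack samples of one thread; count how often each function shows up."""
    counts = collections.Counter()
    for _ in range(samples):
        # sys._current_frames() maps thread ids to each thread's current top frame
        frame = sys._current_frames().get(target_thread.ident)
        if frame is not None:
            # Each function is counted at most once per sample
            names = {entry.name for entry in traceback.extract_stack(frame)}
            counts.update(names)
        time.sleep(interval)
    return counts

def slow_work():
    # Stand-in for whatever the program is stuck in
    time.sleep(1.0)

def handle_event():
    slow_work()

if __name__ == "__main__":
    worker = threading.Thread(target=handle_event)
    worker.start()
    time.sleep(0.1)  # let the worker reach the slow part
    for name, hits in sample_stacks(worker).most_common():
        print(f"{hits}/5 samples: {name}")
    worker.join()
```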

Mike Dunlavey
And if you're on Linux, you don't even need the debugger. Just use [pstack](http://linux.die.net/man/1/pstack)
Don Wakefield
@Don: Thanks for the tip. It doesn't seem to show you source lines, but it still gets the job done.
Mike Dunlavey
Aw man, i gotta stop living in a cave! Haven't heard of this pstack til now... it may do the job.
DarenW
@DarenW: Me neither. My cave is cold. Can you email me some of that Florida air (without the cat fur)?
Mike Dunlavey
There's lsstack for Linux, but unfortunately it compiles and runs only on 32-bit; I'm running 64-bit.
DarenW
This has been a useful direction to pursue, but I'd like to formulate what I've found as a fresh standalone answer. As for this Florida air, right now it's rather cool and rainy. You don't want this, unless you are in a desert?
DarenW
+1  A: 

A stack trace of a running program can be obtained with gdb. At a command line, use "ps aux" to find the program's PID. Suppose it's 12345. Then run:

gdb --pid=12345

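gdb stops the program as soon as it attaches, so type "c" (continue) to let it run again. When the program is stuck in a pause (or doing anything suspicious), press ctrl-C in gdb. The "bt" command prints the stack of the current thread, and "thread apply all bt" prints the stacks of all threads; the output can be admired now or pasted into a text file for later study. Resume execution of the program with "c".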

The main advantage of this manual technique over oprofile or other profilers is that I can get the exact call sequence at a moment of interest. A few samples taken during times of trouble, and a few taken while the program is running normally, should give useful clues.
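
Taking several samples by hand gets tedious, so here is a hedged sketch of one way to script it. It assumes gdb is installed and that you have permission to attach to the process. It relies on gdb's batch mode (`-batch`, with the command given by `-ex`), which attaches, runs the command, and detaches again on exit. The helper names, sample count and interval are arbitrary choices for illustration. The target is paused briefly while each sample is taken.

```python
import subprocess
import sys
import time

def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches, prints every thread's stack, then exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]

def collect_backtraces(pid, samples=3, interval=2.0):
    """Run gdb several times and return the captured output of each run."""
    outputs = []
    for _ in range(samples):
        result = subprocess.run(
            gdb_backtrace_command(pid), capture_output=True, text=True
        )
        outputs.append(result.stdout)
        time.sleep(interval)
    return outputs

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: pass the PID of the process to sample")
    else:
        for i, text in enumerate(collect_backtraces(int(sys.argv[1])), start=1):
            print(f"=== sample {i} ===")
            print(text)
```

Run it once while the program is stuck and once while it behaves normally, then compare the two sets of stacks.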

DarenW
I've tried the ctrl-C in **gdb** method under Windows, and not had luck. It seems to get to a place where there's no real stack. I wonder what I'm doing wrong.
Mike Dunlavey
... I've made myself a pain in the xxx by explaining over and over why that technique works so well, such as: http://stackoverflow.com/questions/406760/whats-your-most-controversial-programming-opinion/1562802#1562802
Mike Dunlavey
... This example shows a 40x speedup: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773
Mike Dunlavey