Duplicate:

Linux API to list running processes?

How can I detect hung processes in Linux using C?

+3  A: 

Under Linux, the way to do this is to examine the contents of /proc/[PID]/*. A good one-stop location is /proc/[PID]/status. Its first lines include:

Name: [program name]
State: R (running)

Of course, detecting hung processes is an entirely separate issue.

/proc/[PID]/stat is a more machine-readable format of the same info as /proc/[PID]/status, and is, in fact, what the ps(1) command reads to produce its output.

caskey
OK, thanks, I changed the question. I understand now that we can't detect whether a process is hung from its state alone :).
systemsfault
+2  A: 

Seeing as the question has changed:

http://procps.sourceforge.net/

This is the home of the source for ps and other process tools. They do indeed use /proc (which suggests it is the conventional and best way to read process information). Their source is quite readable. One relevant file is:

/procps-3.2.8/proc/readproc.c

You can also link your program to libproc, which should be available in your repo (or already installed, I would say), but you will need the "-dev" variation for the headers and what-not. Using this API you can read process information and status.

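You can use the psState() function through libproc to check for states like the ones below. (These constants appear to match the Solaris libproc header, where the call is named Pstate(), so check what your installed libproc actually provides.)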

#define PS_RUN          1       /* process is running */
#define PS_STOP         2       /* process is stopped */
#define PS_LOST         3       /* process is lost to control (EAGAIN) */
#define PS_UNDEAD       4       /* process is terminated (zombie) */
#define PS_DEAD         5       /* process is terminated (core file) */
#define PS_IDLE         6       /* process has not been run */

In response to the comment: IIRC, unless your program is on the CPU and you can prod it from within the kernel with signals, you can't really tell how responsive it is. Even then, the trap invokes a signal handler, and that handler may run fine even while the rest of the program is stuck.

The best bet is to schedule another process on another core that can poke the target process in some way while it is running (or stuck in a loop, or non-responsive). But I could be wrong here, and it would be tricky.

Good Luck

Aiden Bell
Thanks Aiden, but how can I find out whether the process is not responding? For instance, would it be useful to build a heuristic on the time the process has been idle, its state, and its CPU and RAM stats? If so, what might that heuristic be?
systemsfault
Unless the process can be prodded by the kernel (or by signals) while it is on the CPU, you can't tell whether it is unresponsive, IIRC.
Aiden Bell
See edit - added some thoughts about hanging processes.
Aiden Bell
+1  A: 

Monitoring and/or killing a process is just a matter of system calls. I'd think the toughest part of your question would really be reliably determining that a process is "hung", rather than merely very busy (or waiting for a temporary condition).

In the general case, I'd think this would be rather difficult. Even Windows asks for a decision from the user when it thinks a program might be "hung" (on my system it is often wrong about that, too).

However, if you have a specific program that likes to hang in a specific way, I'd think you ought to be able to reliably detect that.

T.E.D.
+1  A: 

You may be able to use whatever mechanism strace uses (ptrace(2)) to determine what system calls the process is making. Then you could work out which system calls you end up in for things like pthread_mutex deadlocks, or whatever. You could then use a heuristic approach and decide that if a process has been stuck in a lock-related system call for more than 30 seconds, it's deadlocked.

dicroce
+1  A: 

You can run 'strace -p [PID]' on a process to determine what (if any) system calls it is making. If a process is not making any system calls but is using CPU time, then it is either hung or running in a tight calculation loop in userspace. You'd really need to know the expected behaviour of the individual program to know for sure. If it is not making system calls and is not using CPU, it could also just be idle or deadlocked.
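
The "is it using CPU time?" half of that check can be approximated from C by sampling /proc/[PID]/stat twice. The sketch below is only an illustration: the names proc_cpu_ticks, sample_activity and the activity enum are invented here, and the field positions (utime is field 14, stime is field 15) are taken from proc(5). It does not look at system calls at all, so it cannot tell a tight loop from a hang, or an idle process from a deadlocked one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: total CPU time (user + system) used by `pid`,
 * in clock ticks (see sysconf(_SC_CLK_TCK)). Returns -1 on failure. */
long proc_cpu_ticks(pid_t pid)
{
    char path[64], buf[1024];
    unsigned long utime, stime;
    FILE *fp;
    size_t n;
    char *p;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    /* Skip past "pid (comm)"; comm may contain spaces or ')'. */
    p = strrchr(buf, ')');
    if (p == NULL)
        return -1;
    /* Fields after comm: state, ppid, pgrp, session, tty_nr, tpgid,
     * flags, minflt, cminflt, majflt, cmajflt, then utime and stime. */
    if (sscanf(p + 1,
               " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               &utime, &stime) != 2)
        return -1;
    return (long)(utime + stime);
}

enum activity { ACT_UNKNOWN, ACT_BUSY, ACT_QUIET };

/* Sample CPU time twice, interval_ms apart, and report whether the
 * process consumed any CPU in between. */
enum activity sample_activity(pid_t pid, unsigned interval_ms)
{
    struct timespec ts;
    long before, after;

    before = proc_cpu_ticks(pid);
    ts.tv_sec = interval_ms / 1000;
    ts.tv_nsec = (long)(interval_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
    after = proc_cpu_ticks(pid);

    if (before < 0 || after < 0)
        return ACT_UNKNOWN;     /* process vanished or is unreadable */
    return after > before ? ACT_BUSY : ACT_QUIET;
}
```

ACT_BUSY over many consecutive samples suggests a loop (or legitimate heavy work); ACT_QUIET suggests idle or blocked. Either way, knowledge of what the program is expected to be doing is still needed to call it hung.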

The only bulletproof way to do this is to modify the program being monitored so that it either sends a 'ping' to a 'watchdog' process every so often, or responds to a ping on request, e.g., over a socket connection where you can ask it "Are you alive?" and get back "Yes". The program can be coded in such a way that it is unlikely to send the ping if it has gone off into the weeds somewhere and is not executing properly. I'm pretty sure this is how Windows knows a process is hung, because every Windows program has some sort of event queue where it processes a known set of APIs from the operating system.
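
A minimal sketch of the ping/watchdog idea, assuming the two processes share a pipe (a socket would work the same way). The names send_heartbeat and wait_heartbeat are invented for this example, and the single-byte "protocol" is just an assumption to keep it short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Monitored side: call this regularly from the program's main loop,
 * so that it only fires while real work is making progress.
 * Returns 0 on success, -1 if the write failed. */
int send_heartbeat(int fd)
{
    char beat = '.';
    return write(fd, &beat, 1) == 1 ? 0 : -1;
}

/* Watchdog side: wait up to timeout_ms for one heartbeat on fd.
 * Returns  1 if a heartbeat arrived (alive),
 *          0 if nothing arrived in time (presumed hung),
 *         -1 if the other end closed the pipe or an error occurred
 *            (including poll() being interrupted by a signal). */
int wait_heartbeat(int fd, int timeout_ms)
{
    struct pollfd pfd;
    char beat;
    int rc;

    assert(timeout_ms >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc == 0)
        return 0;               /* timed out: no ping */
    if (rc < 0)
        return -1;
    /* read() returns 0 at end-of-file, i.e. the writer has gone away. */
    return read(fd, &beat, 1) == 1 ? 1 : -1;
}
```

In use, the watchdog would create the pipe, fork or spawn the monitored program with the write end, and call wait_heartbeat in a loop with a timeout comfortably longer than the expected ping interval. A return of 0 is the "hung" signal; what to do next (log, kill, restart) is a policy decision.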

Not necessarily a programmatic way, but one way to tell if a program is 'hung' is to break into it with gdb and pull a backtrace and see if it is stuck somewhere.

bdk
If I had a specific program and wanted to determine whether it is hung, I'd try to change its state or signal the process; you could even try to fork. If it doesn't respond to any of the above, it is probably dead.
systemsfault