How to know if a process had been started but crashed in Linux

Consider the following situation: I am using Linux and suspect that my application has crashed. I had not enabled core dumps, and there is no information in the log.

How can I be sure that my app was started after the system restart, but is no longer running because it crashed?

My app is configured as a service, written in C/C++.

Put another way: how can I get the names of all processes/services that have been executed since system start? Is it even possible?

I know I can enable logging and start the process again to catch the crash.

+1 A:

You could make a decoy, i.e. an application or shell script that is just a wrapper around the real application but adds some logging, such as "Application started". You then rename your original app and give its original name to the decoy.
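
A minimal sketch of such a decoy in C, assuming the log goes to syslog. The function name `log_and_exec`, the syslog tag `myapp-wrapper`, and any path passed to it are hypothetical and not from the original post.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <syslog.h>
#include <unistd.h>

/* Log a start marker, then replace this process with the real program.
 * Returns -1 only if execv() fails; on success it never returns. */
int log_and_exec(const char *real_path, char *const argv[])
{
    /* "myapp-wrapper" is a made-up tag for illustration. */
    openlog("myapp-wrapper", LOG_PID, LOG_DAEMON);
    syslog(LOG_NOTICE, "Application started: %s", real_path);
    closelog();

    execv(real_path, argv);

    /* Only reached when execv() failed. */
    perror("execv");
    return -1;
}
```

The decoy's `main` would call `log_and_exec()` with the renamed binary's path and its own `argv`. Because `execv()` keeps the same PID, the real application still appears under the process ID the init script started.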

shodanex 2009-05-26 09:41:09

I don't know of a standard way of getting all the process names that have executed; however, there might be a way to do this with SystemTap.

If you just want to monitor your process, I would recommend using waitid (man 2 wait) after the fork instead of detaching and daemonizing.
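
As a rough illustration of the `waitid` suggestion, the sketch below starts a child and reports how it ended. The function name `run_and_report` and the "128 + signal" return convention are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start `path` as a child and block until it terminates.
 * Returns the exit status for a normal exit, 128 + signal number
 * if a signal killed it, or -1 if fork()/waitid() failed. */
int run_and_report(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);                 /* exec failed in the child */
    }

    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED) < 0)
        return -1;

    if (info.si_code == CLD_EXITED) {
        fprintf(stderr, "%s exited with status %d\n", path, info.si_status);
        return info.si_status;
    }

    /* CLD_KILLED or CLD_DUMPED: terminated by a signal, i.e. a crash. */
    fprintf(stderr, "%s killed by signal %d\n", path, info.si_status);
    return 128 + info.si_status;
}
```

A parent that stays around like this can log the exact signal (for example SIGSEGV) at the moment the child dies, which is the information missing in the situation described in the question.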

niXar 2009-05-26 09:43:42

+2 A:

I would recommend that you write the fact that you started to some kind of log file, either a private one which gets overwritten on each start-up or one via syslogd.

Also, you can log a timestamp heartbeat so that you know exactly when it crashed.
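
One possible shape for the private-file variant, as a sketch only: the file is truncated at start-up and a timestamped line is appended for each heartbeat. The function names and the line format are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Write one timestamped line to the status file at `path`.
 * `mode` is "w" to truncate (start-up) or "a" to append (heartbeat).
 * Returns 0 on success, -1 on failure. */
static int write_stamp(const char *path, const char *mode, const char *msg)
{
    FILE *f = fopen(path, mode);
    if (!f)
        return -1;

    time_t now = time(NULL);
    struct tm tm;
    char stamp[32];
    localtime_r(&now, &tm);
    strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tm);

    int ok = fprintf(f, "%s %s\n", stamp, msg) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Call once at start-up: overwrites the previous run's file. */
int log_started(const char *path)
{
    return write_stamp(path, "w", "started");
}

/* Call periodically from the main loop or a timer. */
int log_heartbeat(const char *path)
{
    return write_stamp(path, "a", "alive");
}
```

After a suspected crash, the first line shows that the app did start and the last line gives an upper bound on when it stopped running.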

Robert S. Barnes 2009-05-26 09:44:00

+4 A:

Standard practice is to have a pid file for your daemon (/var/run/$NAME.pid), in which you can find its process id without having to parse the process tree manually. You can then either check the state of that process, or make your daemon respond to a signal (usually SIGHUP), and report its status. It's a good idea to make sure that this pid still belongs to your process too, and the easiest way is to check /proc/$PID/cmdline.
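
The `/proc/$PID/cmdline` check could look roughly like the sketch below. It compares only the file name part of `argv[0]`, which is an assumption of this example; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy argv[0] of process `pid` into buf. Returns 0 on success, -1 if the
 * process does not exist or its command line is unreadable or empty. */
int read_argv0(pid_t pid, char *buf, size_t size)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/cmdline", (long)pid);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, size - 1);
    close(fd);
    if (n <= 0)
        return -1;

    /* Arguments are NUL-separated, so as a C string buf is just argv[0]. */
    buf[n] = '\0';
    return 0;
}

/* Return the part of `p` after the last '/'. */
static const char *base_name(const char *p)
{
    const char *slash = strrchr(p, '/');
    return slash ? slash + 1 : p;
}

/* Returns 1 if `pid` is running a program whose file name matches `name`. */
int pid_is_program(pid_t pid, const char *name)
{
    char argv0[4096];
    if (read_argv0(pid, argv0, sizeof argv0) != 0)
        return 0;
    return strcmp(base_name(argv0), base_name(name)) == 0;
}
```

A status check would read the PID from the pid file and then call `pid_is_program()` with the daemon's name before trusting that PID.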

Addendum: If you're only using newer Fedora or Ubuntu, your init system is Upstart, which has monitoring and triggering capabilities built in.

As @emg-2 noted, BSD process accounting is available, but I don't think it's the correct approach for this situation.

JimB 2009-05-26 13:32:15

Also remember that process IDs roll over, so when checking for the active PID make sure it really is your app that is running. One way to do that is to look at /proc/PID/cmdline.

Brian C. Lane 2009-05-26 15:32:20

Good point Brian - I was assuming a certain level of safe programming practices ;) I'll add it for thoroughness.

JimB 2009-05-26 15:39:39

+6 A:

This feature is included in the Linux kernel. It's called BSD process accounting.

2009-05-26 13:39:06

Also consider using atop. These two in combination should cover everything, with most things in more than one way.

Autocracy 2009-05-26 13:41:00

+1 A:

As JimB mentions, you have the daemon write a PID file. You can tell if it's running or not by sending it a signal 0, via either the kill(2) system call or the kill(1) program. The return status will tell you whether or not the process with that PID exists.
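
A small sketch of the signal 0 check using `kill(2)`. The function name and the three-way return value are choices made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns 1 if a process with this PID exists, 0 if it does not,
 * and -1 for an invalid PID or an unexpected error. */
int process_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;      /* 0 and negative values address process groups */

    if (kill(pid, 0) == 0)
        return 1;       /* signal 0 sends nothing, only checks */
    if (errno == EPERM)
        return 1;       /* exists, but we may not signal it */
    if (errno == ESRCH)
        return 0;       /* no such process */
    return -1;
}
```

From a shell, `kill -0 "$PID"` performs the same test and reports the result in its exit status. Note that this only says that some process has that PID, not that it is your daemon.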

Curt Sampson 2009-05-26 13:49:55

+1 A:

Daemons should always:

1) Write the currently running instance's process ID to /var/run/$NAME.pid using getpid() (man getpid) or an equivalent call for your language.
2) Write a standard logfile to /var/log/$NAME.log (larger logfiles should be broken up into .0.log for the currently running log along with .X.log.gz for older logs, where X is a number, with lower being more recent).
3) /Should/ have an LSB-compatible run script accepting at least the start, stop, status and restart arguments. Status could be used to check whether the daemon is running.
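
Writing the pid file is only a few lines. This is a sketch with a hypothetical function name; a real daemon would pass something like /var/run/$NAME.pid and remove the file on clean shutdown.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the calling process's PID, followed by a newline, to `path`.
 * Returns 0 on success, -1 on failure. */
int write_pid_file(const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fprintf(f, "%ld\n", (long)getpid()) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Call it after the final `fork()` of the daemonizing sequence, so that the recorded PID is that of the long-running process rather than of a short-lived parent.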

PhrkOnLsh 2009-05-26 15:06:12

If your app has crashed, that's not distinguishable from "your app was never started", unless your app writes to the system log. syslog(3) is your friend.

To find your app you can try a number of ideas:

Look in the /proc filesystem
Run the ps command
Try killall -0 appname and check the return code
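
For the /proc approach, a sketch along these lines counts running processes by name. It relies on /proc/PID/comm, which holds the command name truncated to 15 characters, so it is only suitable for short names; the function name is made up for this example.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count running processes whose /proc/<pid>/comm equals `name`.
 * Returns the count, or -1 if /proc cannot be opened. */
int count_processes_named(const char *name)
{
    DIR *d = opendir("/proc");
    if (!d)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* Process directories are the ones with numeric names. */
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;

        char path[280], comm[64];
        snprintf(path, sizeof path, "/proc/%s/comm", e->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;           /* process exited while we were scanning */

        if (fgets(comm, sizeof comm, f)) {
            comm[strcspn(comm, "\n")] = '\0';
            if (strcmp(comm, name) == 0)
                count++;
        }
        fclose(f);
    }
    closedir(d);
    return count;
}
```

A result of 0 means no process with that name is running right now. As noted above, that alone cannot tell a crash apart from the app never having been started.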

Norman Ramsey 2009-05-27 02:47:58
