Under Linux, what would be the best way for a program to restart itself after a crash by catching the exception in a crash handler (for example, on a segfault)?

A: 

Processes can't restart themselves, but you could use a utility like crontab(1) to schedule a script to check if the process is still alive at regular intervals.
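A minimal sketch of the checker such a cron job could run. It assumes the daemon writes its PID to a pidfile (the path and start command are hypothetical), and it uses `kill(pid, 0)` to test for existence instead of grepping `ps` output. Note that a stale pidfile can name a PID that has since been reused by an unrelated process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Returns 1 if a process with this PID exists, 0 otherwise.
 * Signal 0 performs only the existence/permission check; nothing is delivered. */
int process_is_alive(pid_t pid)
{
    if (pid <= 0)
        return 0;               /* 0 and negative values address process groups */
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;      /* exists, but owned by another user */
}

/* Reads a PID from a pidfile; returns -1 if missing or malformed. */
pid_t read_pidfile(const char *path)
{
    FILE *f = fopen(path, "r");
    long pid = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1)
        pid = -1;
    fclose(f);
    return (pid_t)pid;
}

/* If the process named in `pidfile` is gone, run `start_cmd`
 * (a hypothetical start script). Returns 1 if a restart was attempted. */
int check_and_restart(const char *pidfile, const char *start_cmd)
{
    assert(pidfile != NULL && start_cmd != NULL);
    if (process_is_alive(read_pidfile(pidfile)))
        return 0;
    if (system(start_cmd) != 0)
        fprintf(stderr, "start command failed: %s\n", start_cmd);
    return 1;
}
```

Scheduled every minute from crontab, this gives a restart latency of up to one interval, which is the main trade-off against a supervising parent process.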

Shaggy Frog
Nothing prevents a program from calling `exec` on `argv[0]` (almost always its own executable)...
dmckee
Hard to do that if it's already crashed.
Shaggy Frog
The "crash" is the OS sending the signal (with default behavior "terminate process"). The default behavior can be replaced with a user defined function...
dmckee
What if memory is so corrupt that you get a second SEGV in the SEGV handler? Essentially it's more reliable to operate from another process.
Darron
+6  A: 

The simplest is:

while [ 1 ]; do ./program && break; done

Basically, you run the program until it returns 0 (success), then break out of the loop.

aaa
This solution and [llasram's](http://stackoverflow.com/questions/3703227/self-restart-program-on-segfault-under-linux/3703270#3703270) can make it difficult to kill the process intentionally (at a minimum the user must know what is going on...). This can be good or bad depending on the intended use.
dmckee
@dmc Of course, simplest is not necessarily good. It's hard to say without knowing more about the requirements. I like your answer because it is more robust.
aaa
Oh, I'm not complaining. This has the virtue of being bog-simple, and sometimes you *don't* want ignorant users killing it off...
dmckee
@dmckee - [llasram's answer](http://stackoverflow.com/questions/3703227/self-restart-program-on-segfault-under-linux/3703270#3703270) has the property that the parent can catch `SIGTERM`, signal the child, wait, and then exit in an orderly fashion. This answer should be adaptable to do the same -- bash's job control should be rich enough to support it.
bstpierre
+5  A: 

You can have a loop in which you essentially fork(), do the real work in the child, and just wait on the child and check its exit status in the parent. You can also use a system which monitors and restarts programs in a similar fashion, such as daemontools, runit, etc.
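A minimal sketch of that fork-and-wait loop. Here the child `exec`s the program (a variant; if the work is a function in the same binary, call it in the child instead), and the restart policy is an assumption: a clean exit or death by SIGTERM/SIGINT/SIGKILL stops the loop, anything else restarts it. `supervise` and its parameters are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (searched in PATH) in a child and restarts it when it ends
 * abnormally. Returns the number of restarts, or -1 on fork/waitpid failure.
 * max_restarts bounds a crash loop; delay_s pauses between attempts. */
int supervise(char *const argv[], int max_restarts, unsigned delay_s)
{
    int restarts = 0;

    assert(argv != NULL && argv[0] != NULL);
    for (;;) {
        int status;
        pid_t pid;

        fflush(NULL);                 /* avoid duplicating buffered output */
        pid = fork();
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {               /* child: become the real program */
            execvp(argv[0], argv);
            _exit(127);               /* exec failed */
        }

        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR) {
                perror("waitpid");
                return -1;
            }
        }

        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;          /* finished normally */
        if (WIFSIGNALED(status)) {
            int sig = WTERMSIG(status);
            if (sig == SIGTERM || sig == SIGINT || sig == SIGKILL)
                return restarts;      /* stopped on purpose: don't undo it */
            fprintf(stderr, "child killed by signal %d\n", sig);
        } else {
            fprintf(stderr, "child exited with status %d\n",
                    WEXITSTATUS(status));
        }

        if (restarts == max_restarts)
            return restarts;          /* give up */
        restarts++;
        if (delay_s > 0)
            sleep(delay_s);
    }
}
```

Because the crash happens in a separate process, a corrupted heap or a second SIGSEGV in the child cannot prevent the parent from restarting it. A production version would also catch SIGTERM in the parent and forward it to the child before exiting.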

llasram
A: 

The program itself obviously shouldn't check whether it is running or not running :)

Most enterprise solutions are actually just fancy ways of grepping the output of ps(1) for a given string and performing an action when certain criteria are satisfied, i.e. if your process is not found, then call the start script.

Ciarán
+5  A: 

SIGSEGV can be caught (see man 3 signal or man 2 sigaction), and the program can call one of the exec family of functions on itself in order to restart. The same goes for most runtime crashes (SIGFPE, SIGILL, SIGBUS, SIGSYS, ...).

I'd think a bit before doing this, though. It is a rather unusual strategy for a unix program, and you may surprise your users (not necessarily in a pleasant way, either).

In any case, be sure not to auto-restart on SIGTERM if there are any resources you want to clean up before dying; otherwise angry users will use SIGKILL and you'll leave a mess.
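The catch-and-exec approach described above can be sketched as follows. This is a sketch under assumptions, not a recommendation (see the POSIX caveat in the comments): it is Linux-specific because it re-executes `/proc/self/exe` rather than trusting `argv[0]` to be a usable path, and `install_restart_on_crash` / `crash_handler` are hypothetical names. `execv` and `_exit` are async-signal-safe, so they may be called from the handler.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Captured at startup so the handler has the original arguments. */
static char **saved_argv;

static void crash_handler(int sig)
{
    (void)sig;
    execv("/proc/self/exe", saved_argv);  /* replace the crashed image */
    _exit(127);                           /* exec failed: just die */
}

/* Installs crash_handler for the usual "crash" signals; returns 0 on success.
 * SA_NODEFER keeps the signal unblocked inside the handler. The blocked-signal
 * mask survives execve(), so without it the restarted program would start with
 * SIGSEGV blocked, and Linux kills a process that faults with SIGSEGV blocked. */
int install_restart_on_crash(char **argv)
{
    static const int sigs[] = { SIGSEGV, SIGBUS, SIGFPE, SIGILL };
    struct sigaction sa;
    size_t i;

    assert(argv != NULL && argv[0] != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;
    saved_argv = argv;

    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

Call it first thing in `main(argc, argv)`. If the program crashes during startup, this loops forever, which is one more reason to prefer a separate supervising process.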

dmckee
Not a good idea, from the signal manpage: "According to POSIX, the behavior of a process is undefined after it ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(2) or raise(3)."
Paul Rubel
@Paul: I hadn't noticed that before. What is not clear to me is whether running a handler that invokes exec on a static variable into which you have copied argv[0] constitutes "ignoring" the signal. My instinct is that it does not. In any case, I have been able to reliably handle SIGSEGV on Mac OS and Linux. I can't recall handling SIGFPE, and I don't think I have ever generated SIGILL or SIGBUS. Certainly the other suggestions here are good and accomplish what the OP desires, but I took the title literally.
dmckee
+2  A: 

As a complement to what was proposed here:

Another option is to do what is done for the getty daemons: see /etc/inittab and the inittab(5) man page. It seems to be the most system-wide approach ;-).

It could look like the file fragment below. The obvious advantages are that this approach is fairly standard and that it lets you control your daemon through runlevels.

# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6
Roman Nikitchenko
That is the proper way to do it.
kofucii