I have some daemons that use PID files to prevent parallel execution of my program. I have set up a signal handler to trap SIGTERM and do the necessary clean-up, including removing the PID file. This works great when I test using "kill -s SIGTERM #PID". However, when I reboot the server, the PID files are still hanging around, preventing start-up of the daemons. It is my understanding that SIGTERM is sent to all processes when a server is shutting down. Should I be trapping another signal (SIGINT, SIGQUIT?) in my daemon?
Not a direct solution, but it might be a good idea to check at startup whether a process with the PID recorded in the PID file is actually running and, if none exists, to clean up the stale file.
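A minimal sketch of that startup check, assuming a Python daemon on a Unix-like system. The function names `pid_is_running` and `clear_stale_pidfile` are made up for illustration; the check relies on sending signal 0 with `os.kill`, which tests for existence without delivering a signal.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is delivered
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def clear_stale_pidfile(path):
    """Remove the PID file if it names no live process.

    Returns True if startup may proceed, False if a live process owns the file.
    """
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return True  # no PID file at all
    except ValueError:
        os.unlink(path)  # unreadable contents: treat as stale
        return True
    if pid_is_running(pid):
        return False
    os.unlink(path)  # recorded process is gone: stale file
    return True
```

Note that after a reboot an unrelated process may have been assigned the recorded PID, so this check can report a false "still running"; comparing the process name as well, or using a lock instead, avoids that.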
It's possible that your process is getting a SIGKILL before it has a chance to clean up the PID file.
Remember that after sending SIGTERM to all processes, the shutdown script waits a short time (usually about 2 or 3 seconds) and then sends SIGKILL. You can find this in /etc/rc.d/rc0.d/S01halt or similar (the path might vary depending on your distribution).
For example, on my Fedora 11 system it contains:
action $"Sending all processes the TERM signal..." /sbin/killall5 -15
sleep 2
action $"Sending all processes the KILL signal..." /sbin/killall5 -9
So if you are not fast enough, either increase the delay, or make sure you are faster!
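One way to "be faster" is to remove the PID file as the very first step of the SIGTERM handler and leave slower clean-up for afterwards. A rough sketch in Python, where `make_sigterm_handler` and the PID file path are hypothetical names chosen for illustration:

```python
import os
import signal
import sys


def make_sigterm_handler(pidfile):
    """Build a SIGTERM handler that removes pidfile before anything else."""
    def handler(signum, frame):
        # Unlink first, so the file is gone even if SIGKILL arrives
        # while the rest of the shutdown is still in progress.
        try:
            os.unlink(pidfile)
        except FileNotFoundError:
            pass
        # Raises SystemExit; slower clean-up can run in finally/atexit hooks.
        sys.exit(0)
    return handler


# Registration in the daemon's main thread (path is an assumption):
# signal.signal(signal.SIGTERM, make_sigterm_handler("/var/run/mydaemon.pid"))
```

This only narrows the window; a process that is blocked or swapped out may still be killed before the handler runs.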
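Use flock (or lockf) on your pidfile. If taking the lock succeeds, no other instance is running, so you can rewrite the pidfile and continue. The kernel drops the lock automatically when the process exits, even after a SIGKILL or a reboot, so a leftover file no longer blocks start-up.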
This SO answer has a good example on how this is done.
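For reference, here is a minimal sketch of the locking approach in Python, assuming a Unix-like system where `fcntl.flock` is available. It is not the code from the linked answer, and the function name `acquire_pidfile` is invented for illustration.

```python
import fcntl
import os


def acquire_pidfile(path):
    """Lock the PID file and write our PID into it.

    Returns the open file descriptor on success, or None if another
    instance already holds the lock.
    """
    # Open without truncating, so a live owner's PID is not wiped.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Exclusive, non-blocking: fail immediately if already locked.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # We own the lock: any old contents are stale, so overwrite them.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd
```

The daemon must keep the returned descriptor open for its whole lifetime, since closing it releases the lock. With this scheme the presence of the file means nothing on its own; only the lock does.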