Where can I find a well-respected reference that details the proper handling of PID files on Unix?

On Unix operating systems, it is common practice to “lock” a program (often a daemon) by use of a special lock file: the PID file.

This is a file in a predictable location, often ‘/var/run/foo.pid’. The program is supposed to check when it starts up whether the PID file exists and, if the file does exist, exit with an error. So it's a kind of advisory, collaborative locking mechanism.

The file contains a single line of text, being the numeric process ID (hence the name “PID file”) of the process that currently holds the lock; this allows an easy way to automate sending a signal to the process that holds the lock.
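As a rough illustration of that last point, here is a minimal sketch of reading a PID file and signalling the process it names. The function names `read_pid` and `signal_holder` are made up for this example, and it assumes the file holds nothing but the decimal PID on one line, as described above.

```python
import os
import signal


def read_pid(pidfile_path):
    """Return the process ID recorded in the PID file, as an int."""
    with open(pidfile_path) as f:
        return int(f.read().strip())


def signal_holder(pidfile_path, sig=signal.SIGTERM):
    """Send `sig` to the process named in the PID file."""
    os.kill(read_pid(pidfile_path), sig)
```

For example, `signal_holder("/var/run/foo.pid", signal.SIGHUP)` would ask the (hypothetical) foo daemon to reload, provided the caller has permission to signal it.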

What I can't find is a good reference on expected or “best practice” behaviour for handling PID files. There are various nuances: how to actually lock the file (don't bother? use the kernel? what about platform incompatibilities?), handling stale locks (silently delete them? when to check?), when exactly to acquire and release the lock, and so forth.

Where can I find a respected, most-authoritative reference (ideally on the level of W. Richard Stevens) for this small topic?

+4  A: 

First off, on all modern UNIXes /var/run does not persist across reboots.

The general method of handling the PID file is to create it during initialization and delete it on every exit path, whether a normal exit or a signal handler.
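A minimal sketch of that lifecycle, assuming a Python daemon; the helper names (`write_pidfile`, `remove_pidfile`, `install_pidfile`) are invented for this example. Note that the creation step here is a plain write and is not atomic, so this only illustrates the cleanup side.

```python
import atexit
import os
import signal
import sys


def write_pidfile(path):
    """Record our own PID in `path` (plain write, not atomic)."""
    with open(path, "w") as f:
        f.write("%d\n" % os.getpid())


def remove_pidfile(path):
    """Delete the PID file; ignore the case where it is already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def _exit_on_signal(signum, frame):
    # sys.exit() raises SystemExit, so the atexit handlers still run.
    sys.exit(128 + signum)


def install_pidfile(path):
    """Create the PID file and arrange for removal on exit or signal."""
    write_pidfile(path)
    atexit.register(remove_pidfile, path)
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP):
        signal.signal(sig, _exit_on_signal)
```

Nothing can be done about SIGKILL or a power failure, which is one way stale PID files come about.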

There are two canonical ways to atomically create/check for the file. The main one these days is to open it with the O_EXCL flag (together with O_CREAT): if the file already exists, the call fails. The old way (mandatory on systems without O_EXCL) is to create the file under a random name and then link() it to the PID file's name. The link will fail if that name already exists.
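Both techniques can be sketched roughly as follows. The function names are hypothetical, and the temporary name in the second function is derived from the PID rather than being truly random, which is an assumption made to keep the example short.

```python
import os


def create_pidfile_excl(path):
    """Atomically create `path`; raises FileExistsError if it exists."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write("%d\n" % os.getpid())


def create_pidfile_link(path):
    """Older technique: write a uniquely named file, then link() it."""
    tmp = "%s.%d.tmp" % (path, os.getpid())  # assumed-unique temp name
    with open(tmp, "w") as f:
        f.write("%d\n" % os.getpid())
    try:
        # link() is atomic and fails with FileExistsError if `path` exists.
        os.link(tmp, path)
    finally:
        os.unlink(tmp)  # the temp name is no longer needed either way
```

In both cases a `FileExistsError` means some other process (or a stale file) already holds the name.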

Joshua
“There are two canonical ways to atomically create/check for the file.” That's exactly the kind of thing my question is about: where is this canon recorded canonically, and what makes it authoritative compared to conflicting advice from others?
bignose
Unfortunately, much of UNIX operating practice is handed down through the culture. Reading the man pages for the system calls described in POSIX.1 (these are, confusingly enough, in man section 2) reveals only a few things that are suitable for locking. Since flock() isn't trusted, this leaves only these two and one involving mkdir.
Joshua
A fair question is "Why isn't flock() trusted?" The answer is that there have been too many broken systems over the years, and it never works correctly over NFS anyway (the NFS lock protocol itself is subject to the split-brain problem).
Joshua
+1  A: 

Depending on the distribution, it's actually the init script that handles the pidfile. It checks for existence when starting, removes it when stopping, etc. I don't like doing it that way. I write my own init scripts and don't typically use the standard init functions.

A well-written program (daemon) will have some kind of configuration file saying where this pidfile (if any) should be written. It will also take care to establish signal handlers so that the PID file is cleaned up on normal or abnormal exit, whenever a signal can be handled. The PID file then gives the init script the correct PID so the daemon can be stopped.

Therefore, if the pidfile already exists when starting, it's a very good indicator to the program that it previously crashed and should do some kind of recovery effort (if applicable). You kind of shoot that logic in the foot if you have the init script itself checking for the existence of the PID file, or unlinking it.
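One common way to act on this at startup is to check whether the PID recorded in a leftover file still refers to a live process, by sending signal 0. The sketch below is only an illustration under that assumption; `process_exists` and `previous_run_crashed` are names invented here, and the check can be fooled if the old PID has since been reused by an unrelated process.

```python
import os


def process_exists(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, but belongs to another user
    return True


def previous_run_crashed(path):
    """True if a PID file is present but its process is gone."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False  # no PID file: the last shutdown was clean
    except ValueError:
        return True  # unreadable contents: treat the file as stale
    return not process_exists(pid)
```

If `previous_run_crashed(path)` is true, the daemon can run its recovery steps before replacing the file.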

As for naming, the file should follow the program name. If you are starting 'food' (the foo daemon), it would be food.pid.

You should also explore /var/lock/subsys, though that's used mostly on Red Hat flavors.

Tim Post
+4  A: 

As far as I know, PID files are a convention rather than something that you can find a respected, mostly authoritative source for. The closest I could find is this section of the Filesystem Hierarchy Standard.

This Perl library might be helpful, since it looks like the author has at least given thought to some issues that can arise.

I believe that files under /var/run are often handled by the distro maintainers rather than daemons' authors, since it's the distro maintainers' responsibility to make sure that all of the init scripts play nice together. I checked Debian's and Fedora's developer documentation and couldn't find any detailed guidelines, but you might be able to get more info on their developers' mailing lists.

Josh Kelley
Thanks. The consensus in other forums also seems to be that there's no canonical reference for this. (The FHS mentions the file location and content briefly, and says nothing about behaviour.)
bignose