I want to terminate a process group by sending SIGTERM to the processes within it. This can be accomplished via the kill() call, but the manuals I found give few details about how exactly it works:

   int kill(pid_t pid, int sig);
   ...
   If pid is less than -1, then sig is sent to every  process  in
   the process group whose ID is -pid.

However, in which order will the signal be sent to the processes that form the group? Imagine the following situation: a pipe is set up between master and slave processes in the group. If the slave is killed while kill(-pid) is being processed and the master is still alive, the master might report this as an internal failure (upon being notified that its child is dead). However, I want all processes to understand that the termination was caused by something external to their process group.

How can I avoid this confusion? Should I be doing something more than a mere kill(-pid, SIGTERM)? Or is it resolved by underlying properties of the OS of which I'm not aware?

Note that I can't modify the code of the processes in the group!

+1  A: 

My understanding is that you cannot rely on any specific order of signal delivery.

You could avoid the issue if you send the TERM signal to the master process only, and then have the master kill its children.

Marius Gedminas
I can't "have" master do anything. Assume its source is unavailable. Actually, *this* is why my question appeared. :-)
Pavel Shved
@Pavel Shved - is it really a programming question then? As far as I know Marius is right but it might be worthwhile asking on ServerFault. Sysadmin types love these kinds of questions. :)
Duck
@Duck, it surely is a programming question, but, perhaps, Server Fault users know more about this, thank you for the pointer.
Pavel Shved
+3  A: 

Try doing it as a three-step process:

kill(-pid, SIGSTOP);
kill(-pid, SIGTERM);
kill(-pid, SIGCONT);

The first SIGSTOP should put all the processes into a stopped state. They cannot catch this signal, so this should stop the entire process group.

The SIGTERM will remain pending for each process, but I don't believe it will be delivered while the processes are stopped (this is from memory, and I can't currently find a reference, but I believe it is true).

The SIGCONT will start the processes again, allowing the SIGTERM to be delivered. If the slave gets the SIGCONT first, the master may still be stopped so it will not notice the slave going away. When the master gets the SIGCONT, it will be followed by the SIGTERM, terminating it.

I don't know if this will actually work, and exactly when all the signals are delivered (including the SIGCHLD to the master process) may depend on the implementation, but it may be worth a try.
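The three-step sequence above can be sketched as a small C helper. This is only an illustration under assumptions: the names `terminate_group` and `demo_terminate_group` are hypothetical, the targets keep the default SIGTERM action, and the ordering caveats already mentioned still apply.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: stop, terminate, then resume every process in the
   group pgid. Returns 0 on success, -1 with errno set on failure. */
int terminate_group(pid_t pgid)
{
    assert(pgid > 1);  /* kill(-1, ...) would target every process we may signal */

    if (kill(-pgid, SIGSTOP) == -1)
        return -1;
    if (kill(-pgid, SIGTERM) == -1) {
        int saved = errno;
        kill(-pgid, SIGCONT);  /* do not leave the group stopped */
        errno = saved;
        return -1;
    }
    return kill(-pgid, SIGCONT);
}

/* Hypothetical demo: put a child into its own process group, terminate the
   group, and return the signal that killed the child (-1 on error). */
int demo_terminate_group(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        setpgid(0, 0);          /* become leader of a new group */
        for (;;)
            pause();            /* wait for signals forever */
    }
    setpgid(child, child);      /* also set it here so we do not race the child */

    int status;
    if (terminate_group(child) == -1) {
        kill(child, SIGKILL);   /* clean up the child on failure */
        waitpid(child, &status, 0);
        return -1;
    }
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The helper sends SIGCONT even when SIGTERM fails, so a partial failure does not leave the whole group stopped.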

camh
Yeah, this solution is what I have currently implemented. I checked that the `SIGTERM` handler on my system is indeed not invoked until the processes wake up due to `SIGCONT`. My experiments also show that all the proper `SIGCHLD` signals are delivered to the controlling process: when the child we signal is stopped, resumed, and terminated.
Pavel Shved
Check the `ps` output or `/proc/<pid>/stat` for the process state; a stopped process is shown with a `T` in the `ps` output, according to the man page.
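A rough sketch of that check in C, assuming a Linux system with `/proc` mounted. The function names `process_state` and `demo_stopped_state` are made up for illustration; the state letter is the field right after the parenthesised command name in `/proc/<pid>/stat`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the state letter from /proc/<pid>/stat,
   or '\0' if the file cannot be read or parsed. */
char process_state(pid_t pid)
{
    assert(pid > 0);

    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '\0';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name may itself contain spaces or ')', so search for the
       last ')' instead of splitting on whitespace. */
    char *rparen = strrchr(buf, ')');
    if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
        return '\0';
    return rparen[2];
}

/* Hypothetical demo: stop a child with SIGSTOP and report its state. */
char demo_stopped_state(void)
{
    pid_t child = fork();
    if (child == -1)
        return '\0';
    if (child == 0) {
        for (;;)
            pause();
    }

    int status;
    kill(child, SIGSTOP);
    waitpid(child, &status, WUNTRACED);   /* returns once the child has stopped */

    char state = process_state(child);

    kill(child, SIGKILL);                 /* SIGKILL also works on stopped processes */
    waitpid(child, &status, 0);
    return state;
}
```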
mvds
A: 

Untested: Use shared memory and put in some kind of "we're dying" semaphore, which may be checked before I/O errors are treated as real errors. mmap() with MAP_ANONYMOUS|MAP_SHARED and make sure it survives your way of fork()ing processes.

Oh, and be sure to use the volatile keyword or your semaphore is optimized away.
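A minimal sketch of such a shared flag, for the case where the processes can be modified. It swaps the `volatile` suggestion for a C11 `atomic_int`, which is an assumption of this sketch and not part of the original idea; the names `create_dying_flag` and `demo_dying_flag` are hypothetical.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sharing an atomic between processes is only safe if it is lock-free. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic_int must be lock-free");

/* Hypothetical helper: map one shared flag. Call it before fork() so that
   children inherit the same mapping. Returns NULL on failure. */
atomic_int *create_dying_flag(void)
{
    void *p = mmap(NULL, sizeof(atomic_int), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    atomic_int *flag = p;
    atomic_init(flag, 0);
    return flag;
}

/* Hypothetical demo: the child polls the flag and exits with 1 once it sees
   it set, or 0 after roughly five seconds. Returns the child's exit code,
   or -1 on error. */
int demo_dying_flag(void)
{
    atomic_int *flag = create_dying_flag();
    if (flag == NULL)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        munmap(flag, sizeof *flag);
        return -1;
    }
    if (child == 0) {
        struct timespec one_ms = {0, 1000000};
        for (int i = 0; i < 5000; i++) {
            if (atomic_load(flag))
                _exit(1);            /* saw "we're dying" */
            nanosleep(&one_ms, NULL);
        }
        _exit(0);                    /* timed out without seeing the flag */
    }

    atomic_store(flag, 1);           /* announce the shutdown */

    int status;
    pid_t r = waitpid(child, &status, 0);
    munmap(flag, sizeof *flag);
    if (r == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In a real master process, the flag would be consulted before a broken pipe or a dead child is reported as an internal failure.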

mvds
And again, I cannot modify any of the processes. Life would be much easier if I could... Edited the question to reflect this.
Pavel Shved
+1  A: 

Even if every variety of UNIX promised to deliver the signals in a particular order, the scheduler might still decide to run the critical child process code before the parent code.

Even your STOP/TERM/CONT sequence will be vulnerable to this.

I'm afraid you may need something more complicated. Perhaps the child process could catch the SIGTERM and then loop until its parent exits before it exits itself? Be sure to add a timeout if you do this.

Darron
I can't change the code of the processes I kill. Sorry, edited the question.
Pavel Shved