On Solaris 10, I have a parent and child process. I kill the child process with kill -KILL. I want the fastest possible detection of this in the parent process (this is a master/slave system and the goal is for the parent to request its backup to take over as fast as possible). The parent process needs to know that the child has started to exit (it doesn't need to wait until the child has exited).

In the system I'm working with I see a delay of about 200ms between sending the SIGKILL and the parent process receiving the SIGCHLD. I don't think I can reduce this time, simply because of the size of the child process and the time it takes to exit - correct me if I am wrong.

I think my options are:

-- Don't send SIGKILL to the child. Send a signal to the parent instead, so that it can kill the child (and therefore knows instantly that the child process is being terminated). This is not ideal because some of the "kill -KILL" commands are out of my control, so I can't replace them with a different signal to the parent.
-- Hook into the termination processing on the child (I don't think this is possible because SIGKILL can't be caught).
-- Any other ideas?

Thanks for any advice. NickB

A: 

I'm not sure you're going to get much faster than the delivery of SIGCHLD. You may want to think about re-architecting the application to be a master/multi-slave one, if possible.

If you're running with one master and five slaves, then the loss of one slave will result in a 20% drop in capacity rather than total loss. And hopefully the master can get another slave up quickly enough before it's noticed.

Another possible advantage to this is to have spare slaves waiting in the wings, already started but waiting on a semaphore or other signal to start doing the real work. It's possible that this may help even if you can't run multiple slaves side-by-side since it will remove at least part of the delay (waiting for the process to load up). Simply signal a spare child to start as soon as the SIGCHLD appears.

paxdiablo
A: 

Rather than using signals to catch the child being killed, you could use waitpid() or waitid() to detect the change of state of the child process. You should be calling one of these in any case to reap the child's pid...

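You can then leave SIGCHLD unhandled, with the added bonus of avoiding asynchronous coding. (Leave it at its default disposition rather than explicitly setting it to SIG_IGN: with SIG_IGN the system discards the child's exit status and waitpid() ends up failing with ECHILD.)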
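As a rough illustration of the waitpid() approach, here is a minimal sketch. The function names wait_for_child_death() and spawn_sleeping_child() are made up for this example and are not from the question. Note that waitpid() only returns once the child has finished terminating, so this is probably no faster than SIGCHLD; it just avoids the signal handler.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block until child `pid` terminates. Returns the number of the signal
 * that killed it, 0 if it exited on its own, or -1 on error. */
int wait_for_child_death(pid_t pid)
{
    int status;

    /* Retry if an unrelated signal handler interrupts the call. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    if (WIFSIGNALED(status))
        return WTERMSIG(status);    /* e.g. SIGKILL */
    return 0;
}

/* Hypothetical stand-in for the real slave: a child that just sleeps. */
pid_t spawn_sleeping_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;
}
```

The parent would call wait_for_child_death(pid) from a dedicated thread or at the point in its main loop where it can afford to block, and request the backup to take over when the return value is SIGKILL.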

paxdiablo's suggestion of using semaphores may actually be what you want: on startup, each child locks a semaphore. If you run two child processes, one will run and the other will wait on the semaphore. Once the first is killed, the second starts running.
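A sketch of the standby-child idea follows. One assumption worth stating loudly: it uses a System V semaphore with SEM_UNDO rather than a POSIX semaphore, because the kernel reverts SEM_UNDO operations when the holder dies (including from SIGKILL), whereas a POSIX semaphore held by a killed process stays locked. The function names and the notification pipe are invented for this example.

```c
#include <signal.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Many systems require the caller to define the semctl() argument union. */
union semun_arg { int val; struct semid_ds *buf; unsigned short *array; };

/* Create a private set holding one semaphore with initial value 1. */
int create_worker_lock(void)
{
    union semun_arg arg;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid == -1)
        return -1;
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1)
        return -1;
    return semid;
}

/* Block until the lock is free, then take it. SEM_UNDO asks the kernel
 * to give the lock back automatically when this process terminates. */
int acquire_worker_lock(int semid)
{
    struct sembuf op;

    op.sem_num = 0;
    op.sem_op = -1;
    op.sem_flg = SEM_UNDO;
    return semop(semid, &op, 1);
}

/* Fork a worker that waits for the lock, announces that it is active by
 * writing its pid to `notify_fd`, and then pretends to work forever. */
pid_t spawn_worker(int semid, int notify_fd)
{
    pid_t pid = fork();

    if (pid == 0) {
        pid_t self = getpid();
        if (acquire_worker_lock(semid) == -1)
            _exit(1);
        if (write(notify_fd, &self, sizeof self) != (ssize_t)sizeof self)
            _exit(1);
        for (;;)
            pause();
    }
    return pid;
}
```

With two workers spawned this way, the second stays blocked inside semop() until the first is killed, and then takes over without the parent having to do anything. The parent should remove the semaphore set with semctl(semid, 0, IPC_RMID) when it is finished.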

CuriousPanda
A: 

This is a guess, but how is the parent process detecting the SIGCHLD? If you're using a signal handler, you might be able to gain some speed by using a dedicated signal thread.

Basically, you start a separate thread to process the signal. All threads (including the signal thread) should call pthread_sigmask() to block receipt of SIGCHLD. The signal thread then calls sigwait() with a signal set that includes SIGCHLD. sigwait() blocks until a SIGCHLD is pending and then returns it.

The main advantage of using a signal thread is that you can process the signals in a main loop of some kind, without the limitations of a signal handler and without the signal interrupting something else the process may be doing. My wild guess is that it might also be cheaper for the kernel to deliver a signal to a thread using this method.

Kenster
A: 

Hi,

You can use a not so widely known feature of Solaris: doors. In your parent process, create a door with door_create() and the DOOR_UNREF attribute, which means:

Delivers a special invocation on the door when the number of descriptors that refer to this door drops to one.

Then fork, so that there are two references to the door's descriptor. When your child process dies, the door function is called in the parent process, because the number of references to the door's descriptor drops to one.

Solaris doors are meant to be super fast, but honestly, I have never measured the delivery time in this particular case. Let me know if it works for you.

Peter Vrabel