Imagine I have a process that starts several child processes. The parent needs to know when a child exits.

I can use waitpid, but then when the parent needs to exit I have no way of telling the thread that is blocked in waitpid to exit gracefully so that I can join it. It's nice to have things clean up after themselves, but it may not be that big of a deal.

I can use waitpid with WNOHANG, and then sleep for some arbitrary time to prevent a busy wait. However then I can only know if a child has exited every so often. In my case it may not be super critical that I know when a child exits right away, but I'd like to know ASAP...
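A rough sketch of this polling option, for comparison with the others. The names `poll_for_child`, `stop` and `interval_ms` are made up for illustration; the assumption is that another thread sets `stop` when the parent wants to shut down.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper: polls for any exited child every interval_ms
 * milliseconds until one is reaped, waitpid() fails, or *stop is set.
 * Returns the reaped pid, 0 if asked to stop, or -1 on error
 * (for example ECHILD when there are no children left). */
pid_t poll_for_child(atomic_bool *stop, long interval_ms, int *status)
{
    struct timespec ts;
    ts.tv_sec = interval_ms / 1000;
    ts.tv_nsec = (interval_ms % 1000) * 1000000L;

    while (!atomic_load(stop)) {
        pid_t pid = waitpid(-1, status, WNOHANG);
        if (pid != 0)
            return pid;          /* > 0: child reaped, -1: error */
        nanosleep(&ts, NULL);    /* nothing exited yet, sleep and retry */
    }
    return 0;
}
```

The latency is bounded by the interval, which is exactly the trade-off described above: a shorter interval means faster notification but more wakeups.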

I can use a signal handler for SIGCHLD, and in the signal handler do whatever I was going to do when a child exits, or send a message to a different thread to do some action. But using a signal handler obfuscates the flow of the code a little bit.

What I'd really like to do is call waitpid with a timeout, say 5 sec. Since exiting the process isn't a time-critical operation, I can lazily signal the thread to exit, while still having it blocked in waitpid the rest of the time, always ready to react. Is there such a call in Linux? Of the alternatives, which one is best?


EDIT:

Another method based on the replies would be to block SIGCHLD in all threads with pthread_sigmask(). Then in one thread, keep calling sigtimedwait() while looking for SIGCHLD. This means that I can time out on that call and check whether the thread should exit, and if not, go back to waiting for the signal. Once a SIGCHLD is delivered to this thread, we can react to it immediately, inline in the waiting thread, without using a signal handler.

+7  A: 

The function can be interrupted by a signal, so you could set a timer before calling waitpid() and it will fail with EINTR when the timer signal is raised. Edit: It should be as simple as calling alarm(5) before calling waitpid(), provided a SIGALRM handler is installed without SA_RESTART (the default action for SIGALRM terminates the process).

Steve Baker
What determines which thread handles a signal? How will I be sure that this is the thread that handles it? Is it that alarm was called in some thread, so that thread handles the signal?
Greg Rogers
The man page for signal seems to say that the result is unspecified, which means that it may not be handled by the right thread and lead to incorrect results.
Greg Rogers
It is probably a good idea to have just one thread which receives signals, ensuring that all other threads mask the signal with sigprocmask or similar
MarkR
note to anyone reading the above comment: use pthread_sigmask not sigprocmask
Greg Rogers
Don't actually do this. You can lose children if waitpid() reaps the child but SIGALRM fires before the kernel returns. Many unixes have bugs here as well, and don't EINTR correctly even in the ideal case.
geocar
+1  A: 

I can use a signal handler for SIGCHLD, and in the signal handler do whatever I was going to do when a child exits, or send a message to a different thread to do some action. But using a signal handler obfuscates the flow of the code a little bit.

In order to avoid race conditions you should avoid doing anything more complex than changing a volatile flag in a signal handler.

I think the best option in your case is to send a signal to the parent. waitpid() will then set errno to EINTR and return. At this point you check for waitpid return value and errno, notice you have been sent a signal and take appropriate action.

Krunch
Well, you can do the self-pipe trick, and have the waitpid-thread really be blocking on a select to a pipe instead. Then, when it gets SIGCHLD, have it write a byte to the pipe, which wakes itself up.
wnoise
+2  A: 

If you're going to use signals anyways (as per Steve's suggestion), you can just send the signal manually when you want to exit. This will cause waitpid to return -1 with errno set to EINTR, and the thread can then exit. No need for a periodic alarm/restart.

Chris Dodd
A: 

Just off the top of my head...

[deleted]

Well, my 'off the top of my head' answer should have stayed where it was. Move along...

shank
+4  A: 

Don't mix alarm() with wait(). You can lose error information that way.

Use the self-pipe trick. This turns any signal into a select()able event:

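#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int selfpipe[2];
/* SIGCHLD handler: write one byte to the pipe so that select() wakes up. */
void selfpipe_sigh(int n)
{
    int saved_errno = errno;    /* write() may clobber errno */
    (void)n;
    write(selfpipe[1], "", 1);  /* fails harmlessly with EAGAIN if the pipe is full */
    errno = saved_errno;
}
void selfpipe_setup(void)
{
    static struct sigaction act;
    if (pipe(selfpipe) == -1) { abort(); }
    /* Both ends non-blocking: the handler must never block, and neither must the drain loop. */
    fcntl(selfpipe[0], F_SETFL, fcntl(selfpipe[0], F_GETFL) | O_NONBLOCK);
    fcntl(selfpipe[1], F_SETFL, fcntl(selfpipe[1], F_GETFL) | O_NONBLOCK);
    memset(&act, 0, sizeof(act));
    act.sa_handler = selfpipe_sigh;
    act.sa_flags = 0;           /* no SA_RESTART */
    sigaction(SIGCHLD, &act, NULL);
}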

Then, your waitpid-like function looks like this:

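#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Waits up to 5 seconds for a SIGCHLD; returns the number of children reaped. */
int selfpipe_waitpid(void)
{
    static char dummy[4096];
    fd_set rfds;
    struct timeval tv;
    int died = 0, st;

    tv.tv_sec = 5;
    tv.tv_usec = 0;
    FD_ZERO(&rfds);
    FD_SET(selfpipe[0], &rfds);
    if (select(selfpipe[0]+1, &rfds, NULL, NULL, &tv) > 0) {
       /* Drain every pending wake-up byte from the pipe. */
       while (read(selfpipe[0],dummy,sizeof(dummy)) > 0);
       /* Reap all exited children. With WNOHANG, waitpid() returns 0 while
          children are still running and -1 when none are left, so stop on
          anything that is not a pid. */
       while (waitpid(-1, &st, WNOHANG) > 0) died++;
    }
    return died;
}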

You can see in selfpipe_waitpid() how you can control the timeout and even mix with other select()-based IO.
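As one illustration of mixing this with other select()-based IO, here is a sketch that waits on two descriptors at once: the read end of the self-pipe and the read end of a second pipe that another thread writes to when the process should shut down. The second "quit" pipe, the function name `wait_for_event` and the `WAKE_*` constants are assumptions made up for this example, not part of the code above.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical result codes for wait_for_event(). */
enum { WAKE_ERROR = -1, WAKE_TIMEOUT = 0, WAKE_CHILD = 1, WAKE_QUIT = 2 };

/* Wait until child_fd (the self-pipe read end) or quit_fd becomes
 * readable, or until timeout_sec seconds pass. A quit request takes
 * priority when both are readable. An interrupted select() is reported
 * as a timeout, so the caller simply calls again. */
int wait_for_event(int child_fd, int quit_fd, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = child_fd > quit_fd ? child_fd : quit_fd;
    int n;

    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    FD_ZERO(&rfds);
    FD_SET(child_fd, &rfds);
    FD_SET(quit_fd, &rfds);

    n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n == -1)
        return errno == EINTR ? WAKE_TIMEOUT : WAKE_ERROR;
    if (n == 0)
        return WAKE_TIMEOUT;
    if (FD_ISSET(quit_fd, &rfds))
        return WAKE_QUIT;
    return WAKE_CHILD;   /* caller drains the pipe and reaps with waitpid(WNOHANG) */
}
```

With a quit descriptor in the set, the timeout no longer has to be short for the thread to shut down promptly: writing a byte to the quit pipe wakes it immediately.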

geocar
seems like an interesting concept. question, why make the pipe non-blocking? and why do you need two loops after the select? shouldn't there *always* be data when the select succeeds?
Evan Teran
If two children die, you won't necessarily get two SIGCHLD notifications. You make the pipe non-blocking in case too many SIGCHLDs come in (roughly PIPE_BUF).
geocar
The loops also help to protect against too many SIGCHLDs, and while ideally there would always be data after select completes, the drain loop keeps calling read() until the pipe is empty, and that last read() would block unless the descriptor is marked non-blocking.
geocar