Please consider the following fork()/SIGCHLD pseudo-code.

  // main program excerpt
    for (;;) {
      if ( is_time_to_make_babies ) {

        pid = fork();
        if (pid == -1) {
          /* fail */
        } else if (pid == 0) {
          /* child stuff */
          print "child started"
          exit
        } else {
          /* parent stuff */
          print "parent forked new child ", pid
          children.add(pid);
        }

      }
    }

  // SIGCHLD handler
  sigchld_handler(signo) {
    while ( (pid = wait(status, WNOHANG)) > 0 ) {
      print "parent caught SIGCHLD from ", pid
      children.remove(pid);
    }
  }

The example above has a race condition. "/* child stuff */" can finish before "/* parent stuff */" starts, so the SIGCHLD handler may run (and find nothing to remove) before the parent has added the child's pid. The pid is then added to the list of children after the child has already exited, and is never removed. When the time comes for the app to close down, the parent will wait endlessly for the already-finished child to finish.

One solution I can think of to counter this is to have two lists: started_children and finished_children. I'd add to started_children in the same place I'm adding to children now. But in the signal handler, instead of removing from children I'd add to finished_children. When the app closes down, the parent can simply wait until the difference between started_children and finished_children is zero.

Another possible solution I can think of is shared memory, e.g. share the parent's list of children and let the children .add and .remove themselves. But I don't know much about this.

EDIT: Another possible solution, which was the first thing that came to mind, is to simply add a sleep(1) at the start of /* child stuff */ but that smells funny to me, which is why I left it out. I'm also not even sure it's a 100% fix.

So, how would you correct this race condition? And if there's a well-established recommended pattern for this, please let me know!

Thanks.

A: 

In addition to the existing "children" add a new data structure "early deaths". This will keep the contents of children clean.

  // main program excerpt
    for (;;) {
      if ( is_time_to_make_babies ) {

        pid = fork();
        if (pid == -1) {
          /* fail */
        } else if (pid == 0) {
          /* child stuff */
          print "child started"
          exit
        } else {
          /* parent stuff */
          print "parent forked new child ", pid
          if (!earlyDeaths.contains(pid)) {
              children.add(pid);
          } else {
              earlyDeaths.remove(pid);
          }
        }

      }
    }

  // SIGCHLD handler
  sigchld_handler(signo) {
    while ( (pid = wait(status, WNOHANG)) > 0 ) {
      print "parent caught SIGCHLD from ", pid
      if (children.contains(pid)) {
          children.remove(pid);
      } else {
          earlyDeaths.add(pid);
      }
    }
  }

EDIT: this can be simplified if your process is single-threaded -- earlyDeaths doesn't have to be a container, it just has to hold one pid.

Darron
It doesn't really solve the race condition: the child can die while the parent is between `if (!earlyDeaths.contains(pid))` and `children.add(pid)`.
qrdl
+7  A: 

The simplest solution would be to block the SIGCHLD signal with sigprocmask() before calling fork(), and to unblock it in the parent code after you have processed the pid.

If the child dies in the meantime, the signal stays pending and the SIGCHLD handler is called as soon as you unblock the signal. This is the critical section concept: in your case the critical section starts before fork() and ends after children.add().
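
A minimal C sketch of that critical section, under some assumptions: the fixed-size `children` array, `MAX_CHILDREN`, and the helpers `children_add`, `children_remove`, `install_handler`, `spawn_child`, `wait_for_children` and `demo` are hypothetical names for illustration, not part of the question's code. The handler reaps with `waitpid(-1, NULL, WNOHANG)`, and shutdown uses `sigsuspend()` so the parent sleeps until the list is empty.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64   /* hypothetical fixed capacity for this sketch */

/* Stand-in for the question's `children` list. It is only modified while
 * SIGCHLD is blocked: explicitly in spawn_child(), implicitly in the
 * handler (a signal is blocked while its own handler runs). */
static pid_t children[MAX_CHILDREN];
static volatile sig_atomic_t child_count = 0;

static void children_add(pid_t pid) {
    assert(child_count < MAX_CHILDREN);
    children[child_count++] = pid;
}

static void children_remove(pid_t pid) {
    for (int i = 0; i < child_count; i++) {
        if (children[i] == pid) {
            children[i] = children[--child_count]; /* move last entry here */
            return;
        }
    }
}

static void sigchld_handler(int signo) {
    (void)signo;
    int saved_errno = errno;
    pid_t pid;
    /* Several exits may be coalesced into one signal, so reap in a loop. */
    while ((pid = waitpid(-1, NULL, WNOHANG)) > 0)
        children_remove(pid);
    errno = saved_errno;
}

static int install_handler(void) {
    struct sigaction sa;
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Critical section: SIGCHLD stays blocked from before fork() until
 * after children_add(), so the handler can never see a missing pid. */
static pid_t spawn_child(void) {
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);

    pid_t pid = fork();
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &old, NULL); /* child inherits the mask */
        _exit(0);                             /* child stuff goes here */
    }
    if (pid > 0)
        children_add(pid);
    sigprocmask(SIG_SETMASK, &old, NULL);     /* pending SIGCHLD fires now */
    return pid;
}

/* Shutdown: sleep until the handler has emptied the list. */
static void wait_for_children(void) {
    sigset_t block, old, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (child_count > 0)
        sigsuspend(&wait_mask);  /* atomically unblock SIGCHLD and sleep */
    sigprocmask(SIG_SETMASK, &old, NULL);
}

/* Hypothetical driver: fork n short-lived children and return how many
 * are still listed after shutdown (expected 0), or -1 on error. */
static int demo(int n) {
    if (install_handler() != 0) return -1;
    for (int i = 0; i < n; i++)
        if (spawn_child() < 0) return -1;
    wait_for_children();
    return child_count;
}
```

Calling `demo(5)` from `main` should return 0 even though every child exits immediately. Note that the child restores the inherited signal mask before doing its own work; whether that matters depends on what the child goes on to do.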

qrdl
I like this solution. Unfortunately I'm doing this in PHP and there is no sigprocmask() in a release yet :( It is in CVS though so it's only a matter of time I suppose. Thanks for the info. Maybe I should use a different language for this project -- no setproctitle()-alike in PHP either it seems.
A: 

If you can't use a critical section, maybe a simple counter can do the job: +1 when you add, -1 when you remove. No matter which one happens first, the counter eventually reaches zero once all children are done.
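
A rough C sketch of the counter idea, with some assumptions called out: it uses a C11 `atomic_int` so that the +1 in the main flow and the -1 in the handler cannot lose an update, and it assumes `atomic_int` is lock-free on the target platform (checked at compile time). The names `outstanding`, `install_handler`, `spawn_child`, `wait_for_children` and `demo` are hypothetical. No signal mask is touched, so the counter may briefly go negative when a child is reaped before the parent has counted it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Atomics are only safe inside a signal handler when they are lock-free. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic_int must be lock-free");

/* Children forked minus children reaped. */
static atomic_int outstanding = 0;

static void sigchld_handler(int signo) {
    (void)signo;
    int saved_errno = errno;
    /* -1 for every child reaped, whether or not it was counted yet. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        atomic_fetch_sub(&outstanding, 1);
    errno = saved_errno;
}

static int install_handler(void) {
    struct sigaction sa;
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

static pid_t spawn_child(void) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                             /* child stuff goes here */
    if (pid > 0)
        atomic_fetch_add(&outstanding, 1);    /* +1 for every child forked */
    return pid;
}

/* Shutdown: once no more forks will happen, poll until the counter is
 * back at zero. Polling avoids needing sigprocmask()/sigsuspend(). */
static void wait_for_children(void) {
    struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    while (atomic_load(&outstanding) != 0)
        nanosleep(&pause, NULL);  /* returns early if SIGCHLD arrives */
}

/* Hypothetical driver: fork n short-lived children and return the
 * counter after shutdown (expected 0), or -1 on error. */
static int demo(int n) {
    if (install_handler() != 0) return -1;
    for (int i = 0; i < n; i++)
        if (spawn_child() < 0) return -1;
    wait_for_children();
    return atomic_load(&outstanding);
}
```

The trade-off against a list of pids is that the counter only tells you how many children are outstanding, not which ones, and the zero check is only meaningful after the parent has stopped forking.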

solotim