I have a server that listens for socket connections and performs different kinds of actions depending on the request. One of them is long-lived database queries, for which the server forks.

The server keeps track of all active children, and whenever it is asked to shut down, it kills all its children before exiting. A couple of times the server has crashed or been killed ungracefully, which left the child processes orphaned. If I then try to bring the server back up, it refuses to start, saying that the listening socket cannot bind because the address/port is already in use.

I am looking for a way to handle this situation so that the main server process can come back right away. I've tried monitoring the parent's existence from the child and exiting as soon as it is gone, but this has only resulted in zombie processes, and the socket still seems to be bound.

The server is written in Python, but any explanation or suggestion in any language is welcome.

A: 

Perhaps, when you fork, disown the child so that the server process is no longer the parent registered with the OS. Does the parent really need to communicate with the child? If not, this may be an option.

You can keep track of child processes, but in a different way. You won't get SIGCHLD events anymore.
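A minimal Python sketch of "disowning" a child via the classic double fork, assuming the worker never needs to report back to the server. The helper name `spawn_detached` and its `work` callable are hypothetical. The server forks an intermediate child, which forks the real worker and exits at once. The server reaps the intermediate child immediately, so no zombie is left behind, and the worker is re-parented to init (or a subreaper).

```python
import os


def spawn_detached(work):
    """Run work() in a grandchild that the calling process does not own.

    Hypothetical helper. Returns the grandchild's pid, but the caller is not
    its parent, so the caller never gets SIGCHLD for it or reaps it.
    """
    r, w = os.pipe()                      # only used to report the worker's pid
    middle = os.fork()
    if middle == 0:                       # intermediate child
        try:
            os.close(r)
            worker = os.fork()
            if worker == 0:               # grandchild: does the real work
                os.close(w)
                status = 0
                try:
                    work()
                except BaseException:
                    status = 1
                os._exit(status)
            os.write(w, str(worker).encode())
        finally:
            os._exit(0)                   # exit at once, orphaning the worker
    os.close(w)
    os.waitpid(middle, 0)                 # reap the intermediate child: no zombie
    data = b""
    while chunk := os.read(r, 64):        # read until every writer has closed
        data += chunk
    os.close(r)
    if not data:
        raise OSError("could not fork the worker process")
    return int(data.decode())
```

The trade-off is the one mentioned above: the server can no longer rely on SIGCHLD or `waitpid()` to learn when a query finishes, so any bookkeeping has to use another channel (a pipe, a pid file, etc.).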

Kekoa
A: 

Unix can handle this for you automatically. When the parent process exits (for any reason) the child processes will all receive SIGCHLD. By default, your child process will ignore this signal, however. All you have to do is register a signal handler for this signal.

Chris Jones
I think you got this backwards. SIGCHLD is sent to a parent when a child exits.
sigjuice
On Linux, a child process can be notified of the death of its parent. See PR_SET_PDEATHSIG here, http://www.kernel.org/doc/man-pages/online/pages/man2/prctl.2.html
sigjuice
Oh, you're right: SIGCHLD is sent to the parent when the child exits. You can set things up so that the child process receives a SIGHUP, though. I don't remember the details, but look for "orphaned process group" as a key phrase.
Chris Jones
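sigjuice's PR_SET_PDEATHSIG suggestion can be sketched from Python with ctypes, since the standard library has no `prctl` wrapper. This is Linux-only; the value `1` for `PR_SET_PDEATHSIG` comes from `<linux/prctl.h>`, and the helper name `die_with_parent` is hypothetical. The extra `getppid()` check covers the race where the parent dies between `fork()` and `prctl()`, in which case the signal would never be sent.

```python
import ctypes
import os
import signal

PR_SET_PDEATHSIG = 1  # constant from <linux/prctl.h>; Linux only

# dlopen(NULL): the already-loaded C library provides prctl().
_libc = ctypes.CDLL(None, use_errno=True)


def die_with_parent(expected_parent, sig=signal.SIGTERM):
    """Ask the kernel to send `sig` to this process when its parent dies.

    Call it in the child right after fork(), passing the pid the parent
    had before forking (os.getpid() in the parent).
    """
    if _libc.prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(sig))) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # The parent may already have died between fork() and prctl(); then the
    # signal will never arrive, so check by hand and exit immediately.
    if os.getppid() != expected_parent:
        os._exit(1)
```

In the server, record `parent = os.getpid()` before forking and call `die_with_parent(parent)` first thing in the child. Note that the kernel treats the *thread* that called `fork()` as the parent, so in a multithreaded server the signal fires when that thread exits, not necessarily when the whole process does.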
A: 

Use this on your socket before you call bind():

#include <sys/socket.h>
/* Let bind() reuse the address even if old connections linger in TIME_WAIT */
int on = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

It allows bind() to succeed even if connections from a previous run of your program are still in TIME_WAIT on that port, or if the port was picked earlier as the local port of another outgoing TCP connection (which cannot happen for ports below 1024). It may also help directly with your problem.
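Since the server is written in Python, here is a minimal sketch of the same option there; `make_listener` is a hypothetical helper name. The only essential line is the `setsockopt()` call, which has to come before `bind()`.

```python
import socket


def make_listener(host="127.0.0.1", port=0, backlog=16):
    """Create a TCP listening socket that can be re-bound after a restart."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); lets bind() succeed while old connections
    # on the same port are still in TIME_WAIT.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(backlog)
    return s
```

One caveat: on Linux, SO_REUSEADDR does not let bind() succeed while another process still has a socket *listening* on the same address and port. If an orphaned child still holds the inherited listening socket, the descriptor-inheritance issue described next is the one that needs fixing.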

Unrelated:

There is another bad thing that can happen: forked children inherit EVERY open file descriptor, including your listening socket, so the port stays in use as long as any of them is alive. If a child in turn launches another long-running program, that program inherits the handle too. (Check with the lsof and netstat commands!)

So one should call this:

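#include <fcntl.h>

/* Set FD_CLOEXEC so programs started via exec() do not inherit fd. */
int close_on_exec_on(int fd)
{
  int flags = fcntl(fd, F_GETFD);
  if (flags == -1)
    return -1;
  return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

close_on_exec_on(sockfd);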

I have never tried this in a main program that forks off children, and it will not help in your case anyway, because your children are only forked, not started with exec(): FD_CLOEXEC only takes effect at exec().
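Because the children here are only forked, the close-on-exec flag never takes effect for them; the corresponding step is to close the inherited listening socket in the child right after `fork()`. A minimal Python sketch, where `fork_worker` and the `work` callable (standing in for the long database query) are hypothetical names:

```python
import os
import socket


def fork_worker(listener: socket.socket, work) -> int:
    """Fork a child that drops its copy of the listening socket, then runs work()."""
    pid = os.fork()
    if pid == 0:
        # Only the parent should keep the port bound; if the parent dies,
        # this child must not keep the listening socket alive.
        listener.close()
        status = 0
        try:
            work()
        except BaseException:
            status = 1
        os._exit(status)        # never return into the parent's code
    return pid
```

With this in place, an orphaned query process no longer pins the port, so a restarted server can bind again even while old queries are still running.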

Still, keep it in mind and call close_on_exec_on() on your listening socket in the main program anyway, just in case you ever run an external program!
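A rough Python counterpart of close_on_exec_on(), using the standard `fcntl` module. Since Python 3.4 (PEP 446), descriptors and sockets created by Python are already non-inheritable (close-on-exec) by default, so this mainly matters for descriptors that were made inheritable on purpose; `sock.set_inheritable(False)` is the built-in way to do the same for a socket.

```python
import fcntl


def close_on_exec_on(fd: int) -> None:
    """Set FD_CLOEXEC on fd, keeping any other descriptor flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
```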

Christian
A: 

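Make your server the leader of a process group. Its children inherit the group, so they can all be signalled together, e.g. with killpg(). They are terminated automatically only through the session and controlling-terminal hangup mechanism: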

Where a textual user interface is being used on a Unix-like system, sessions are used to implement login sessions. A single process, the session leader, interacts with the controlling terminal in order to ensure that all programs are terminated when a user "hangs up" the terminal connection. (Where a session leader is absent, the processes in the terminal's foreground process group are expected to handle hangups.)
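A minimal Python sketch of the process-group idea, assuming a separate supervisor or start-up script (not the server itself) does the cleanup. The stand-in server makes itself a group leader with `os.setpgid(0, 0)`, its forked workers inherit that group, and even after the server has died, `os.killpg()` on the old group id (which equals the server's pid) terminates the orphans. `start_server`, `kill_stale_group` and `work` are hypothetical names, and the server deliberately "crashes" right after forking to reproduce the problem.

```python
import os
import signal


def start_server(n_workers, work):
    """Fork a stand-in 'server' that leads its own process group.

    Hypothetical: the server forks n_workers children (which inherit its
    process group) and then exits without cleaning up, like a crash.
    Returns the server's pid, which is also the process-group id.
    """
    pid = os.fork()
    if pid == 0:
        try:
            os.setpgid(0, 0)                  # new group whose id == our pid
            for _ in range(n_workers):
                if os.fork() == 0:            # worker: same process group
                    try:
                        work()
                    finally:
                        os._exit(0)
        finally:
            os._exit(0)                       # "crash": orphans the workers
    return pid


def kill_stale_group(pgid, sig=signal.SIGTERM):
    """Signal every process still in the old server's process group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass                                  # nobody left in the group
```

A supervisor would record the server's pid when starting it and call `kill_stale_group(old_pid)` before starting a new instance, which frees the port held by any orphaned workers.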

lothar