While this question is tagged EventMachine, generic BSD-socket solutions in any language are much appreciated too.


Some background:

I have an application listening on a TCP socket. It is started and shut down with a regular System V-style init script.

My problem is that it needs some time to start up before it is ready to service the TCP socket. It's not too long, perhaps only 5 seconds, but that's 5 seconds too long when a restart needs to be performed during a workday. It's also crucial that existing connections remain open and are finished normally.

Restarts are needed for patches, upgrades and the like, and unfortunately I find myself having to do this in production every once in a while.


The question:

I'm looking for a way to do a clean hand-over of the TCP listening socket from one process to another, resulting in only a split second of downtime. I'd like existing connections / sockets to remain open and finish processing in the old process, while the new process starts servicing new connections.

Is there some proven method of doing this using BSD-sockets? (Bonus points for an EventMachine solution.)

Are there perhaps open-source libraries out there implementing this, that I can use as is, or use as a reference? (Again, non-Ruby and non-EventMachine solutions are appreciated too!)

A: 

There are a couple of ways to do this with no downtime, with appropriate modifications to the server program.

One is to implement a restart capability in the server itself, triggered for example by a particular signal or other message. The server would then fork, and the child would exec the new version of the program, passing it the file descriptor number of the listening socket, e.g. as a command-line argument. The listening socket must have the FD_CLOEXEC flag clear (the default) so that it is inherited across the exec. The other sockets continue to be serviced by the original process and should not be passed on to the new one, so the flag should be set on those, e.g. using fcntl(). After forking and execing the new process, the original process can close its copy of the listening socket without any interruption to service, since the new process is now listening on that same socket.
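A minimal sketch of the fork-and-exec handover, written in Python only because its `socket` and `os` modules map closely onto the BSD calls (it is not Ruby or EventMachine code). One difference from C: since Python 3.4 (PEP 446) sockets are created close-on-exec, so the listening socket has to be marked inheritable explicitly with `os.set_inheritable()`. The names `hand_over_via_exec` and `NEW_VERSION_CODE` are hypothetical, a `python -c` snippet stands in for the new server binary, and the fd number is passed as `argv[1]`.

```python
import os
import socket
import sys

# Hypothetical stand-in for "the new version of the server": it rebuilds the
# listening socket from the fd number in argv[1], serves one client, and exits.
NEW_VERSION_CODE = r"""
import socket, sys
listener = socket.socket(fileno=int(sys.argv[1]))  # family/type auto-detected
conn, _ = listener.accept()
conn.sendall(b"served by new process")
conn.close()
listener.close()
"""


def hand_over_via_exec(listener: socket.socket, connections=()) -> int:
    """Fork, exec the new version with the listener inherited, return its pid."""
    fd = listener.fileno()
    # Python sockets are close-on-exec by default, unlike C: clear the flag.
    os.set_inheritable(fd, True)
    # Established connections stay in the old process: keep them close-on-exec.
    for conn in connections:
        os.set_inheritable(conn.fileno(), False)
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(sys.executable,
                     [sys.executable, "-c", NEW_VERSION_CODE, str(fd)])
        finally:
            os._exit(127)  # only reached if execv() failed
    # Parent: the child now holds the same listening socket; drop our copy.
    listener.close()
    return pid


def demo() -> bytes:
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    pid = hand_over_via_exec(listener)
    chunks = []
    # Connect only after the old process has closed its copy of the listener.
    with socket.create_connection(("127.0.0.1", port)) as client:
        while data := client.recv(4096):
            chunks.append(data)
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    print(demo())
```

In `demo()` the client connects after the old process has already closed its copy of the listener. The connection still succeeds because the child holds the same socket, and it waits in the listen backlog until the new version gets around to calling `accept()`, which is what covers the start-up delay.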

An alternative method, if you do not want the old server to have to fork and exec the new server itself, would be to use a Unix-domain socket to communicate between the old and new server process. A new server process could check for such a socket in a well-known location in the file system when it is starting. If present, the new server would connect to this socket and request that the old server transfer its listening socket as ancillary data using SCM_RIGHTS. An example of this is given at the end of cmsg(3).
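A sketch of the SCM_RIGHTS transfer over a Unix-domain socket, loosely modelled on the cmsg(3) example but using Python's `sendmsg()`/`recvmsg()`. To keep it self-contained, both roles run as threads in one process and a temporary directory stands in for the well-known path; `send_listener`, `recv_listener` and `handover.sock` are hypothetical names. In a real deployment the old and new servers would be separate processes, and here the old server sends the fd as soon as a successor connects instead of waiting for an explicit request.

```python
import array
import os
import socket
import tempfile
import threading


def send_listener(conn: socket.socket, listener: socket.socket) -> None:
    """Old server: pass the listening socket's fd over the Unix socket `conn`."""
    fds = array.array("i", [listener.fileno()])
    # SCM_RIGHTS has to accompany at least one byte of ordinary data.
    conn.sendmsg([b"L"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_listener(conn: socket.socket) -> socket.socket:
    """New server: receive one fd via SCM_RIGHTS and wrap it as a socket."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = conn.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer, as in the Python docs example.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    # Family/type are auto-detected from the fd (Python 3.7+).
    return socket.socket(fileno=fds[0])


def demo() -> bool:
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "handover.sock")  # hypothetical well-known path
    listener = socket.create_server(("127.0.0.1", 0))
    address = listener.getsockname()

    # Old server: wait for a successor on the Unix-domain socket.
    handover = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    handover.bind(path)
    handover.listen(1)

    def old_server() -> None:
        conn, _ = handover.accept()
        with conn:
            send_listener(conn, listener)
        # Stop accepting; existing client connections would stay in this process.
        listener.close()

    t = threading.Thread(target=old_server)
    t.start()

    # New server: connect to the well-known path and receive the listener.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as c:
        c.connect(path)
        new_listener = recv_listener(c)
    t.join()
    handover.close()
    os.unlink(path)
    os.rmdir(workdir)

    # Same listening socket: same address, and it still accepts connections.
    with new_listener, socket.create_connection(address):
        conn, _ = new_listener.accept()
        conn.close()
        return new_listener.getsockname() == address


if __name__ == "__main__":
    print(demo())
```

Python 3.9 and later also provide `socket.send_fds()` and `socket.recv_fds()`, which wrap the same `sendmsg()`/`recvmsg()` calls.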

mark4o
The `SCM_RIGHTS` trick sounds neat. I'll give that a go tomorrow.
Shtééf
For reference, Ruby exposes this as `UNIXSocket#send_io`. It sends a single file descriptor along with a null byte. There doesn't appear to be another way, and getting a UNIXSocket object out of EventMachine isn't possible either. (Perhaps I will hack that.) So the way to do it in EventMachine is with manual Socket magic and `EventMachine::watch`. Thanks for the pointer.
Shtééf
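As a Python analogue of that approach (obtain the fd by hand, then give it to the event loop, the role `EventMachine::watch` plays above), here is a sketch using asyncio, whose `start_server()` accepts an already-listening socket through its `sock=` argument. `adopt_and_serve` and `handle` are hypothetical names, and `detach()` merely simulates an fd that would in practice arrive via `argv` or SCM_RIGHTS.

```python
import asyncio
import socket


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Hypothetical per-connection handler for the new server."""
    writer.write(b"hello from adopted listener\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def adopt_and_serve(fd: int) -> bytes:
    """Wrap an inherited/received listening fd and serve it with asyncio."""
    listener = socket.socket(fileno=fd)  # family/type auto-detected
    listener.setblocking(False)
    host, port = listener.getsockname()[:2]
    # Hand the existing socket to the event loop instead of binding a new one.
    server = await asyncio.start_server(handle, sock=listener)

    # Act as a client once to show the adopted socket is being serviced.
    reader, writer = await asyncio.open_connection(host, port)
    data = await reader.readline()
    writer.close()
    await writer.wait_closed()

    server.close()
    await server.wait_closed()
    return data


def demo() -> bytes:
    original = socket.create_server(("127.0.0.1", 0))
    fd = original.detach()  # keep the fd open, release the Python object
    return asyncio.run(adopt_and_serve(fd))


if __name__ == "__main__":
    print(demo())
```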
A: 

Jean-Paul Calderone wrote a detailed presentation in 2004 on a comprehensive solution to this problem using Twisted, covering socket migration among other issues.

Glyph