For fun, I've been toying around with writing a load balancer in Python and have been trying to figure out the best (correct?) way to test whether a port is available and the remote host is still there.

I'm finding that, once connected, it becomes difficult to tell when the remote host goes down. I've turned keepalive on, but can't get it to recognize a downed connection sooner than a minute (I realize polling more often than once a minute might be overkill, but let's say I wanted to), even after setting the various TCP_KEEPALIVE options to their lowest.
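For reference, a minimal sketch of the kind of keepalive tuning described above. The function name `enable_keepalive` and its default values are made up for illustration, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` constants are platform-specific (they exist on Linux), so the sketch skips any that Python's `socket` module does not expose.

```python
import socket

def enable_keepalive(sock, idle=1, interval=1, count=3):
    """Turn on TCP keepalive and tighten the timers where the platform allows it.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Assumption: these option names are the Linux ones; skip any that are missing.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With these settings, a silently dead peer should be noticed after roughly `idle + interval * count` seconds, and only once the next socket call reports the error.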

When I use nonblocking sockets, I've noticed that recv() will return an error ("resource temporarily unavailable") when it reads from a live socket with nothing pending, but returns "" when reading from a dead one (a send and recv of 0 bytes, which might be the cause?). That seems like an odd way to test whether it is connected, though, and makes it impossible to tell if the connection died after sending some data.
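The recv() behaviour described above can be sketched as a small check that peeks at the socket instead of consuming data. The name `peer_state` and its three return values are invented for this example. It only notices an orderly close or a reset; a host that vanishes silently still looks "open".

```python
import socket

def peer_state(sock):
    """Classify a connected socket without consuming data.

    Returns "open" (nothing to read, no close seen), "data" (bytes are
    waiting) or "closed" (peer closed or reset the connection).
    """
    previous_timeout = sock.gettimeout()
    sock.setblocking(False)
    try:
        # MSG_PEEK leaves any pending bytes in the receive buffer.
        chunk = sock.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        return "open"  # "resource temporarily unavailable"
    except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError):
        return "closed"
    finally:
        sock.settimeout(previous_timeout)
    # An empty read means the peer has closed its side.
    return "data" if chunk else "closed"
```

If data is still queued when the peer closes, this reports "data" until the buffer has been drained, so the close is only visible after reading everything that was sent.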

Aside from connecting/disconnecting for every check, is there something I can do? Can I manually send a TCP keepalive, or can I establish a lower-level connection that will let me test the connectivity without sending real data the remote server would potentially process?

A: 

Ping was invented for that purpose.

Also, you might be able to send malformed TCP packets to your destination. For example, the TCP header has a flag that signals end of transmission: the FIN flag. If you send a packet with ACK and FIN set, the remote host should complain with a return packet, and you'll be able to measure the round-trip time.

Eric
ping is useless here: it may be filtered out (it often is) and it does not test that the TCP server runs properly (it may be frozen or lost in an endless loop).
bortzmeyer
bortzmeyer is right - ping is useless for this. And remote firewalls will quite likely block any malformed TCP packets you might send.
Alnitak
+2  A: 

I'd recommend not leaving your (single) test socket connected - make a new connection each time you need to poll. Every load balancer / server availability system I've ever seen uses this method instead of a persistent connection.

If the remote server hasn't responded within a reasonable amount of time (e.g. 10s) mark it as "down". Use timers and signals rather than function response codes to handle that timeout.
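A minimal sketch of that connect-per-poll check. Note one substitution: instead of timers and signals it uses the socket-level timeout of `socket.create_connection`, which is simpler to show in a few lines. The function name `is_port_open` is made up, and the 10-second default only mirrors the example above.

```python
import socket

def is_port_open(host, port, timeout=10.0):
    """Return True if a fresh TCP connection to (host, port) succeeds in time."""
    try:
        # Open a new connection for every poll and close it right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

This only proves that something accepts connections on the port. To catch a frozen server, the check would also need to exchange a request and response that the service understands.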

Alnitak
A: 

It is theoretically possible to spam keepalive packets. But to send them at very short intervals, you may need to dig into raw sockets. Also, your host may ignore them if they're coming in too fast.

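The best way to check if a host is alive on a TCP connection is to send data and wait for the ACK packet. Note that a non-zero return from send() only means the data was accepted into the local send buffer; a peer that never ACKs shows up as an error or timeout on a later send() or recv().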

Unknown
+1  A: 

"it becomes difficult to tell when the remote host goes down"

Correct. This is a feature of TCP. The whole point of TCP is to have an enduring connection between ports. Theoretically an application can drop and reconnect to the port through TCP (the socket libraries don't provide a lot of support for this, but it's part of the TCP protocol).

S.Lott