I recently ran into an issue where an intermediate link between a TCP server and client went down. The client is required to connect to a secondary server if the primary server is down. When the primary server is brought down (e.g. by pressing ^C in the terminal), the TCP shutdown sequence goes through, and the client successfully detects the broken connection and tries the secondary. However, if an intermediate link goes down, neither the client nor the server is aware of it. The only way the client can detect it is when its TCP buffers fill up with data from 'send' operations that never gets delivered.

As a solution to this, the 'TCP Keepalive' mechanism has been used. This works satisfactorily.

My question: is 'TCP Keepalive' the only solution?

-Prabhu

A: 

You could invent and implement your own keep-alive using TCP's Out-Of-Band feature, but I wouldn't even consider that unless you have some significant issue with the one that's already built for you.

Brad Wilson
+1  A: 

Keepalive was designed to deal with so-called half-open connections, where one side (typically the server, which receives the requests) is unaware that the connection was broken. The client usually knows about it, because its attempt to send a request to the server returns an error.

Another option is to keep the listener running: when the client detects a communication problem, it simply tries to connect to the server again. The server gets the incoming connection, checks whether it comes from the same IP address, and if so, closes the old connection and uses the new one.
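A minimal sketch of the server-side bookkeeping this implies, assuming one connection per client IP address. The `ConnectionRegistry` class and the example address are hypothetical, and `socket.socketpair()` merely stands in for sockets that would normally come from `accept()`:

```python
import socket

class ConnectionRegistry:
    """Keeps at most one live connection per peer IP (illustrative only)."""

    def __init__(self):
        self._by_ip = {}

    def register(self, conn, peer_ip):
        """Store conn for peer_ip, closing any stale connection first.

        Returns True if an older connection from the same IP was replaced.
        """
        old = self._by_ip.pop(peer_ip, None)
        if old is not None:
            old.close()  # the half-open connection the server never noticed
        self._by_ip[peer_ip] = conn
        return old is not None

# Stand-ins for two successive connections accepted from the same client.
first_conn, first_peer = socket.socketpair()
second_conn, second_peer = socket.socketpair()

registry = ConnectionRegistry()
replaced_first = registry.register(first_conn, "192.0.2.10")
replaced_second = registry.register(second_conn, "192.0.2.10")
first_was_closed = first_conn.fileno() == -1
```

Keying on the IP address alone is an assumption taken from the description above; several clients behind one NAT address would need a different key, such as an application-level client ID.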

But if the client is unaware that the connection went down and the server needs to send something, the server has no way to re-establish the connection; TCP keepalive is its only means of detecting the dead connection.

If you don't want to use TCP keepalive, you can use an application-level keepalive, e.g. by sending something like application-specific echo messages.

qrdl
According to RFC 1122 it takes a minimum of 2 hours idle time before TCP starts sending keep alive probes. This could be a problem for some if not most applications and limits the usefulness of TCP Keepalive.
Robert S. Barnes
You can control keepalive behaviour using the TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL socket options (at the SOL_TCP level, not SOL_SOCKET). 2 hours isn't the minimum idle time; it is the default idle time.
qrdl
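As a rough sketch of the options mentioned in the comment above, here is how they might be set from Python. The `enable_keepalive` helper and the timing values are illustrative assumptions, the `TCP_KEEP*` constants are platform-specific (available on Linux, not everywhere), and Python's `socket.IPPROTO_TCP` is used as the option level, which corresponds to `SOL_TCP` on Linux:

```python
import socket

def enable_keepalive(sock, idle=10, interval=5, count=3):
    """Turn on TCP keepalive and, where supported, shorten its timers.

    idle:     seconds of silence before the first probe (TCP_KEEPIDLE)
    interval: seconds between unanswered probes (TCP_KEEPINTVL)
    count:    unanswered probes before the connection is dropped (TCP_KEEPCNT)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)  # missing on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock

sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
```

With these example values, an idle dead connection would be reported after roughly idle + interval * count = 25 seconds instead of the two-hour default.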
+1  A: 

I always handled this at the application level by extending the protocol spoken over TCP between client and server with "Keep Alive" messages. Server and client each send this message, e.g. every second, and if either side has not received a "Keep Alive" message within 2 seconds, the connection is probably dead.

The Keep-Alive mechanism of TCP is fine, but difficult to use, especially when working across different platforms.
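The timing logic behind such application-level keep-alive messages could look roughly like the sketch below. The `HeartbeatMonitor` class is hypothetical, the 1 second and 2 second values are the ones from the description above, and the injectable clock exists only so the example can run without waiting:

```python
import time

class HeartbeatMonitor:
    """Decides when to send a keep-alive and when to give up on the peer."""

    def __init__(self, send_every=1.0, dead_after=2.0, clock=time.monotonic):
        self.send_every = send_every
        self.dead_after = dead_after
        self._clock = clock
        now = clock()
        self._last_sent = now
        self._last_heard = now

    def should_send(self):
        """True once send_every seconds have passed since our last keep-alive."""
        return self._clock() - self._last_sent >= self.send_every

    def mark_sent(self):
        self._last_sent = self._clock()

    def mark_heard(self):
        """Call whenever a keep-alive (or any data) arrives from the peer."""
        self._last_heard = self._clock()

    def peer_is_dead(self):
        """True once nothing has been heard for more than dead_after seconds."""
        return self._clock() - self._last_heard > self.dead_after

# Simulated timeline using a fake clock instead of real sleeps.
fake_now = [0.0]
monitor = HeartbeatMonitor(clock=lambda: fake_now[0])

fake_now[0] = 1.0
due_at_1s = monitor.should_send()       # one second elapsed: time to send
dead_at_1s = monitor.peer_is_dead()     # still within the 2 s window
monitor.mark_sent()
monitor.mark_heard()                    # peer's keep-alive arrives at t = 1.0

fake_now[0] = 3.5
dead_at_3_5s = monitor.peer_is_dead()   # 2.5 s of silence exceeds 2 s
```

In a real client, the socket loop would call `mark_heard()` on every received message and switch to the secondary server once `peer_is_dead()` returns True.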

Seika
+1  A: 

Even without SO_KEEPALIVE set, if you try to send data along a dead TCP connection, it typically gets reset or eventually times out. Either way, the application eventually gets an error.

SO_KEEPALIVE means that this may be detected sooner on an otherwise idle connection. That's all.

MarkR
Yes, Mark. My requirement was that the client should try the secondary server as soon as possible. Without the keepalive mechanism, the failover to the secondary was very delayed.
Prabhu. S
A: 

Another solution is to use a heartbeat on a separate socket. That way you know almost immediately if the connection is down. This is useful when your primary connection is sending streaming data with no message boundaries.

Robert S. Barnes
That only tells you that the secondary socket is down, it doesn't tell you that the primary connection is down. Given all the things that can happen to a packet on the internet, it's not impossible that one could survive while the other dies. It's EXTREMELY unlikely, but it's not impossible. The right way to do a heartbeat is to do it on the main socket.
Michael Kohne
@Michael Kohne I was thinking of a situation in which you're sending streaming data with no message boundaries - in that situation it's really not so practical to embed keep alive. You really need to use a separate connection in that case. However, if you're sending packetized data then yes, you can just add a packet type for heartbeat and send over the same connection.
Robert S. Barnes
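For the packetized case discussed in the last comment, a heartbeat can simply be one more message type on the main connection. The sketch below assumes a made-up wire format (a 1-byte type followed by a 4-byte big-endian length); the constants and function names are illustrative, not taken from any existing protocol:

```python
import struct

HEARTBEAT = 0
DATA = 1
_HEADER = struct.Struct("!BI")  # 1-byte message type, 4-byte payload length

def encode(msg_type, payload=b""):
    """Frame one message: header followed by the payload bytes."""
    return _HEADER.pack(msg_type, len(payload)) + payload

def decode_all(buffer):
    """Split a receive buffer into complete (type, payload) messages.

    Returns (messages, leftover), where leftover is the incomplete tail
    to keep until more bytes arrive.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= _HEADER.size:
        msg_type, length = _HEADER.unpack_from(buffer, offset)
        end = offset + _HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        messages.append((msg_type, buffer[offset + _HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]

# A data message, a heartbeat, and a data message cut off one byte short.
stream = encode(DATA, b"hello") + encode(HEARTBEAT) + encode(DATA, b"wor")[:-1]
messages, leftover = decode_all(stream)
```

The receiver would treat any `HEARTBEAT` message as proof of life and otherwise ignore it, so the heartbeat travels over exactly the connection whose health is being checked.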