TCP/IP KeepAlives are specified to be sent at least once every two hours: http://tools.ietf.org/html/rfc1122#page-101. The problem is this was written in 1989 and is concerned about the cost of sending the extra KeepAlive packet! Yet that is still the default interval at which most OSes, in accordance with the spec, send KeepAlives down a connected socket after a period of inactivity. Of course, nowadays most connections time out way before that if inactive, and if connected to a peer over the Internet the connection dies without your knowledge long before that (despite setting one's timeout higher than that). I suspected this was because the routing tables in between don't bother keeping the session alive - I always wondered where the leaked last message goes... UPDATE: the reason is that 'routers' at your end or the remote host's end may be 'stateful' and connection-aware, and drop the connection after some period of inactivity; the routers you pass through over the Internet cannot drop your connection - they don't care, the packet is just sent where it has to go. So I have seen two common solutions to keeping one's connection alive over the Internet:

1) Disregard the spec (EDIT: as has been pointed out to me, this is not disregarding the spec, it is just changing the default) and change your system-wide KeepAlive interval to lower than 2 hours, or 2) implement your own 'KeepAlive' system, polling the peer periodically.
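For option 1 on Windows, the interval can also be changed per socket from C# instead of system-wide, so the registry default stays untouched. A minimal sketch (the helper name is mine; SIO_KEEPALIVE_VALS takes milliseconds):

    using System;
    using System.Net.Sockets;

    static class TcpKeepAlive
    {
        // Enables TCP KeepAlive on one socket and sets the idle time before
        // the first probe plus the retry interval, both in milliseconds.
        // IOControlCode.KeepAliveValues wraps the Windows-only
        // SIO_KEEPALIVE_VALS, whose input is the 12-byte struct
        // tcp_keepalive { onoff; keepalivetime; keepaliveinterval; }.
        public static void Enable(Socket socket, uint idleMs, uint intervalMs)
        {
            byte[] inValues = new byte[12];
            BitConverter.GetBytes(1u).CopyTo(inValues, 0);          // onoff = 1
            BitConverter.GetBytes(idleMs).CopyTo(inValues, 4);      // idle before first probe
            BitConverter.GetBytes(intervalMs).CopyTo(inValues, 8);  // gap between retries
            socket.IOControl(IOControlCode.KeepAliveValues, inValues, null);
        }
    }

    // e.g. probe after 5 minutes idle, retrying every second:
    // TcpKeepAlive.Enable(client.Client, 300000, 1000);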

Either way, what is a suitable period of inactivity at which to send your KeepAlive? I have seen everything from 1 second to the default 2 hours; it seems the number is simply plucked out of thin air... If I have a client application connecting from potentially anywhere in the world, what is a safe and reasonable period (I want a single persistent connection)? Connecting to a peer many hops away on the other side of the world over the Internet, the connection dies at 301 seconds of inactivity (though you only find out when you try to send something), so setting the period to 300 seconds seems to be the magic number - I get the KeepAlive in 1 second before death. This interval has never failed me... but is it safe?

EDIT: I am implementing this particular connection in C# 3.0, so code in that is welcome.
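For instance, here is roughly how one can measure the drop time for a given path (the host and port are placeholders, and note the failure may only surface on a later send or a read if nothing on the path answers with an RST):

    using System;
    using System.Net.Sockets;
    using System.Threading;

    class IdleProbe
    {
        // Connect, idle for the given time, then attempt a write: a session
        // dropped by a stateful middlebox shows up as an exception - though
        // possibly only on the *next* write or a read, if no RST comes back.
        static void Main()
        {
            string host = "example.org"; // placeholder peer
            int port = 7;                // placeholder port
            int idleSeconds = 301;

            using (TcpClient client = new TcpClient(host, port))
            {
                Thread.Sleep(TimeSpan.FromSeconds(idleSeconds));
                try
                {
                    client.GetStream().Write(new byte[] { 0x00 }, 0, 1);
                    Console.WriteLine("Still alive after {0}s idle", idleSeconds);
                }
                catch (Exception e) // SocketException or IOException
                {
                    Console.WriteLine("Dropped within {0}s: {1}", idleSeconds, e.Message);
                }
            }
        }
    }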

+4  A: 

TCP/IP KeepAlives are specified to be sent at least once every two hours.

That's not what it says. It says 'This interval MUST be configurable and MUST default to no less than two hours.'

The problem is this was written in 1989 and is concerned about the cost of sending the extra KeepAlive packet!

The real reason for making keep-alive optional, with a default of no less than two hours if provided, is reason (2) in the RFC's list: TCP/IP is supposed to survive temporary outages of intermediate equipment, re-routing, etc. It is a useful mechanism for e.g. Telnet servers to detect lost clients.

Disregard the spec and change your system-wide KeepAlive interval to lower than 2 hours.

That's not 'disregarding the spec.' That's just changing the default.

Most applications that want long-term connections either provide a ping in their own application protocol or use connection pools that survive connection failures.
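A sketch of such an application-protocol ping, in the C# the question mentions (the class name and the reserved ping byte are illustrative; your protocol has to define its own):

    using System;
    using System.IO;
    using System.Net.Sockets;
    using System.Threading;

    // Minimal application-level keep-alive: if nothing has been written for
    // one idle period, write a single ping byte the peer agrees to ignore.
    class AppKeepAlive : IDisposable
    {
        readonly NetworkStream stream;
        readonly TimeSpan idle;
        readonly Timer timer;
        long lastSendTicks;

        public AppKeepAlive(NetworkStream stream, TimeSpan idle)
        {
            this.stream = stream;
            this.idle = idle;
            lastSendTicks = DateTime.UtcNow.Ticks;
            timer = new Timer(Ping, null, idle, idle);
        }

        // Call this after every real write so pings go out only when idle.
        public void NoteSend()
        {
            Interlocked.Exchange(ref lastSendTicks, DateTime.UtcNow.Ticks);
        }

        void Ping(object state)
        {
            if (DateTime.UtcNow.Ticks - Interlocked.Read(ref lastSendTicks) < idle.Ticks)
                return; // recently active, no ping needed
            try
            {
                stream.WriteByte(0x00); // byte value the protocol reserves for pings
                NoteSend();
            }
            catch (IOException)
            {
                timer.Dispose(); // connection is dead; hand off to reconnect logic
            }
        }

        public void Dispose() { timer.Dispose(); }
    }

With drops observed at around 300 seconds, a period of half that - new AppKeepAlive(stream, TimeSpan.FromSeconds(150)) - matches the ping-every-N/2 rule of thumb that comes up in the comments below.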

EJP
Thanks for the clarification - but you still haven't answered my question: what is a suitable period of inactivity at which to send out KeepAlives? For the purpose of 'TCP/IP is supposed to survive temporary outages of intermediate equipment, re-routing, etc. It is a useful mechanism for e.g. Telnet servers to detect lost clients' when connected over the Internet...
Mrk Mnl
I was half joking when I said the reason was the cost of sending an extra packet (it is one of the reasons) - I know the main reason; that's exactly what I want to use it for!
Mrk Mnl
It depends *entirely* on your application. Clearly the designers of TCP/IP felt that two hours was none too short. But if you have an application which needs the peer to be there every 10 seconds, maybe 5 seconds is a good time. Or 20. If your period is in minutes I would use a read timeout and a connection pool rather than a ping.
EJP
I have described my application: a client wanting to maintain a single persistent connection (indefinitely) over the Internet (such as telnet does), no doubt a common scenario. Again: what is the best period of inactivity to use? If you don't have an answer, please stop lecturing me on what is supposed to be done without telling me when, how and how much...
Mrk Mnl
A meaningless rant. When, how, and how much *what?* The answers depend on the application. You've described your application qualitatively, but not quantitatively, and you are asking quantitative questions. Nobody but you can answer them. What are its response time requirements? How often does it use the connection? How much downtime can it tolerate? Maybe the Telnet default is adequate for your application. Maybe it isn't. I don't know. I'm starting not to care much, either.
EJP
How often to send a KeepAlive to keep a connection (over the Internet) alive? Otherwise the connection dies. That is the question.
Mrk Mnl
That depends on the application used! I would assume that the connection is closed not because the underlying socket is being shut down but because the server application decides that it is time to close the connection. I would not change the default keep-alive but, as already pointed out, implement a keep-alive within the application protocol - that is the way to go!
Mario The Spoon
Connections over the Internet close because they are idle for some amount of time - depending on what equipment sits between you and the peer and how long it holds your TCP session for - otherwise you get a 'remote host has closed the connection' error even when it hasn't; neither the server nor the client closes the connection. Since the server only uses TCP for this one application I may as well change the default instead of re-inventing the wheel - it works, I have tried it and monitored the packets in and out - I just want to get the right period so any client anywhere will not have a problem.
Mrk Mnl
While not sending too many, either.
Mrk Mnl
I can assure you that neither the server nor the firewall nor the OS has closed the session - it is my server and I have written it!
Mrk Mnl
SO_KEEPALIVE is a setting maintained by the OS, so it is the OS that actually closes the connection (as far as I can remember my Stevens). I do not have my Stevens with me; I'll check it for details later on. I can't see whether this has already been pointed out: the keep-alive interval itself is a system-wide parameter, so it applies to all socket connections - local ones too. Though I agree with you that 1989 is way back, I am not so sure that changing this global setting will not have a performance impact overall.
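On Windows that global value lives in the registry; a quick way to check what a machine is actually running with (milliseconds; when the value is absent the 2-hour default applies):

    using System;
    using Microsoft.Win32;

    class ShowKeepAliveTime
    {
        static void Main()
        {
            // System-wide idle time before Windows sends the first KeepAlive.
            const string key =
                @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters";
            object value = Registry.GetValue(key, "KeepAliveTime", null);
            Console.WriteLine("KeepAliveTime = {0} ms", value ?? (object)7200000);
        }
    }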
Mario The Spoon
I know - but the connection is not being closed at all. The server sits there waiting but sees nothing from the client after the client has been inactive for a period of time (typically 301 seconds in my experience), even if the client tries to send something after that (the client gets the 'remote host has forcibly closed...' error, but it hasn't). The reason for this is that the TCP session has been dropped by routers, switches, whatever is in between, as they have their own idle TCP session timeouts...
Mrk Mnl
I'm not so worried about the how, just the period of inactivity at which to send KeepAlives (or pings, whatever) to keep the connection alive...
Mrk Mnl
Then I would guess that a NAT mapping is being dropped somewhere, or that a router on the client side dropped the connection. If the client sits behind SOHO WLAN equipment, that seems to ship with an idle timeout of 5 minutes (I checked Netgear's documentation) - so your 300 secs look good. I could not find any info about NAT mapping retention. Be aware though that, depending on your server application and number of clients, you may start to hog sockets which are unused. This is something that may not be looked upon kindly by some sysadmins. hth
Mario The Spoon
So if it happens after 5 minutes I would send application pings every 2.5 minutes. But I would also investigate *why.* Sniff the network and find out where the RST is originating. 5 minutes is far too short for any standard network element like a router or firewall to drop the connection. I suspect it is your code frankly, *especially* as you wrote both ends.
EJP
Which is why I specified 'sockets connected over the Internet': locally, or on my LAN, the connection is not dropped, but when connected over the Internet I have no control over what equipment sits between the peers. It is actually considered good practice for such equipment to drop inactive TCP sessions - otherwise imagine how many inactive sessions, where a peer has died and which will never be used again, would have to be held, consuming resources. I know Apache HTTP servers drop TCP sessions after 5 seconds of inactivity by default.
Mrk Mnl
The relevance of this escapes me. The Apache server isn't 'between the peers'. It *is* a peer: and if it wants to drop the connection after 5 seconds you don't really have any business stopping it. And anyway you said you wrote both ends. I agree that routers drop connections, but not usually after 5 minutes - commercially it is more like an hour, apart from home products like the one cited above. But as I said, if you are getting drops every N minutes, ping every N/2 minutes and you are done.
EJP
I have written the server - I was using Apache as an example of why TCP connections get dropped: to free up resources. Yeah, it turns out it is home/commercial 'routers', which are 'stateful' and connection-aware, that tend to drop external TCP connections after some period of inactivity (see Kevin Nisbet's comment). The problem is the client could potentially be any user on the Internet, which brings me back to where I started: I'm going to have to do some research into typical periods of inactivity and worst-case scenarios, and allow a client to set their own period of inactivity...
Mrk Mnl