tags:

views:

165

answers:

6

I am using Java (although I think the socket options is implement in most languages) to implement a client and server. The server sends data to the client for processing which the client acknowledges. On another port the client then sends the results of the processing back to the server. When it comes to options such as

  • SO_LINGER
  • SO_KEEPALIVE
  • SO_NODELAY
  • SO_REUSEADDRESS
  • SO_SENDBUFFER
  • SO_RECBUFFER
  • TCP_NODELAY

We have noticed that the connection between the client and server occasionally breaks. There will be a timeout on the send or the receive. When this happens will kill the socket and open a new one to continue.

What would be the best options to set in terms of the above scenario and is there anything that we could do from our side (programmatically or options-wise) to try minimize the amount of times the connection is dropped. We are using normal TCP/IP.

UPDATE: The bounty on this ends soon. I haven't had a satisfactory answer yet so it is still open. I think everyone is missing the point of the quest. What is the best practice with regards to the options above for sockets that continuously chat. I have already got a ping packet in that if there is no work to be done (hardly ever the scenario) the normal message is sent with no inner elements so there is always processing.

A: 

try these options SO_LINGER - for specyfying when the Socket close s called while some unsent data in the queue TCP_NODELAY - For non blocking datat transfer

Dharma
`TCP_NODELAY` turns off Nagle's algorithm. That's useful at the start of a session, but once the session is running, it'll have no effect.
sarnold
+2  A: 

First I think that this page is relevant, regarding half-open connections. http://nitoprograms.blogspot.com/2009/05/detection-of-half-open-dropped.html

That being said, TCP is designed to hide connection problems, so you may often find yourself in cases where the connection is broken, but neither side thinks it is. You have addressed this partially by using timeouts and taking that as a sign the connection is broken.

Since you are writing the client and server, I would avoid relying on TCP to tell you when the connection is broken altogether. I would just have the server also acknowledge the receipt of the result from the client. Then both sides will expect immediate responses to their messages, and you can track which messages have been ack'd and set an appropriately small timeout for receiving the ack. This is not a timeout on the send or receive, but a timeout on the time between sending a message and receiving the ack for that message. Then you can set the timeout appropriately depending on the quality of your connection (e.g. very small if you are running on loopback, but large if running over wireless with a weak signal).

Regarding the options you list, you will want to use SO_REUSEADDRESS so that you won't be prevented from reopening the socket, for example if it hasn't finished closing from a previously killed process.

cape1232
+1  A: 

You probably have, but it is best to check the obvious....

Have you verified that it IS the socket that is timing out, and not your code? Sockets are fairly stable, and while there might be an issue somewhere, it seems more likely that it is in your code. I would use logs, timestamps, and synchronised clocks to be sure.

There may be an issue that you genuinely DO take a long time to do the calculation, so maybe adding a 'I'm still thinking about it' message to your protocol that gets sent regularly, to keep the connection alive?

Of course networks will drop out from time to time regardless of what you do, and it sounds like you are already handling that case nicely.

Jon
A: 

I would strongly encourage you to use a ping/echo model between client and server, so that if no data is sent for x seconds a ping message needs to be send. A typical reason for a break might be a firewall, which shuts down socketss because of inactivity.
The typical issue where the TCP model fails are physical problems e.g. a pulled/broken cable and hangs on one side, where technically someone is listening until a queue overrun kicks in (which might never happen given your amount of data).

weismat
A: 

What are the chances the connection is going through a NAT firewall somewhere along the way? Stateful firewalls maintain a table of open connections so that packets belonging to an allowed connection can quickly pass through the system, without forcing firewall admins to write overly-complex rule sets.

The downside is that this table can grow immensely large, so it must be pruned as connections are closed or as they appear to have simply grown stale and died quietly. A connection that has gone silent for 20 minutes is usually quiet enough to reaped. (Which is really very quick, as the TCP KEEPALIVE is typically two hours, making it nearly useless in the face of NAT firewalls.)

So: is this going through a NAT firewall? Is the connection quiet for long stretches? If so, add a ping/pong to your protocol, and fire it every few minutes.

sarnold
The connections are never quiet. They are constantly transferring data. We have already implemented a pinging packet but the connection is still lost. Maybe once every day or so.
uriDium
+1  A: 

Strictly speaking, you don't need any of these socket options:

* SO_LINGER

You need to set SO_LINGER only if your application still has outstanding packets to send when close(2) or shutdown(2) has been called. Not really applicable for your application.

* SO_KEEPALIVE

Sending keepalive-pings every two hours would really only help very long-lived but -very- quiet connections going through stateful firewalls with very long session timeouts. (Two hours between pings is entirely too long to be practical in today's Internet.)

* SO_NODELAY

This (presumably an alias for TCP_NODELAY) disables Nagle's algorithm, which is just a small-packet-avoidance problem. Perhaps Nagle is getting in the way in your application, but it takes special sequences of packets to introduce 500ms delays into processing; it never just hangs connections.

* SO_REUSEADDRESS

Useful for all 'servers' that listen on well-known port numbers; use on 'clients' is almost always covering up some bug or other, but it is sometimes necessary if requests must come from a well-known port number.

* SO_SENDBUFFER
* SO_RECBUFFER

These buffer sizes influence the kernel-side buffer sizes maintained for receiving or sending data while your program (receive buffer) or the socket (send buffer) isn't yet ready to accept more data. If these are set too small, your application might not transfer data as smoothly as possible, reducing throughput, but it should not lead to any stalls if these are set smaller than optimal. Of course, too large may put unreasonable demands on kernel memory, but there should be a reasonable system-wide maximum allowed size.

* TCP_NODELAY

Disables Nagle. Not likely to do more than introduce 500ms delays if your application sends multiple small packets before attempting a blocking read.

Really, you shouldn't need to set any socket options.

Can you distill your code into something that could be pasted here and tested or inspected? I'm used to TCP sessions surviving for days or weeks without trouble, so this is pretty surprising.

sarnold
Thanks, this is the best answer. I think that the bounty may have expired. Sorry about that. :(
uriDium
@uridium, no worries, glad it helped. :)
sarnold