I am experimenting with an unreliable, home-brewed radio network, using very rudimentary Java socket programming to transfer messages back and forth between the end nodes.

The setup is as follows:

Node A --- Relay Node --- Node B

One problem I constantly run into is that the connection somehow drops out, and neither Node A nor Node B knows that the link is dead, so both continue to transmit data. The TCP connection does not time out either. I have added a heartbeat message that causes a timeout after a while, but I would still like to know the underlying reason why TCP does not time out.

Here are the options I am enabling when setting up a socket:

channel.socket().setKeepAlive(false);
channel.socket().setTrafficClass(0x08); // for max throughput

This behavior is strange because it is completely different from what I see on a wired network. On a wired network, I can simulate a dropped connection by pulling out the Ethernet cord; once I plug the cord back in, the connection is re-established and messages start flowing again.

On the radio network, the connection is never reestablished and once it silently dies, the messages never resume.

Is there some other Java implementation or socket setting that I can use? Also, why am I seeing this behavior in the first place?

And yes, before anyone says anything, I know TCP is not the preferred choice over an unreliable network, but in this case I wanted to ensure no packet loss.

+2  A: 

TCP was designed to be quiet. The RFC (RFC 1122) requires the default keepalive interval to be no shorter than 2 hours. Unless you control the systems on both ends and can change that 2-hour default (which sometimes requires a kernel rebuild), you have to add a heartbeat in your own app.

Even if you send a heartbeat, the stack still has to wait for the retransmission timeout, which varies depending on the RTT. On a high-latency network the timeout can be very long, but it should be within minutes.

You get a notification on a local network because the system can detect the link-down status and drop all connections on that network.

BTW, you want to set Keepalive to TRUE instead of false. With Keepalive, you at least get the slow heartbeat.
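
To make the two suggestions above concrete, here is a minimal sketch of an application-level liveness check next to the keepalive setting. The class name `LivenessTracker`, its methods, and the timeout value are all made up for illustration; only `channel.socket().setKeepAlive(true)` comes from the standard Java API. It assumes your code calls `recordActivity` whenever anything (data or a heartbeat reply) arrives from the peer.

```java
import java.net.SocketException;
import java.nio.channels.SocketChannel;

/** Hypothetical helper: remembers when the peer was last heard from. */
class LivenessTracker {
    private final long timeoutMillis;
    private volatile long lastHeardMillis;

    LivenessTracker(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeardMillis = nowMillis;
    }

    /** Call whenever data or a heartbeat reply arrives from the peer. */
    void recordActivity(long nowMillis) {
        lastHeardMillis = nowMillis;
    }

    /** True once the peer has been silent for longer than the timeout. */
    boolean isStale(long nowMillis) {
        return nowMillis - lastHeardMillis > timeoutMillis;
    }

    /** Turns on the OS-level keepalive probe as a slow backstop. */
    static void enableKeepAlive(SocketChannel channel) throws SocketException {
        channel.socket().setKeepAlive(true);
    }
}
```

A sender thread could check `isStale(System.currentTimeMillis())` before each write and close and reopen the channel when it returns true, rather than relying on the connection state reported by the channel.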

ZZ Coder
That is the thing: end node A is still trying to send to end node B the entire time, but my channel.isConnected() never goes to false (I test for that in a separate thread before attempting to send a message). This is over a period of 20+ minutes; shouldn't the messages have timed out before then? Also, I never receive any exceptions when attempting to send the message.
yx
Do you know what your RTT (round-trip time) is? Calculating the retransmission timeout is serious business; see http://www.ietf.org/rfc/rfc2988.txt. It could be over 20 minutes if the RTT is really large.
ZZ Coder
+2  A: 

In the OSI 7-layer model, the first two layers are physical and data link. Your physical hardware running the data link protocol on wired Ethernet can detect when the cable is pulled. Your wireless hardware, and its corresponding protocol, probably not so much. The TCP stack can't do anything to time out if the layer 1/2 stuff isn't signaling that it is disconnected.

Zak
Interesting, I will look into possibly implementing the signaling aspect.
yx
+1  A: 

Define 'never'?

I expect you will eventually be notified by a send failing. You're probably just expecting to be notified sooner than you will be. The TCP stack retransmits segments that it doesn't get ACKs for, and the timeout before each retransmission doubles with every attempt. Depending on how the stack works out when to retransmit, it will probably take longer than you expect before the stack decides that the connection is broken, and only then will it let you know.
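
To get a feel for how the doubling adds up, here is a rough sketch that sums the waits. The class `RtoBackoff`, the method name, and every parameter value are assumptions for illustration only; real stacks pick their own initial timeout, cap, and retry count, so treat the result as an order-of-magnitude estimate, not what your OS will actually do.

```java
/** Hypothetical helper: estimates total wait under exponential backoff. */
class RtoBackoff {
    /**
     * Sums the waits for the given number of timeouts, starting at
     * initialRtoMillis, doubling each time, never exceeding maxRtoMillis.
     */
    static long totalWaitMillis(long initialRtoMillis, long maxRtoMillis, int timeouts) {
        long total = 0;
        long rto = initialRtoMillis;
        for (int i = 0; i < timeouts; i++) {
            total += Math.min(rto, maxRtoMillis);
            // Stop doubling once the cap is reached (also avoids overflow).
            if (rto < maxRtoMillis) {
                rto *= 2;
            }
        }
        return total;
    }
}
```

For example, with a made-up 3 second first timeout, a 64 second cap, and 10 timeouts, `RtoBackoff.totalWaitMillis(3000, 64000, 10)` gives 413000 ms, or nearly 7 minutes, before the stack would give up.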

See here: http://www.ietf.org/rfc/rfc2988.txt, here: http://msdn.microsoft.com/en-us/library/ms819737.aspx, etc.

You're used to having a wired network where the drivers can notify higher level layers that the connection has been physically broken. If you were to configure a wired network to route via a router which you then deliberately set up to not route correctly then you'd probably see similar behaviour....

Len Holgate