I have an application handling several Java socket connections to different kinds of remote machines (some are PCs, others are embedded devices). These sockets and streams should stay open indefinitely and close only for a very good reason (e.g. a crash of the remote system).

I frequently encounter an issue where the input stream appears to end unexpectedly and for no apparent reason (read() returns -1), i.e. the remote machine does not signal a connection abort. But when I discard these -1 reads and continue reading from the stream, the remote machine actually sends new data later. This can go on for a very long time. I can also still write to the output stream.

In the current situation, I have the choice between treating -1 as end of stream and closing the socket (with false positives), or ignoring the -1 and risking not being notified of real disconnects.

I haven't been able to create a working example of this issue and the problems appear randomly.

Any ideas what's wrong?

Edited to add: The Java endpoint is a rewrite of an existing VB application that did not have these problems (at least to my knowledge).

A: 

Check the routers in between. It is common for cheap routers, especially those doing NAT, to clean up their connection tables once in a while, causing your connections to go stale.

In any case, your application should be robust against these things (they will happen again), and you can help it by regularly sending packets without business value across the wire.

Thorbjørn Ravn Andersen
I do. "Keep alive" messages are defined and used, but they don't help. The problem also appeared once when I ran a server and a client on the same machine.
Daniel Beck
Also, as I said, if I ignore the -1, it works like a charm at least for several hours.
Daniel Beck
A: 

There are apparently some network issues in your environment, and you can try to track them down, but for the time being it's safer to close the stream and re-open it. That's what the API assumes.
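A minimal sketch of that close-and-reopen approach, assuming a plain blocking `java.net.Socket`. The class name `ReconnectingReader`, the `Handler` callback, and the timeout and delay values are all made up for illustration; they are not from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ReconnectingReader {
    /** Hypothetical callback that receives each chunk of data. */
    interface Handler {
        void onData(byte[] buf, int len);
    }

    /** Connects, reads until read() returns -1, then closes the socket. Returns the byte count. */
    static long drain(String host, int port, Handler handler) throws IOException {
        long total = 0;
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 5_000); // assumed 5 s connect timeout
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) { // -1 is treated as end of stream, as the API documents
                handler.onData(buf, n);
                total += n;
            }
        } // try-with-resources closes the socket here
        return total;
    }

    /** Reopens the connection whenever it ends or fails, until the thread is interrupted. */
    static void runForever(String host, int port, Handler handler) throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                drain(host, port, handler);
            } catch (IOException e) {
                System.err.println("Connection failed: " + e.getMessage());
            }
            Thread.sleep(1_000); // assumed fixed back-off before reconnecting
        }
    }
}
```

Whether a one-second back-off is acceptable depends on the devices at the other end; on a metered link you would probably want something longer.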

Vladimir Dyuzhev
A: 

Have you ever used Wireshark? It's very easy to set up and might let you know if there's anything unusual going on with the TCP conversation when this happens.

I had something similar to your issue once and I solved it by sending a ping message every minute between server and client. (It later turned out that a firewall issue was occasionally closing half of the connection if no traffic had gone over it for 10 minutes.)

I know that you're doing KeepAlive messages, but it's possible that something along the route is not supporting them. If you send your own ping message with a few bytes, you can be sure. I'd capture the actual packets on both ends with Wireshark in either case to make sure that the KeepAlive messages are really getting all the way to the endpoint.
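As a rough sketch of such an application-level ping, assuming your protocol can tolerate a one-byte message: the `Heartbeat` class, the `PING` value, and the period are hypothetical and would have to match whatever the remote side expects.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Heartbeat {
    /** Hypothetical one-byte ping; the real value depends on your protocol. */
    static final byte PING = 0x00;

    /** Writes a single ping byte and flushes so it actually goes out on the wire. */
    static void sendPing(OutputStream out) throws IOException {
        synchronized (out) { // assumes other writers also lock on the stream
            out.write(PING);
            out.flush();
        }
    }

    /** Sends a ping every periodSeconds until a write fails or the caller shuts it down. */
    static ScheduledExecutorService start(OutputStream out, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sendPing(out);
            } catch (IOException e) {
                // A failed write is a strong hint that the connection is dead.
                System.err.println("Ping failed: " + e.getMessage());
                scheduler.shutdown();
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

You would call something like `Heartbeat.start(socket.getOutputStream(), 60)` after connecting and `shutdownNow()` on the returned scheduler when you close the socket.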

Michael Covelli
I'm not sending TCP keepalive; these are actual "empty" business messages, as Thorbjørn suggested, to ensure that at least one message gets sent every ten minutes. A shorter interval is difficult because the whole thing also needs to work over rather expensive 3G mobile networks, so I also need to keep the byte count low.
Daniel Beck
Ah, if you're sending real 0-byte packets from application layer to application layer, then I guess that's not it. You could try it with a 1-minute frequency just in dev to rule it out, even if that's not a practical solution for prod. But I also missed where you said that it had happened when running on the same machine. I'd try to replicate that first. No need to waste time with Wireshark and the networking if it can happen on the same machine.
Michael Covelli
+1  A: 

If you get -1 meaning the stream is closed then you cannot read beyond this and find more data. Once a stream is closed, it cannot be read again.

It sounds like you are performing a read() and casting the result to a byte. This means you cannot tell the difference between a 255 value (which you can read beyond) and a -1 stream-closed value (which you cannot).
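To illustrate the difference, here is a small self-contained sketch (the class and method names are made up for the example). It reads the same three bytes twice: once keeping the result of read() as an int, and once casting it to byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadEofDemo {
    /** Counts bytes until the real end of stream, keeping the result of read() as an int. */
    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        int value; // int, not byte: 0..255 is data, -1 is end of stream
        while ((value = in.read()) != -1) {
            count++;
        }
        return count;
    }

    /** Buggy variant: after the cast, the data byte 0xFF is indistinguishable from -1. */
    static int countBytesBuggy(InputStream in) throws IOException {
        int count = 0;
        byte value;
        while ((value = (byte) in.read()) != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x01, (byte) 0xFF, 0x02};
        System.out.println(countBytes(new ByteArrayInputStream(data)));      // prints 3
        System.out.println(countBytesBuggy(new ByteArrayInputStream(data))); // prints 1
    }
}
```

The buggy loop stops at the 0xFF byte, yet the stream still has data after it, which would look exactly like "reading past -1".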

Peter Lawrey
I don't cast the result of the stream.read() operation to byte. I know what the docs say. If I just ignore -1 and try again reading 100 milliseconds later, and repeat this indefinitely, new data arrives eventually. It's crazy, but it actually works.
Daniel Beck
It's possible that this behavior is a consequence of layering a stream on top of a socket (which at the OS level provides its own stream semantics). However, given that the Bug Parade doesn't have any reports of similar behavior, I would suspect the OP's code before suspecting Java.
kdgregory