I have a Java nonblocking server that keeps track of all the socket channels in a selector. I then establish 500 connections to the server and send data regularly. Every piece of data the server receives is echoed back to the client.

The problem is that the test works wonderfully for a couple of hours, and then all of the sudden gradually all of the sockets the server is managing throw a "Connection timed out" IOException when attempting to read data.

I've looked into whether or not the client thread was being starved (and not sending data), but I am yielding to the client thread that iterates through all the sockets and writes out data. Traffic seems to be constantly flowing properly, but after a while it just all dies out. Any ideas what could be causing this behavior?

I'm running on a Linux platform with the latest iteration of Java 6. My application launches two threads, one for the server, and one for all the clients. Thanks in advance!

Extra: The issue seems to lie with Linux and not my code. When I run the exact same setup on a Windows box (on the same hardware) it never times out, but on Linux the timeouts start to occur after several hours. It must be some kind of TCP setting in Linux that's causing it to happen. Thanks for the suggestion.

A: 

The -doCloseWithReadPending option in Java and JRE versions 1.5 or 5.0 allows one thread to close a socket when there is a read pending on the same socket from another thread.

When close() is called on a socket which has an outstanding read call from another thread, the close() by default blocks the socket until the read call completes.

With the -doCloseWithReadPending option, the socket close() call closes the socket and in the context of the thread with the pending read, a SocketException with the message "Socket closed" is thrown.

I don't know if this is the root cause of your issue without seeing the code, but I thought I would add this here in case it affects your issue.
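To see how a given JVM treats a close() issued while another thread has a read pending, a small test along these lines may help. It is only a sketch: the class and method names are made up for illustration, the 200 ms sleep is a crude way of letting the reader block first, and it assumes a JVM where close() does not wait for the pending read (on one that does, the close() call here would hang).

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not taken from the question's code.
public class CloseWithPendingRead {

    // Returns the exception seen by the reader thread, or null if none was seen in time.
    static Exception closeWhileReading() throws Exception {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        ServerSocket server = new ServerSocket(0, 1, loopback); // port 0: any free port
        final Socket client = new Socket(loopback, server.getLocalPort());
        Socket peer = server.accept(); // kept open and silent so the read below blocks
        final AtomicReference<Exception> seen = new AtomicReference<Exception>();

        Thread reader = new Thread(new Runnable() {
            public void run() {
                try {
                    client.getInputStream().read(); // no data ever arrives
                } catch (IOException e) {
                    seen.set(e); // record whatever the pending read throws
                }
            }
        });
        reader.start();
        Thread.sleep(200);  // crude: give the reader time to block in read()
        client.close();     // close from a different thread than the reader
        reader.join(5000);  // bounded wait for the reader to finish

        peer.close();
        server.close();
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        Exception e = closeWhileReading();
        System.out.println(e == null ? "no exception observed" : e.toString());
    }
}
```

If the reader reports a SocketException, the pending read was interrupted by the close rather than the close waiting for the read to finish.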

Amir Afghani
+1  A: 

The issue seems to lie with Linux and not my code. When I run the exact same setup on a Windows box (on the same hardware) it never times out, but on Linux the timeouts start to occur after several hours. It must be some kind of TCP setting in Linux that's causing it to happen. Thanks for the suggestion.

Did you see Chris's and my comments? We need more information to help.
John Kugelman
I've moved this info to the question - this probably shouldn't be an answer
Nick Fortescue
A: 

So in both the case that works (Windows with recent JVM) and the case that doesn't (Linux with recent JVM), both the server and client are on the same machine in the same JVM?

Can you clarify what "all of the sudden gradually" means? Like, after a few hours -- and always the same number of hours -- then within a few seconds all server-side sockets throw exceptions?

You don't mention the client thread reading the data that comes back. Perhaps it's stopped and you haven't noticed. (What is the client thread doing when the server thread encounters the 500 rapid exceptions? Try a few stack dumps in succession to see.)

gojomo
After what seems like about 4-5 hours, Linux client sockets begin timing out and shut down (even though they are still sending data). When it starts, there's about a quarter second delay between each client as it shuts down. When I connect the clients in the beginning, I have a delay of 250ms between connections, so it seems that they all time out after being active for the same period of time. It's very odd. It doesn't happen on Windows.
The problem exists on the clients because the server receives a -1, which as far as I know from looking at the documentation means the client shut down the socket cleanly. The server moves on with life.
I wouldn't assume the -1/EOF means clean shutdown -- just that the reading stream has ended, for whatever reason. Your initial report said the server encountered exceptions; are you now saying it does not? Please answer the other questions from everyone if you want to resolve this. (1) Code; (2) Output of netstat; (3) Confirm whether client and server are running in the same JVM in both cases; (4) Stack dumps (SIGQUIT/Ctrl-Break) during the working period and immediately after. The actual error stack would help too, and you should check if a single socket exhibits the same problem.
gojomo
-1 is an error - a read of 0 is a clean shutdown.
nos
noselasd: Not true; both SocketChannels and traditional InputStreams return -1 at end-of-data, even if there is no error. See http://java.sun.com/j2se/1.5.0/docs/api/java/nio/channels/SocketChannel.html#read(java.nio.ByteBuffer) or http://java.sun.com/j2se/1.5.0/docs/api/java/io/InputStream.html#read() .
gojomo
I think those links broke accidentally; here is what I think they were supposed to be: http://java.sun.com/j2se/1.5.0/docs/api/java/nio/channels/SocketChannel.html#read%28java.nio.ByteBuffer%29 and http://java.sun.com/j2se/1.5.0/docs/api/java/io/InputStream.html#read%28%29
Stobor
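For anyone who wants to check the -1 versus 0 question locally, here is a minimal sketch. It assumes a loopback connection and blocking channels, and the class and method names are invented for the example. The client closes without sending anything, and the method returns whatever read() reports on the accepted channel.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not taken from the question's code.
public class EofReadDemo {

    // Returns what SocketChannel.read reports after the peer closes in an orderly way.
    static int readAfterPeerClose() throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        ServerSocketChannel server = ServerSocketChannel.open();
        try {
            // Port 0 lets the OS pick a free port on the loopback interface.
            server.socket().bind(new InetSocketAddress(loopback, 0));
            int port = server.socket().getLocalPort();

            SocketChannel client = SocketChannel.open(new InetSocketAddress(loopback, port));
            SocketChannel accepted = server.accept(); // channels are blocking by default
            try {
                client.close(); // orderly close with no data sent, no error involved
                ByteBuffer buf = ByteBuffer.allocate(16);
                return accepted.read(buf); // blocks until end-of-stream is seen
            } finally {
                accepted.close();
            }
        } finally {
            server.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read() after peer close returned " + readAfterPeerClose());
    }
}
```

Here read() returns -1 although no error occurred. On a non-blocking channel, a return of 0 only means that no bytes were available at that moment; it does not signal that the peer has closed.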