I have a rather simple piece of code that is hanging in Java. The hang occurs VERY infrequently, maybe on the order of once every 1000 executions. Running it in a loop against a device doesn't seem to reproduce the problem.

long timeout = 10000;
long endTime = System.currentTimeMillis() + timeout + 5000;
Socket pingSocket = null;
String host = "host";
int port = 22; // InetSocketAddress(String, int) takes the port as an int

do {

    try {
        pingSocket = new Socket();
        pingSocket.bind(null);
        pingSocket.connect(new InetSocketAddress(host, port), 5000);
        if (pingSocket.isConnected()) {
            pingSocket.close();
            return true;
        }
        pingSocket.close();
    }
    catch (UnknownHostException e) {
        throw e;
    }
    catch (IOException e) {
        // All other errors are subclasses of IOException, and I want
        // to ignore them until my spin period has elapsed.
    }

    try {
        Thread.sleep(SPIN_SLEEP_DELAY);
    }
    catch (InterruptedException e) {
        return false;
    }

} while (System.currentTimeMillis() <= endTime);

Since it happens so infrequently in production, it's been hard to narrow down the cause. I'm currently instrumenting the code so that the next release of our product will have more information when this happens, but I thought I would ask whether anyone has seen a simple bind/connect/isConnected/close sequence hang before.

Thanks!

+3  A: 

Have you generated a Java thread dump during the hang? This will tell you where in your code the hang is occurring.
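On a typical JDK install you can get a dump from outside the process with `jstack <pid>`, or on Unix-like systems by sending SIGQUIT (`kill -3 <pid>`). If nobody is around when the hang occurs, one option is to capture the stacks from inside the process, for example from a watchdog thread that fires when the probe overruns its deadline. The following is only a minimal sketch of that idea; `ThreadDumper` and `dumpAllThreads` are hypothetical names, not part of the original code.

```java
import java.util.Map;

// Hypothetical helper (illustrative names): builds a text dump of all live threads.
final class ThreadDumper {

    /** Returns the name, state, and stack trace of every live thread as text. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Writing the result of `ThreadDumper.dumpAllThreads()` to your log when the timeout is exceeded would show which call the stuck thread is blocked in.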

Darron
Great idea Darron -- The problem is this only happens once every couple of weeks at most. So it's more of a "did it happen, did a customer let us know, can we get to it before they kill it, etc".
dpb
So, a typical customer production environment then. :-(
Darron
I'm going to accept this one. It seems the only real solution is to catch it while it's happening.
dpb
+2  A: 

I've read that if a socket fails to connect then the calling code still has to close it before continuing (can't find the reference now). Otherwise resources are still consumed and future attempts to open sockets may hang.

So move the socket close to a finally block to ensure that your socket is closed even if it fails to connect.
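As a rough sketch of what that could look like, here is a single connection attempt using try-with-resources (Java 7+), which closes the socket on every path just as a finally block would. `PortProbe` and `canConnect` are hypothetical names; the retry loop and sleep from the question would stay in the caller. Note one simplification: this version treats every IOException, including UnknownHostException, as a failed attempt, whereas the question's code rethrows UnknownHostException.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (illustrative names): one connect attempt that always closes the socket.
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        // try-with-resources closes the socket whether connect succeeds or throws.
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Simplification: any I/O failure counts as "not reachable".
            return false;
        }
    }
}
```

A call such as `PortProbe.canConnect(host, 22, 5000)` would replace the body of the first try block in the loop.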

Joe
Thanks @Joe, I think this is a very likely solution, though I can't confirm it very easily. If I could accept 2 answers I would. I will be adding this to the code after researching it some more.
dpb