Suppose I have a simple NIO-based Java server, for example (simplified code):

ByteBuffer buffer = ByteBuffer.allocate(8192); // shared read buffer (simplified)

while (!Thread.currentThread().isInterrupted()) {
  if (selector.select() <= 0) {
    continue;
  }

  Iterator<SelectionKey> iterator = selector.selectedKeys().iterator();
  while (iterator.hasNext()) {
    SelectionKey key = iterator.next();
    iterator.remove();
    SelectableChannel channel = key.channel();

    if (key.isValid() && key.isAcceptable()) {
      // Accept the new connection and register it for reads
      SocketChannel client = ((ServerSocketChannel) channel).accept();
      if (client != null) {
        client.configureBlocking(false);
        client.register(selector, SelectionKey.OP_READ);
      }
    } else if (key.isValid() && key.isReadable()) {
      // Read whatever is available, then close the connection
      ((SocketChannel) channel).read(buffer);
      channel.close();
    }
  }
}

So this is a simple single-threaded non-blocking server.

The problem is in the following code:

((SocketChannel) channel).read(buffer);
channel.close();

When I close the channel in the same thread (the thread that accepts connections and reads data), everything works fine. But I get a problem when the connection is closed in another thread. For example:

((SocketChannel) channel).read(buffer);
executor.execute(new Runnable() {
  public void run() {
    channel.close(); // exception handling omitted; channel must be (effectively) final here
  }
});

In this scenario I end up with the socket in the TIME_WAIT state on the server and ESTABLISHED on the client, so the connection is not closing gracefully. Any ideas what's wrong? What am I missing?

A: 

I don't see why it would make a difference unless the close is throwing an exception. If it were, you wouldn't see it. I suggest wrapping the close in a try/catch (Throwable t) and printing out the exception (assuming there is one).
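
For example, a minimal sketch of that diagnostic wrapper (the executor hand-off mirrors the question's code; the logging is illustrative):

executor.execute(new Runnable() {
  public void run() {
    try {
      channel.close();
    } catch (Throwable t) {
      // Surface anything close() might throw, including runtime exceptions
      t.printStackTrace();
    }
  }
});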

Peter Lawrey
I tried that. All the key points in my code are wrapped in try-catch with exception logging, and no exception occurs at this point. Before closing, the channel is active; after the close it is in the CLOSED state (but the socket stays in TIME_WAIT).
dotsid
A: 

You know, after a bit more careful testing I cannot reproduce your results on my Mac.

While it is true that the connection remains in TIME_WAIT for around a minute after the close on the server side, it closes immediately on the client side (when I connect to it with a telnet client to test).

This is the same regardless of which thread I close the channel on. What machine are you running on, and what version of Java?

Nuoji
A: 

It may have something to do with the problem mentioned here. If it really is the behaviour of the BSD / OS X poll() method, I think you're out of luck.

I would mark this code as non-portable due to what is, as I understand it, a bug in BSD / OS X.

extraneon
+1  A: 

TIME_WAIT means the OS has received a request to close the socket, but is waiting for possible late communications from the client side. The client apparently didn't get the FIN, since it still thinks the connection is ESTABLISHED. It's not a Java thing, it's the OS. The FIN is apparently being delayed by the OS, for whatever reason.

Why it only happens when you close the channel from another thread, who knows? Maybe the OS believes that closes from another thread should wait for the original thread to exit, or something. As I said, it's internal OS mechanics.

Vladimir Dyuzhev
+2  A: 

You have a major problem in your example.

With Java NIO, the thread doing the accept() must only be doing the accept(). Toy examples aside, you are probably using Java NIO because you anticipate a high number of connections. If you do the reads on the same thread as the selects, the pending unaccepted connections will time out waiting to be established. By the time this one overworked thread gets around to accepting them, the OSes on either side will have given up and the accept() will fail.

Only do the absolute minimum in the selection thread. Any more, and you will just keep rewriting the code until you do only the minimum.

[In response to comment]

Only in toy examples should the reading be handled on the main thread.

Try to handle:

  • 300+ simultaneous connection attempts.
  • Each connection, once established, sends 24K bytes to a single server, e.g. a small web page or a tiny .jpg.
  • Slow down each connection slightly (the connection is being established over dial-up, or the network has a high error/retry rate), so the TCP/IP ACK takes longer than ideal (an OS-level thing outside your control).
  • Have some of your test connections send a single byte every millisecond. (This simulates a client under its own high-load condition, generating data at a very slow rate.) The thread has to spend almost the same amount of effort processing a single byte as it does 24K bytes.
  • Have some connections cut off with no warning (lost-connection issues).

As a practical matter, the connection needs to be established within 500ms to 1500ms, or the attempting machine drops the connection.

As a result of all these issues, a single thread will not be able to get all the connections set up fast enough before the machine on the other end gives up the connection attempt. The reads must be on a different thread, period.

[Key Point] I forgot to be really clear about this, but the threads doing the reading will have their own Selector. The Selector used to establish the connection should not be used to listen for new data.
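
A rough sketch of that split, with an acceptor selector and a separate reader thread that owns its own selector (the hand-off queue, buffer size, and names like acceptSelector are illustrative assumptions, not the original code):

// Shared hand-off between the accept thread and the reader thread
final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<SocketChannel>();
final Selector readSelector = Selector.open();

// Accept loop: acceptSelector only has the ServerSocketChannel registered for OP_ACCEPT
while (!Thread.currentThread().isInterrupted()) {
  if (acceptSelector.select() <= 0) {
    continue;
  }
  Iterator<SelectionKey> it = acceptSelector.selectedKeys().iterator();
  while (it.hasNext()) {
    SelectionKey key = it.next();
    it.remove();
    if (key.isValid() && key.isAcceptable()) {
      SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
      if (client != null) {
        client.configureBlocking(false);
        pending.add(client);   // hand off to the reader thread
        readSelector.wakeup(); // let it pick up and register the new channel
      }
    }
  }
}

// Reader loop, on its own thread with its own selector
while (!Thread.currentThread().isInterrupted()) {
  readSelector.select();
  // Register channels handed off by the accept thread; registering here avoids
  // a register() call from another thread blocking while select() is in progress
  SocketChannel client;
  while ((client = pending.poll()) != null) {
    client.register(readSelector, SelectionKey.OP_READ);
  }
  Iterator<SelectionKey> it = readSelector.selectedKeys().iterator();
  while (it.hasNext()) {
    SelectionKey key = it.next();
    it.remove();
    if (key.isValid() && key.isReadable()) {
      ByteBuffer buffer = ByteBuffer.allocate(8192);
      ((SocketChannel) key.channel()).read(buffer);
      // ... process buffer, close the channel when done ...
    }
  }
}

Doing the registration on the reader thread itself (via the queue plus wakeup) is what keeps the two selectors independent and avoids cross-thread register() stalls.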

Pat
As far as I remember, the main thread should read the data from the socket. That read is not blocking (the data is already ready to be read). If you delegate reading to another thread, you end up in a situation where the selector thread wakes up several times for the same event. This happens because the parallel thread probably won't have time to finish the read before the selector thread does its next select.
dotsid
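One common way around those repeated wakeups (a sketch, not from either poster's code) is to clear the key's read interest before handing the work off, and restore it once the worker is done:

// In the selector thread, before dispatching; key must be (effectively) final
key.interestOps(key.interestOps() & ~SelectionKey.OP_READ);

executor.execute(new Runnable() {
  public void run() {
    // ... read from key.channel() here ...
    // Re-enable read interest and wake the selector so the change takes effect
    key.interestOps(key.interestOps() | SelectionKey.OP_READ);
    key.selector().wakeup();
  }
});

With the interest bit cleared, the next select() will not report the same channel as readable again until the worker restores it.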
Added "Key Point" at end.
Pat