I have encountered a really bizarre stability problem in production when running a trivial Grails application using standard components.

After some time of normal operation the number of Tomcat (jsvc) TCP connections in state CLOSE_WAIT increases until Tomcat hits its thread ceiling (Maximum number of threads (N) created for connector), after which Tomcat grinds to a halt.

Normally this would indicate that the application contains code that does not properly close its TCP connections. However, my Grails code in this application is really trivial and does not initiate any TCP connections on its own, so I can't think of any scenario where my code could cause the CLOSE_WAIT problem.

Furthermore, the components in the stack are all standard and I'd assume them to be bug-free: I'm running Grails 1.2.1 under the standard Tomcat 6 that comes bundled with Ubuntu 9.10 (apt-get install tomcat6).

  • Is this a known problem?
  • How would you go about troubleshooting it?
A: 

Is there a firewall involved in this setup? Firewalls tend to drop idle TCP/IP connections after a while, which can produce the behaviour you see.

Thorbjørn Ravn Andersen
Nope, no firewall. Even if a firewall had been dropping idle TCP/IP connections, wouldn't Tomcat employ some timeout mechanism for closing connections in the CLOSE_WAIT state?
knorv
CLOSE_WAIT means the other end of the TCP/IP connection has already closed it, and the kernel is waiting for the local application to close its side of the socket. Apparently that never happens. I would strongly consider figuring out exactly which connections experience this, so you can deduce why.
Thorbjørn Ravn Andersen
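One way to find out which connections are stuck is to group the CLOSE_WAIT sockets by local port and remote host. The following is a minimal sketch, assuming the column layout printed by `netstat -tan` on Linux; the function name `count_close_wait` and the addresses in `SAMPLE` are made up for illustration.

```python
from collections import Counter


def count_close_wait(netstat_output: str) -> Counter:
    """Count CLOSE_WAIT sockets per (local port, remote host).

    Assumes Linux `netstat -tan` rows of the form:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "CLOSE_WAIT":
            continue
        # Split on the last colon so IPv6-mapped addresses stay intact.
        local_port = local.rsplit(":", 1)[-1]
        remote_host = remote.rsplit(":", 1)[0]
        counts[(local_port, remote_host)] += 1
    return counts


# Hypothetical sample output; the addresses are invented.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        1      0 10.0.0.5:8080           10.0.0.9:51234          CLOSE_WAIT
tcp        1      0 10.0.0.5:8080           10.0.0.9:51240          CLOSE_WAIT
tcp        0      0 10.0.0.5:8080           10.0.0.7:40022          ESTABLISHED
tcp6       1      0 ::ffff:10.0.0.5:8080    ::ffff:10.0.0.12:3301   CLOSE_WAIT
"""

for (port, host), n in count_close_wait(SAMPLE).most_common():
    print(f"local port {port} <- {host}: {n} socket(s) in CLOSE_WAIT")
```

On a live server the input could come from `subprocess.check_output(["netstat", "-tan"], text=True)` instead of `SAMPLE`. If most of the stuck sockets sit on the Tomcat connector port and come from one client (a load balancer, proxy, or monitoring probe, say), that points at how that client opens and drops connections rather than at the application code.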
A: 

Filip Hanik's "Tomcat Expert Series: Performance Tuning" (2009) is an excellent guide to Tomcat settings that improve performance and stability.

I found the following tips in the guide to be relevant:

  • Overview of tuning options: Threads, Keep-Alives, TCP Backlog (acceptCount), connectionTimeout, socket buffers, connectors: BIO vs. APR vs. NIO (page 24)
  • How to choose between connectors BIO/APR/NIO (pages 26-32)
  • Tuning maxThreads (pages 33-34)
  • Tuning maxKeepAliveRequests (pages 35-36)
  • Tuning acceptCount (pages 37-38)
  • Tuning connectionTimeout (pages 39-40)
  • Tuning JVM settings (pages 44-50)

In the case described here, switching to an NIO connector, increasing maxThreads, and lowering connectionTimeout might do the trick.
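As a rough sketch, those three changes could look like this in the HTTP connector definition in server.xml (on the Ubuntu package this should be /etc/tomcat6/server.xml, but check your install). The values for maxThreads and connectionTimeout are illustrative assumptions, not recommendations from the guide, so tune them against your own load.

```xml
<!-- Illustrative values only; adjust for your own traffic. -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="400"
           connectionTimeout="5000"
           redirectPort="8443" />
```

Here protocol selects the NIO connector instead of the default blocking one, maxThreads raises the thread ceiling, and connectionTimeout (in milliseconds) shortens how long Tomcat waits on an idle connection before closing it. Tomcat has to be restarted for the change to take effect.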

knorv