In certain well-understood circumstances, our application will open too many sockets (database connections) and reach the maximum open files that the OS allows. We understand this; we are fixing the issue and also bumping up the limit.

What we can't explain is why parts of our application don't recover even after the number of connections abates and we're well within the limit.
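To check "well within the limit" from inside the JVM rather than with external tools, one option is to sample the process's descriptor usage. The sketch below assumes a HotSpot JVM on a Unix-like OS, where the platform `OperatingSystemMXBean` also implements the JDK-specific `com.sun.management.UnixOperatingSystemMXBean`. On other JVMs or platforms it simply returns `null`. The class name `FdUsage` is illustrative only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdUsage {
    /** Returns {open, max} file descriptors for this process, or null if the JVM does not expose them. */
    public static long[] fileDescriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // JDK-specific extension; only present on Unix-like platforms.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] { unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount() };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = fileDescriptorUsage();
        if (usage == null) {
            System.out.println("file descriptor counts are not available on this JVM/OS");
        } else {
            System.out.println("open fds: " + usage[0] + " / max: " + usage[1]);
        }
    }
}
```

Logging this periodically alongside the errors would show whether the process really has headroom again once the database connections are released.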

In this case, it's an application running under Tomcat.

When this happens, we first start seeing "Too many open files" errors:

SEVERE: Socket accept failed
java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
        at java.net.ServerSocket.implAccept(ServerSocket.java:453)
        at java.net.ServerSocket.accept(ServerSocket.java:421)
        at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:61)
        at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:310)
        at java.lang.Thread.run(Thread.java:619)

Eventually, we start seeing NoClassDefFoundErrors inside an application thread that's trying to open HTTP connections:

java.lang.NoClassDefFoundError: org/apache/commons/httpclient/protocol/ControllerThreadSocketFactory
        at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:128)
        at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1349)
        [...]
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1387)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
        ... 8 more

When the errant connections go away, the server starts accepting connections again and everything seems OK, but we're left with the latter error (the NoClassDefFoundError) constantly being spewed to stderr.

Although the application typically logs unloaded classes to stdout, I don't see any such logs just before, during or after the "Too many open files" errors.

My initial theory was that the HotSpot JVM would unload seemingly unused classes when it encounters "Too many open files," but if so, it doesn't log the fact.
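One way to test the unloading theory without relying on log output is to sample the standard `java.lang.management.ClassLoadingMXBean` counters around the time of the errors. If `getUnloadedClassCount()` does not change, the JVM has not unloaded any classes. Keep in mind that the JLS (§12.7) only allows a class to be unloaded when its defining class loader can be reclaimed, so a single class inside a live webapp class loader would not be unloaded on its own. The class name `ClassUnloadMonitor` and the one-second interval are illustrative assumptions.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class ClassUnloadMonitor {
    /** One-line snapshot of the JVM's class-loading counters. */
    public static String snapshot() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d totalLoaded=%d unloaded=%d verbose=%b",
                classes.getLoadedClassCount(),      // classes currently loaded
                classes.getTotalLoadedClassCount(), // loaded since JVM start
                classes.getUnloadedClassCount(),    // unloaded since JVM start
                classes.isVerbose());               // true if -verbose:class output is on
    }

    public static void main(String[] args) throws InterruptedException {
        // Bounded loop for demonstration; a real monitor would run in a
        // background daemon thread for the life of the application.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000L);
        }
    }
}
```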

Edit: As Stephen C indicates below, if the JVM is unloading the class and encounters an error the first time it reloads it, that could explain why it never recovers. I think that's a good working theory. Is it documented in the Sun docs? Why would the JVM not log that the class is being unloaded, the way class unloading is usually logged?

Platform details:

Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

Apache Tomcat Version 6.0.18
A:

I think that the reason you are getting repeated ClassNotFoundExceptions is that the first attempted class initialization of ControllerThreadSocketFactory failed due to the socket leakage problem. Your code is now repeatedly doing things that retrigger class initialization for that class, and those attempts are reporting the original problem.

If class initialization fails the first time, that's it: the JVM will not try to do it again.
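The "fails once, fails forever" behavior of class initialization can be reproduced in isolation. In this minimal sketch, the hypothetical nested class `Fragile` throws from its static initializer on the first attempt only. The JVM reports `ExceptionInInitializerError` the first time and `NoClassDefFoundError` on every later access, even though the initializer would now succeed. This demonstrates failed *initialization*. The trace in the question shows a `ClassNotFoundException` from `WebappClassLoader`, which is a *loading* failure, so treat this as an analogy rather than a reproduction of that exact error.

```java
public class FailedInitDemo {
    // Cleared after the first initialization attempt, so a retry *would* succeed.
    static boolean failNextInit = true;

    // Hypothetical stand-in for a class such as ControllerThreadSocketFactory.
    static class Fragile {
        // Not a compile-time constant, so reading it triggers static initialization.
        static final int VALUE = init();

        static int init() {
            if (failNextInit) {
                failNextInit = false;
                // Simulate a transient failure, e.g. a resource that could not be opened.
                throw new IllegalStateException("simulated transient failure");
            }
            return 42;
        }
    }

    /** Touches Fragile and reports either its value or the type of error thrown. */
    public static String attempt() {
        try {
            return "ok " + Fragile.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("attempt 1: " + attempt()); // ExceptionInInitializerError
        System.out.println("attempt 2: " + attempt()); // NoClassDefFoundError: class is now marked erroneous
    }
}
```

Only a fresh class loader (for example, redeploying the webapp) gets a new, uninitialized copy of the class.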

Stephen C
The ControllerThreadSocketFactory is initialized, and we're establishing HTTP connections for some time before the first error. In the typical case, the app runs for 12+ hours before the database connection stampede that causes the file descriptor exhaustion. Then we get the ClassNotFoundException, the HTTP connections can no longer be established, and it never recovers. It does seem as though the JVM is unloading the class and then cannot reload it.
Michael
@Michael ... and the reason it cannot reload it is as stated in my answer.
Stephen C
So you also think the JVM is unloading the class when it encounters the error? Is that documented somewhere?
Michael
@Michael - I don't know. It might be that Tomcat is killing and restarting your webapp ... but I'm just guessing.
Stephen C
A: 

I'm facing the same issue on WebLogic 8.1 / JRockit R27.2 with a number of webapps that try to load resource bundles and then fail because of the limit on the number of open files. Stopping and starting the application (i.e. unloading and reloading its classloaders) makes things work again.

MenezesDNS