views: 73
answers: 1
Hi all,

Update: This looks like a memory issue. A 3.8 GB .hprof file indicated that the JVM was dumping its heap when this "blocking" occurred. Our operations team saw that the site wasn't responding, took a stack trace, and then shut down the instance. I believe they shut the site down before the heap dump finished. The log had no errors, exceptions, or other evidence of problems, probably because the JVM was killed before it could write an error message.
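For anyone hitting the same thing, these are roughly the HotSpot options involved here (the dump path is a placeholder, not our actual configuration):

    -XX:+HeapDumpOnOutOfMemoryError    (write an .hprof on OutOfMemoryError; this is what produced the 3.8 GB file)
    -XX:HeapDumpPath=/path/to/dumps    (placeholder path; needs enough free disk for a full heap dump)
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps    (GC logging to confirm memory pressure leading up to the hang)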

Original Question: We had a recent situation where the application appeared, to the end user, to hang. We got a stack trace before the application restart and I found some surprising results: of 527 threads, 463 were in thread state BLOCKED.

In the Past: In the past, blocked threads usually had these characteristics: 1) some obvious bottleneck, e.g. a database record lock or file system lock that caused other threads to wait; 2) all blocked threads blocked on the same class/method (e.g. the JDBC or file system classes).

Unusual Data: In this case, I see all sorts of classes/methods blocked, including JVM internal classes, JBoss classes, log4j, etc., in addition to application classes (including JDBC and Lucene calls).

The Question: What would cause a JVM to block in log4j.Hierarchy.getLogger or java.lang.reflect.Constructor.newInstance? Obviously some resource is scarce, but which resource?

thanks

will

Stack Trace Excerpts

http-0.0.0.0-80-417" daemon prio=6 tid=0x000000000f6f1800 nid=0x1a00 waiting for monitor entry [0x000000002dd5d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at sun.reflect.GeneratedConstructorAccessor68.newInstance(Unknown Source)
                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
                at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
                at java.lang.Class.newInstance0(Class.java:355)
                at java.lang.Class.newInstance(Class.java:308)
                at org.jboss.ejb.Container.createBeanClassInstance(Container.java:630)

http-0.0.0.0-80-451" daemon prio=6 tid=0x000000000f184800 nid=0x14d4 waiting for monitor entry [0x000000003843d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at java.lang.Class.getDeclaredMethods0(Native Method)
                at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
                at java.lang.Class.getMethod0(Class.java:2670)

"http-0.0.0.0-80-449" daemon prio=6 tid=0x000000000f17d000 nid=0x2240 waiting for monitor entry [0x000000002fa5f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.register(Http11Protocol.java:638)
                - waiting to lock <0x00000007067515e8> (a org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler)
                at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.createProcessor(Http11Protocol.java:630)


"http-0.0.0.0-80-439" daemon prio=6 tid=0x000000000f701800 nid=0x1ed8 waiting for monitor entry [0x000000002f35b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at org.apache.log4j.Hierarchy.getLogger(Hierarchy.java:261)
                at org.apache.log4j.Hierarchy.getLogger(Hierarchy.java:242)
                at org.apache.log4j.LogManager.getLogger(LogManager.java:198)
+2  A: 

These are listed roughly in the order I would try them, depending on the evidence collected:

  • Have you looked at GC behavior? Are you under memory pressure? That could result in newInstance() and a few of the others above being blocked. Run your VM with -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps and log the output. Are you seeing excessive GC times near the time of the failure/lockup?
    • Is the condition repeatable? If so, try varying heap sizes in the JVM (-Xmx) and see if the behavior changes substantially. If it does, look for memory leaks or size the heap properly for your app.
    • If the previous is tough, and you're not getting an OutOfMemoryError when you should, you can tune the GC tunables... see the JDK 6.0 -XX options, or the JDK 6.0 GC Tuning Whitepaper. Look specifically at -XX:+UseGCOverheadLimit and -XX:GCTimeLimit and related options (note these are not well documented, but may be useful; see the sketch after this list).
  • Might there be a deadlock? With only stack trace excerpts, I can't determine that here. Look for cycles among the monitors that threads are blocked on (versus the ones they hold). I believe jconsole can do this for you... (yep, under the threads tab, "detect deadlocks"); a programmatic version is sketched after this list.
  • Take several stack traces in succession and compare what changes versus what stays the same...
  • Do the forensics... for each stack entry that says BLOCKED, go look up the specific line of code and figure out whether there is a monitor there or not. If there's an actual monitor acquisition, it should be fairly easy to identify the limiting resource. However, some of your threads may show as blocked without an obvious monitor; these will be trickier...
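For reference, a rough sketch of the GC-related options mentioned above on a Sun JDK 6 HotSpot VM (the heap size is a placeholder and the percentages shown are the usual defaults, not recommendations):

    -Xmx2g                      (placeholder; size the heap for your app)
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps    (GC logging)
    -XX:+UseGCOverheadLimit     (throw OutOfMemoryError when the JVM spends nearly all its time in GC)
    -XX:GCTimeLimit=98          (percent of total time spent in GC before the overhead limit trips; 98 is the usual default)
    -XX:GCHeapFreeLimit=2       (percent of heap recovered by GC below which the limit trips; 2 is the usual default)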
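And a minimal in-process version of the jconsole deadlock check, assuming you can run it inside the affected JVM (e.g. from a diagnostic servlet or JMX bean); it only uses the standard java.lang.management API:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Hypothetical helper: the same check jconsole's "detect deadlocks" button performs.
    public class DeadlockCheck {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();

            // IDs of threads deadlocked on object monitors or ownable synchronizers
            // (java.util.concurrent locks); null means no deadlock was found.
            long[] ids = threads.findDeadlockedThreads();
            if (ids == null) {
                System.out.println("No deadlocks detected.");
                return;
            }

            // Report each deadlocked thread, the lock it is waiting for, and the lock's owner.
            for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
                System.out.printf("%s waiting for %s held by %s%n",
                        info.getThreadName(), info.getLockName(), info.getLockOwnerName());
            }
        }
    }

If no deadlock turns up (as seems likely given the memory evidence in the update), the blocking is contention or allocation stalls rather than a cycle.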
andersoj
It looks like we had an OutOfMemoryError and that the JVM had started to dump its heap (i.e. HeapDumpOnOutOfMemoryError), but operations killed the JVM before it finished. The log has no exceptions or OutOfMemory errors....
@user331465: Yeah, well, it may take some time before an OOM condition actually triggers an OutOfMemoryError. See the -XX options for the JVM on UseGCOverheadLimit, GCTimeLimit, GCHeapFreeLimit. This is probably part of your issue: threads allocating memory were blocked not because of a lock per se, but because they were waiting to allocate memory.
andersoj
@user331465: suggest you amend your question to include this information about GC/memory issues, as it focuses the question substantially.
andersoj