tags:

views:

61

answers:

2

When the Java application is hanging, you don't even know the use case that is leading to this and want to investigate, I understand that thread dumps can be useful.

But how can we easily derive useful data from the thread dumps to find where the problem is? The server application that I've been working with produces very long thread dumps, because it is an EJB architecture and thread dumps contains many container threads that I'm not sure I should be looking at (i.e. threads that are not running my application code, but JBoss's code).

Yesterday I tried the Thread Dump Analyzer tool. The tool is definitely better than looking at the raw thread dumps in a text editor, because you can filter out threads that you're not interested in, see the thread list, click on a thread to see its details, compare thread dumps to find long running threads, etc. See screenshot below:

Thread Dump Analyzer

But there's still too much data to analyse - almost 300 threads. I don't know of any criteria that I could use to filter out all the JBoss threads, in which I'm not interested. I'm not sure if I should be looking at threads that are currently in "runnable" state only or if "waiting on condition" and "in Object.wait" are also important.

What's the approach that you would normally follow and tools that you would in general use?

+3  A: 

I'm not sure if I should be looking at threads that are currently in "runnable" state only or if "waiting on condition" and "in Object.wait" are also important.

The latter two are actually the things to look for when diagnosing a deadlock, as you seem to be doing. "Runnable" means the thread is doing something right now (or waiting to get the CPU). "blocked" and "waiting" is what deadlocks are made of.

Of course, an application container will have plenty of threads waiting legitimately. To filter out the interesting cases, look at the stack trace. If it's framework classes (and especially ones called "Worker" or "Queue") it's probably OK. If it's application code, you should look at it more closely.

Michael Borgwardt
+5  A: 

One set of thread dumps alone wont be too helpful to get to the root cause.

The trick is to take 4 or 5 sets of thread dumps at an interval of 5 seconds between each. so at the end you will have a single log file which has around 20 - 25 seconds worth of action on the app server.

What you want to check is a stuck thread or long running transaction is happening, all the thread dumps will show a certain thread id is at the same line in your java stack trace. In simpler terms, the transaction (say in an EJB or database) is spanning across multiple thread dumps and hence needs more investigation.

Now when you run these through Samurai (I havent used TDA myself), it will highlight these in Red colour so you can quickly click on it and get to the lines showing issues.

See an example of this here. Look at the Samurai output image in that link. Green is fine. Red and grey need looking at.

A Samurai example from my own web app below shows a stuck sequence for Thread'19' across a span of 5 - 10 seconds

>     Thread dump 2/3 "[ACTIVE] ExecuteThread: '19' for queue:
> 'weblogic.kernel.Default
> (self-tuning)'" daemon prio=7
> tid=07b06000 nid=108 lwp_id=222813
> waiting for monitor entry
> [2aa40000..2aa40b30]     
> java.lang.Thread.State: BLOCKED (on
> object monitor)      at
> com.bea.p13n.util.lease.JDBCLeaseManager.renewLease(JDBCLeaseManager.java:393)
> - waiting to lock <735e9f88> (a com.bea.p13n.util.lease.JDBCLeaseManager)
> at
> com.bea.p13n.util.lease.Lease$LeaseTimer.timerExpired(Lease.java:229)

...

> Thread dump 3/3 "[ACTIVE]
> ExecuteThread: '19' for queue:
> 'weblogic.kernel.Default
> (self-tuning)'"   daemon prio=7
> tid=07b06000 nid=108 lwp_id=222813
> waiting for monitor entry
> [2aa40000..2aa40b30]     
> java.lang.Thread.State: BLOCKED (on
> object monitor)      at
> com.bea.p13n.util.lease.JDBCLeaseManager.renewLease(JDBCLeaseManager.java:393)
> - waiting to lock <735e9f88> (a com.bea.p13n.util.lease.JDBCLeaseManager)
> at
> com.bea.p13n.util.lease.Lease$LeaseTimer.timerExpired(Lease.java:229)
JoseK