I'm currently looking for a way to identify hanging threads in Java. Does anyone know the best way to do this?

So far I can think of two possible ways to do this:

  1. Calling a (callback) method periodically from within all methods of the application. This seems a "bit" complex and unsightly. Moreover, I have no control when calling external methods.
  2. An additional thread that periodically generates thread dumps for all threads (or maybe just for those that should be monitored, as I know which thread I want to monitor) and analyzes the result ("is the thread still at the same point, holding locks on the same objects?", ...). This could be a bit dangerous, as the thread may legitimately be at the same point again(!). By the way, is there an easy way to get a thread dump within Java 1.4 (I don't want to call an external application)? I guess 1.5 or 1.6 has methods to do this easily.
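
For Java 5 and later, the second idea could be sketched in-process with `Thread.getStackTrace()` and `Thread.getAllStackTraces()`. This is only a rough sketch: the class `StackSampler` and its method names are made up for illustration, these methods do not exist in Java 1.4, and two identical samples are a hint rather than proof, since the thread may simply have returned to the same point.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper, not part of the JDK. Requires Java 5 or later.
final class StackSampler {

    // Returns true if the thread is alive and its stack trace is identical
    // in two samples taken intervalMillis apart.
    static boolean looksStuck(Thread target, long intervalMillis)
            throws InterruptedException {
        StackTraceElement[] first = target.getStackTrace();
        Thread.sleep(intervalMillis);
        StackTraceElement[] second = target.getStackTrace();
        return target.isAlive()
                && first.length > 0
                && Arrays.equals(first, second);
    }

    // Formats the stack traces of all live threads as text, similar in
    // spirit to an external thread dump.
    static String dumpAll() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append(t.getName()).append(" (").append(t.getState()).append(")\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

A monitoring thread could call `looksStuck` for the threads of interest and log `dumpAll()` when it returns true.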

I guess neither of these two methods is a good solution. So do you know a way to do this?

As I said before: I don't want to use any external applications...

+1  A: 
Steve K
I thought that as well but he doesn't want to use any external applications...
Peter D
As I said, I don't want to use any external applications. I know the tools are part of the JDK, but nevertheless.
dpr
Oh, I missed that part. Why no external applications?
Steve K
The customer has no JDK, and I can't force the customer to install and launch an application to check whether everything is still fine...
dpr
And this is specific to Java 1.4? Or are they running 1.5 or greater?
Steve K
The solution has to run with 1.4
dpr
A: 

If you are looking for an "automatic" way, there is no good solution except to spend more time in the design process. If you have threads that are "hanging", then they haven't been closed properly. I would spend time looking at how to close them contextually rather than trying to write a routine/thread that looks for stale threads.

Lucas B
For sure that would be the best way, but the thread hangs/blocks in some circumstances when trying to fetch the result from the database (as described in http://download.oracle.com/docs/cd/B14117_01/java.101/b10979/tips.htm#i1001430 under 28.3.5; as the page says: "Due to limitations in the Java thread API, there is no acceptable workaround"). The thread even keeps waiting for the response after the database server has closed the connection (also mentioned on the oracle.com site).
dpr
A couple of possibilities come to mind. 1) I wonder if you can put a timeout on the connection to the database server. If the response time is unknown or could extend for hours, then this won't work. 2) The other possibility is to put a listener on the thread that gets a message from a timer thread. Upon receiving the event, the thread could test the connection (isAlive, ConnectionStatus, etc.), and if the thread is no longer working, call dispose/close or the like.
Lucas B
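
Where the driver itself offers no timeout, the first suggestion could be approximated on Java 1.4 by running the blocking call in a worker thread and waiting for it with `Thread.join(long)`. This is a minimal sketch under that assumption; `TimedCall` and `runWithTimeout` are hypothetical names, and the worker is only abandoned, not forcibly stopped.

```java
// Hypothetical helper, not part of the JDK. Uses only APIs that already
// exist in Java 1.4 (Thread.join(long), isAlive, interrupt).
final class TimedCall {

    // Runs the task in a daemon worker thread and waits at most
    // timeoutMillis for it. Returns true if the task finished in time,
    // false if the worker is still running when the timeout expires.
    static boolean runWithTimeout(Runnable task, long timeoutMillis)
            throws InterruptedException {
        Thread worker = new Thread(task, "timed-call-worker");
        worker.setDaemon(true);      // a hung worker must not keep the JVM alive
        worker.start();
        worker.join(timeoutMillis);  // note: join(0) would wait forever
        if (worker.isAlive()) {
            // Best effort only: classic blocking socket reads typically
            // do not react to interrupts.
            worker.interrupt();
            return false;
        }
        return true;
    }
}
```

The caller would wrap the database fetch in a `Runnable` and treat a `false` result as "the job is hanging", for example by closing the connection and scheduling a retry.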
A: 

This is just an idea:

Keep information about each thread's starting time, together with a reference to the thread, in a centralized (synchronized) class. For example, a ConcurrentHashMap (CHM from now on).

After some (conservative) amount of time, you can ask the CHM for the starting time of that thread (or whatever thread you suspect might be hung). If the thread is still alive, you can be (conservatively) sure that it is hung.

You can go further and keep the stacktrace of the threads, as commented before.
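
A rough sketch of such a registry, just to make the idea concrete. The class and method names are made up, and note that `ConcurrentHashMap` only exists from Java 5 on; for Java 1.4 a synchronized `HashMap` would have to stand in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry, not a JDK class. Requires Java 5 or later
// because of ConcurrentHashMap and generics.
final class ThreadRegistry {
    private final Map<Thread, Long> startTimes =
            new ConcurrentHashMap<Thread, Long>();

    // Call at the start of a job, from the thread that runs it.
    void jobStarted() {
        startTimes.put(Thread.currentThread(), System.currentTimeMillis());
    }

    // Call in a finally block when the job ends.
    void jobFinished() {
        startTimes.remove(Thread.currentThread());
    }

    // Returns the registered threads that are still alive and whose job
    // has been running for longer than maxMillis.
    List<Thread> findSuspects(long maxMillis) {
        long now = System.currentTimeMillis();
        List<Thread> suspects = new ArrayList<Thread>();
        for (Map.Entry<Thread, Long> e : startTimes.entrySet()) {
            if (e.getKey().isAlive() && now - e.getValue() > maxMillis) {
                suspects.add(e.getKey());
            }
        }
        return suspects;
    }
}
```

A watchdog thread would call `findSuspects` periodically with a conservative limit and log (or dump the stack of) whatever it returns.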

ATorras
Sounds a bit like #1 of the ways I posted, but even less secure. If I'm too conservative, the thread might hang for too long, and in some cases the duration of such a thread (the thread is executing some kind of job) might vary a lot. But it would be quite a simple solution. I'll think about it; maybe there's an easy way to make it a bit more "secure".
dpr
Yes, it's based on #1. Another approach is what OpenView ITO does: keep-alive packets are sent regularly from the agents to the "station", but this approach requires modifying the source code. Maybe JMX can help you: http://java.sun.com/javase/6/docs/technotes/guides/jmx/overview/JMXoverviewTOC.html
ATorras
+1  A: 

I've found that when we have threads that are stuck, we usually end up with a performance problem in our application server.

We have a low tech way of trying to determine where the threads are stuck.

We send several kill -3 signals to the JVM to generate several thread dumps, and then analyse the output looking for similar traces, which indicate problematic code.

Low tech and manual, but it worked.

A_M
A: 

The question here is what "hanging thread" means and in what context.

Is it inside a big J2EE server (which could probably find out itself) or a tiny Java program where you spawn tons of threads? If it is the latter, consider investigating the concurrent utilities in Java 5+, which allow you to keep Runnables and Callables under control.
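
As an illustration of what "under control" can look like with `java.util.concurrent` (Java 5+), here is a minimal sketch that bounds how long the caller waits for a job. `BoundedJob` and `callOrFallback` are hypothetical names chosen for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the JDK. Requires Java 5 or later.
final class BoundedJob {

    // Submits the job to the executor and returns its result, or the
    // fallback value if the job does not complete within timeoutMillis.
    static <T> T callOrFallback(ExecutorService executor, Callable<T> job,
                                long timeoutMillis, T fallback)
            throws InterruptedException, ExecutionException {
        Future<T> future = executor.submit(job);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Requests interruption of the worker; a call that is truly
            // hung in blocking I/O may ignore it.
            future.cancel(true);
            return fallback;
        }
    }
}
```

The caller is no longer blocked by a hanging job, although the worker thread itself may stay occupied until the underlying call returns.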

Thorbjørn Ravn Andersen
A: 

If you have a race condition, then bringing additional resources (i.e. installing a JDK and using jps and jstack) to bear on the problem is surely reasonable...

That said, if you are running an app outside of a container, you can just run the app from the console and hit ctrl+break (Windows) to get the thread dump. Or a kill -3 on *nix.

But based on your answers in comments, it sounds like this is probably running in an app server of some sort.

The problem here is that pretty much anything you do is going to introduce further uncertainty to the situation. How will you know that your deadlock monitor isn't itself responsible for deadlocks?

Another option would be to introduce logging of system inputs, then create a re-play of what happened in the system based on those inputs. If your customer has basically said that they aren't going to help you troubleshoot your race condition, then you need to find it yourself in your own test environment. Emulating the load conditions of the customer site is one way of attempting this.

But really, I recommend that you work with them to get a JDK installed (so you can use jstack), or a JRE 1.4 monitoring application of some sort (heck, make sure that their app server doesn't already provide this sort of thing).

Kevin Day