Our application has about 10 threads, each doing a separate task (no thread pools). We are not experiencing deadlock, but we are always trying to lower the latency of responding to a request, so we want to determine which locks are the most contended. jconsole shows how often threads are blocked, and it isn't very often, but it doesn't tell us which locks they are blocking on.
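For reference, the blocked counts that jconsole displays can also be read in-process through the standard `java.lang.management` API. The following is only a minimal sketch of that; the class name `BlockedCounts` and the `snapshot()` helper are made up for illustration. It reports per-thread totals only and does not say which lock was involved, which is exactly the gap this question is about.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, named for illustration only.
public class BlockedCounts {

    /**
     * Returns "threadName#threadId" -> number of times that thread has
     * blocked waiting to enter a monitor since it started.
     */
    public static Map<String, Long> snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        // getThreadInfo(long[]) returns a null entry for any thread that
        // has died between the two calls, so skip those.
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) {
                String key = info.getThreadName() + "#" + info.getThreadId();
                counts.put(key, info.getBlockedCount());
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> e : snapshot().entrySet()) {
            System.out.println(e.getKey() + " blocked " + e.getValue() + " times");
        }
    }
}
```

Taking two snapshots some time apart and subtracting them gives the number of blocking events per thread in that interval, but still nothing about the identity of the lock.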
We're running on the Sun JVM, so JLA from IBM is not useful, and we aren't running on Solaris, so we can't use DTrace.
EDIT: I want to do this observation in production, where a profiler would slow the app unacceptably. This is a trading system: if we are slow, we lose money, so we don't run profilers in production. It is also quite hard to simulate the many exchanges we talk to in a performance test.