views: 486
answers: 6
Is there any recommended Java application profiling tutorial? I am currently using JProfiler and Eclipse TPTP for my profiling. However, although equipped with these powerful tools, as a newbie to Java profiling I am still missing the general theory and the skill needed to pinpoint bottlenecks. Could you recommend some tutorials on Java profiling?

+1  A: 

JProfiler comes with its own help manual. I found that to be very good.

saugata
A: 

Is there a good free or open source profiler available? I have tried JAMon, but did not like it.

athena
TPTP is okay if you are using Eclipse.
Winston Chen
VisualVM is included in the most recent distributions of the JDK (in other words, you've probably already got it). This is the link to their site, where you can also download it separately: https://visualvm.dev.java.net/ NetBeans also includes profiling tools.
William Billingsley
+1  A: 

As a newcomer to profiling, you should start by simply looking for methods that have long runtimes and/or are invoked many times during a typical usage pattern; that is where the bottlenecks occur.

I am not sure how the Eclipse integration with JProfiler works, since I primarily use NetBeans. In NetBeans, however, there is a 'Snapshot' view that shows a hierarchy of method invocations with runtimes that sum to 100%. I look for the parts of the hierarchy that take up a (relatively) large percentage of the total time. From there you have to think about what those methods are doing, and what could be causing them to be slow.

For example: I noticed that a method that was called frequently was overall taking way too much time to complete, and was a serious bottleneck. Long story short, it turned out the code was checking whether an item was present in a collection using the .contains() method, and the collection was a LinkedList. This is a problem because .contains() on a LinkedList is O(n). The fix in this case was quite simple: I replaced the LinkedList with a HashSet, whose .contains() is much faster (O(1)).
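A minimal sketch of that kind of bottleneck (class and method names here are invented for illustration; the sizes are arbitrary) — looking up elements near the end of a LinkedList forces a full traversal per call, while a HashSet answers each lookup in roughly constant time:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

public class ContainsDemo {
    // Time repeated contains() calls on values near the end of the range,
    // which is close to the worst case for a LinkedList scan.
    static long timeContains(Collection<Integer> c, int size, int lookups) {
        long start = System.nanoTime();
        for (int i = 0; i < lookups; i++) {
            c.contains(size - 1 - i); // O(n) for LinkedList, O(1) for HashSet
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int size = 20_000;
        List<Integer> list = new LinkedList<>();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < size; i++) { list.add(i); set.add(i); }

        long listNs = timeContains(list, size, 2_000);
        long setNs  = timeContains(set, size, 2_000);
        System.out.println("LinkedList slower than HashSet: " + (listNs > setNs));
    }
}
```

The same one-line fix as in the story above — swapping the collection type — changes the asymptotic cost of every lookup without touching the calling code.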

instanceofTom
good insight!! thanks
Winston Chen
+6  A: 

Profiling is a subject with more than one school of thought.

The more popular one is that you proceed by getting measurements. That is, you try to see how long each function takes and/or how many times it is called. Clearly, if a function takes very little time, then speeding it up will gain you little. But if it takes a lot of time, then you have to do detective work to figure out what part of the function is responsible for the time. Do not expect function times to add up to total time, because functions call each other, and the reason function A may take a lot of time is that it calls function B that also takes a lot of time.

This approach can find a lot of problems, but it depends on you being a good detective and being able to think clearly about different kinds of time, like wall-clock time versus CPU time, and self-time versus inclusive time. For example, an app may appear to be slow but the function times may be all reported as near zero. This can be caused by the program being IO bound. If the IO is something that you expect, that may be fine, but it may be doing some IO that you don't know about, and then you are back to detective work.

The general expectation with profilers is that if you can fix enough things to get a 10% or 20% speedup, that's pretty good, and I never hear stories of profilers being used repeatedly to get speedups of much more than that.

Another approach is not to measure, but to capture. It is based on the idea that, during a time when the program is taking longer (in wall-clock time) than you would like, you want to know what it is doing, predominantly, and one way to find out is to stop it and ask, or take a snapshot of its state and analyze it to understand completely what it is doing and why it is doing it at that particular point in time. If you do this multiple times and you see something that it is trying to do in several of those samples, then that activity is something you could fruitfully optimize. The difference is that you are not asking how much; you are asking what and why. Here's another explanation. (Notice that the speed of taking such a snapshot doesn't matter, because you're not asking about time, you're asking what the program is doing and why.)

In the case of Java, here is one low-tech but highly effective way to do that, or you can use the "pause" button in Eclipse. Another way is to use a particular type of profiler: one that samples the entire call stack, on wall-clock time (not CPU, unless you want to be blind to IO), when you want it to sample (e.g. not when waiting for user input), and that summarizes at the level of lines of code, not just at the level of functions, and by percent of time, not absolute time. To get percent of time, it should tell you, for each line of code that appears on any sample, the percent of samples containing that line, because if you could make that line go away, you would save that percent. (You should ignore other things it tries to tell you about, like call graphs, recursion, and self-time.) There are very few profilers that meet this spec. One is RotateRight/Zoom, though I'm not sure whether it works with Java; there may be others.
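For the do-it-yourself flavor of this, a rough sketch of a wall-clock stack sampler is possible in plain Java using Thread.getStackTrace(). Everything here (the worker thread, the busyWork method, the sample count and interval) is an invented stand-in for your real workload, not a production tool:

```java
import java.util.HashMap;
import java.util.Map;

public class StackSampler {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical "slow" workload running in a separate thread.
        Thread worker = new Thread(StackSampler::busyWork, "worker");
        worker.setDaemon(true);
        worker.start();

        // Take a handful of wall-clock samples of the worker's full stack.
        int samples = 20;
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            Thread.sleep(50); // wall-clock sampling interval
            for (StackTraceElement frame : worker.getStackTrace()) {
                String line = frame.getClassName() + "." + frame.getMethodName()
                        + ":" + frame.getLineNumber();
                counts.merge(line, 1, Integer::sum);
            }
        }

        // Report, per line of code, the percent of samples containing it —
        // the summary the answer above asks a profiler to give you.
        counts.forEach((line, n) ->
                System.out.printf("%3d%%  %s%n", 100 * n / samples, line));
    }

    static void busyWork() {
        double x = 0;
        while (true) x += Math.sqrt(x + 1); // the "hot" line we hope to see
    }
}
```

A line that shows up in a large percentage of samples is, by the reasoning above, a line whose removal or improvement would save roughly that percentage of wall-clock time.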

In some cases it may be difficult to get stack samples when you want them, during the time of actual slowness. Then, since what you are after is percentages, you can do anything to the code that makes it easier to get samples without altering the percentages. One way is to amplify the code by wrapping a temporary loop of, say, 100 iterations around it. Another way is, under a debugger, to set a data-change breakpoint; this will cause the code to be interpreted 10-100 times slower than normal. Another way is to employ an alarm-clock timer to go off during the period of slowness, and use it to grab a sample.
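The amplification trick can be sketched in a few lines. The routine and its workload below are invented placeholders; the only point is that the temporary outer loop stretches the suspect region in wall-clock time, making it far easier to land a pause or a stack sample inside it, without changing which lines account for which percentage of the time:

```java
public class AmplifyDemo {
    // Hypothetical suspect routine we want to catch in a stack sample.
    static int suspectRoutine(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) sum += (i * i) % 7;
        return sum;
    }

    public static void main(String[] args) {
        int result = 0;
        // Temporary amplification loop: 100 repetitions make the slow
        // region ~100x longer on the wall clock, while the percentage of
        // time spent on each line inside it stays the same.
        for (int rep = 0; rep < 100; rep++) {
            result = suspectRoutine(1_000_000);
        }
        System.out.println("result = " + result);
    }
}
```

The loop (and the use of the result, to keep the JIT from discarding the work) would be deleted once the sampling session is over.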

With the capturing technique, if you use it repeatedly to find and perform multiple optimizations, you can expect to reach near-optimal performance. In the case of large software, where bottlenecks are more numerous, this can mean substantial factors. People on SO have reported factors from 7x to 60x. Here is a detailed example of 43x.

The capturing technique has trouble with cases where it is hard to figure out why the threads are waiting when they are, such as when waiting for a transaction to complete on another processor. (Measuring has the same problem.) In those cases, I use a laborious method of merging time-stamped logs.

Mike Dunlavey
Note that it doesn't have to be "either/or" - it's often useful to *measure* some "typical test run" non-invasively, then use a profiler as a capturing tool to find hotspots. Make your changes, using the profiler iteratively, and then rerun the measurement tests.
Jon Skeet
@Jon: I do rerun the measurement (typically just a stopwatch) to see how much time I shed. Where I'm a zealot is in the method of finding what to fix, which is seldom a "hotspot" as I define it (a region where the program counter hangs out) or a "bottleneck" (a place where needed work gets crowded up). Typically it is little one-liners (or less) that end up invoking piles of code (including IO) that you never would have guessed when you coded it.
Mike Dunlavey
+1  A: 

You may find the book Java Platform Performance, published by Sun Microsystems, interesting.

Stephen Kellett
+1  A: 

Try this free Java performance troubleshooting tool. If your app is running in production, you'll want to use it rather than a profiler, because it operates at less than 2% overhead and still gives you all the diagnostic details needed to find the root cause of slow requests, stalls and errors. Download it for free at www.appdynamics.com/free

Steve Roop