I have a program I ported from C to Java. Both apps use quicksort to order some partitioned data (genomic coordinates).

The Java version runs fast, but I'd like to get it closer to the C version. I am using the Sun JDK v6u14.

Obviously I can't get parity with the C application, but I'd like to learn what I can do to eke out as much performance as reasonably possible (within the limits of the environment).

What sorts of things can I do to test performance of different parts of the application, memory usage, etc.? What would I do, specifically?

Also, what tricks can I implement (in general) to change the properties and organization of my classes and variables, reducing memory usage and improving speed?

EDIT : I am using Eclipse and would obviously prefer free options for any third-party tools. Thanks!

+5  A: 

Obviously, profile, profile, profile. For Eclipse there's TPTP. Here's an article on the TPTP plugin for Eclipse. NetBeans has its own profiler. jvisualvm is nice as a standalone tool. (The entire dev.java.net server seems to be down at the moment, but it is very much an active project.)

The first thing to do is use the library sorting routine, Collections.sort; this will require your data objects to be Comparable. This might be fast enough and will definitely provide a good baseline.
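As a rough sketch of that baseline, assuming a hypothetical Interval class (the asker's real data class isn't shown) whose chromosome is encoded as an int, it might look like this:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical data class; the fields are assumptions for illustration.
final class Interval implements Comparable<Interval> {
    final int chromosome; // chromosome encoded as an int index (assumption)
    final int start;

    Interval(int chromosome, int start) {
        this.chromosome = chromosome;
        this.start = start;
    }

    // Natural order: by chromosome first, then by start position.
    public int compareTo(Interval other) {
        if (chromosome != other.chromosome) {
            return chromosome < other.chromosome ? -1 : 1;
        }
        if (start != other.start) {
            return start < other.start ? -1 : 1;
        }
        return 0;
    }
}

class LibrarySortBaseline {
    // Returns a sorted copy and leaves the input list untouched.
    static List<Interval> sorted(List<Interval> input) {
        List<Interval> copy = new ArrayList<Interval>(input);
        Collections.sort(copy); // uses Interval.compareTo
        return copy;
    }
}
```

Timing this against the hand-written quicksort on the same input gives you a number to beat before you tune anything.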

General tips:

  • Avoid locks you don't need (your JVM may have already optimized these away)
  • Use StringBuilder (not StringBuffer because of that lock thing I just mentioned) instead of concatenating String objects
  • Make anything you can final; if possible, make your classes completely immutable
  • If you aren't changing the value of a variable in a loop, try hoisting it out and see if it makes a difference (the JVM may have already done this for you)
  • Try to work on an ArrayList (or even an array) so the memory you're accessing is contiguous instead of potentially fragmented the way it might be with a LinkedList
  • Quicksort can be parallelized; consider doing that (see quicksort parallelization)
  • Reduce the visibility and live time of your data as much as possible (but don't contort your algorithm to do it unless profiling shows it is a big win)
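
To make the StringBuilder tip concrete, here is a minimal sketch. LineFormatter and join are made-up names, and the tab-separated output format is only an assumption about what you might be writing out:

```java
// Hypothetical helper: joins positions into one tab-separated line.
class LineFormatter {
    static String join(int[] positions) {
        // One StringBuilder for the whole loop, instead of `s += ...`,
        // which builds a new String on every iteration.
        StringBuilder sb = new StringBuilder(positions.length * 8);
        for (int i = 0; i < positions.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(positions[i]);
        }
        return sb.toString();
    }
}
```

Whether this matters for your program is something only the profiler can tell you.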
Hank Gay
When escape analysis is in place and working 100% correctly, StringBuilder has the same performance characteristics as StringBuffer. Haven't tried it under j6u14 yet. So you should not worry about this too heavily.
Andreas Petersson
Concatenation of plain Strings is implemented by means of StringBuilder and optimized in many cases. That kind of micro-optimization is what made some of us use StringBuffers in the past, just to realize that with modern VMs plain String concatenation is faster than our hand-tailored code... Who will refactor those StringBuffers to StringBuilders now? That is one example of trying to outsmart the compiler/VM.
David Rodríguez - dribeas
Yes, switching from String concatenation to using a StringBuilder brought a huge performance increase in a few JSPs I was rendering. Good Point.
Kieveli
@Andreas: No use in hoping the JVM can figure out what you meant (I don't need "thread-safe" appends) when there is already a class that makes that intent explicit. StringBuffer won't handle a fair amount of the locking that a truly thread-safe program is going to need anyway (ordering issues, mostly). @dribeas: I realize that is the theory, but any number of things can cause the JVM to not perform that optimization. Also, StringBuffer -> StringBuilder is a trivial refactoring in any case where it is valid.
Hank Gay
+3  A: 

Use a profiler:

Use the latest version of the JVM from your provider. Incidentally, Sun's Java 6 update 14 does bring performance improvements.

Measure your GC throughput and pick the best garbage collector for your workload.
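
One rough way to get a GC number from inside the program is the java.lang.management API. The sketch below is only an approximation: GcStats is a made-up name, and the times reported by the collector beans are coarse and can mean different things for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // Total milliseconds the JVM reports having spent in GC so far.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Approximate fraction of wall-clock time NOT spent in GC while the task ran.
    static double throughput(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        long gcMillis = totalGcMillis() - gcBefore;
        if (elapsedMillis <= 0) {
            return 1.0; // too short to measure
        }
        double fraction = 1.0 - (double) gcMillis / elapsedMillis;
        return Math.max(0.0, Math.min(1.0, fraction));
    }
}
```

Run the same workload under each collector you are considering and compare the results; the GC log gives a more detailed picture if you need one.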

Robert Munteanu
The link to visualvm seems to be dead. Is it still maintained?
Alex Reynolds
Yes, and the last Java update - Java 6 update 14 - brought in improvements to it. It's very much alive.
Robert Munteanu
All of dev.java.net seems to be down at the moment, so this is why the link is down.
Robert Munteanu
Don't forget the built in profiler, hprof ;)
Peter Lawrey
@Peter Lawrey - included hprof, thanks :-)
Robert Munteanu
+14  A: 

do not try to outsmart the jvm.

in particular:

  • don't try to avoid object creation for the sake of performance

  • use immutable objects where applicable.

  • use the scope of your objects correctly, so that the GC can do its job.

  • use primitives where you mean primitives (e.g. non-nullable int compared to nullable Integer)

  • use the built-in algorithms and data structures

  • when handling concurrency use java.util.concurrent package.

  • correctness over performance. first get it right, then measure, then measure with a profiler then optimize.
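
a small sketch of the primitives and built-in algorithms points, assuming the values to sort are plain int positions (PrimitiveSort is a made-up name):

```java
import java.util.Arrays;

class PrimitiveSort {
    // Sorts a copy of the positions with the library sort for int[].
    static int[] sortedCopy(int[] positions) {
        int[] copy = positions.clone();
        Arrays.sort(copy); // works on ints directly, no Integer boxing
        return copy;
    }
}
```

an int[] holds its values in one block of memory, whereas a List<Integer> holds references to separate Integer objects.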

Andreas Petersson
Can you explain what this means? I don't know if I'm doing anything to outsmart the JVM.
Alex Reynolds
While much of your advice may be true, I'd boldly disagree with the first point. Reducing object creation was the most crucial part of optimization in many of my projects. Of course, it's no use to save 10 big objects, but often you end up with millions of small objects, and not creating them is important.
Brian Schimmel
Andreas is largely correct: object allocation is extremely quick in modern VMs (a lot faster than C++, for example), and assuming the objects don't live long they'll be cleaned up in a minor collection. Having said that, profile, profile some more, and only do things that you have evidence for, not just because somebody said so.
Gareth Davis
Outsmarting the VM means trying not trying to improve performance, by, for example, avoiding object creation. Leave these kinds of optimisations to the compiler and VM. Afterwards, perhaps connect a profiler such as VisualVM/JProfiler to see which parts of your code are most affecting your program's performance, then concentrate on improving them.
Rich
That first sentence should read "Outsmarting the VM means trying not to improve performance, by, for example, avoiding object creation".
Rich
A profiler will tell you if object creation is a bottleneck. Integer.valueOf(int) is worth using instead of new Integer(int), for example, but this is a rare case. Inappropriate caching of objects, leading to them surviving long enough to be promoted out of Eden space, will result in a performance hit.
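A tiny sketch of the Integer.valueOf point (BoxingDemo is a made-up name). It relies only on the documented guarantee that values from -128 to 127 are always cached:

```java
class BoxingDemo {
    // Integer.valueOf may hand back a cached instance; values in the
    // range -128..127 are always cached, so nothing new is allocated.
    static boolean sameCachedInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // reference comparison, on purpose
    }
}
```

Outside that range the result depends on the JVM, so never use == to compare Integer values in real code.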
Bill Michell
@Brian: In Java 1.4 memory allocation for an object took less than 10 processor instructions. The magic with movable generational GC is that free memory is always contiguous, allocating 10 bytes is just returning the current free pointer and incrementing it by 10. Conversely, reusing old objects holding references to possibly newer objects will impact performance (possibly forcing memory moves). That is 'trying to outsmart the VM'. Depending on your object definitions and the code you use, you can improve or worsen performance.
David Rodríguez - dribeas
The first point should read: "don't try to avoid object creation for the sake of performance unless profiling tells you that you need to".
quant_dev
@quant_dev: this is what the last point (last resort) points toward
Andreas Petersson
@dribes That always seems a rather dubious boast, since some processor instructions (such as lock cmpxchg) take tens of cycles, so 10 instructions could be 10 cycles or 400. Secondly, that is only memory allocation: the JVM also initialises all values in objects and arrays, and for data-intensive applications initialising an array to 0 and then again to useful values doubles the run time compared to C.
Pete Kirkham
A: 

Methodically, you have to profile the application and get an idea of which components of your program are time- and memory-intensive; then take a closer look at those components in order to improve their performance (see Amdahl's law).
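
To show what Amdahl's law means in practice, here is a small sketch (the class and method names are mine). It computes the overall speedup when a part taking fraction p of the run time is made s times faster:

```java
class Amdahl {
    // p: fraction of total run time spent in the part being optimized (0..1)
    // s: speedup factor achieved for that part (> 0)
    static double overallSpeedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }
}
```

For example, making a component that takes 10% of the time ten times faster gives 1 / (0.9 + 0.01), roughly a 1.1x overall speedup, which is why you need to know where the time actually goes first.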

From a purely technological POV, you can use a Java-to-native-code compiler such as Excelsior JET, but I have to note that recent JVMs are really fast, so the VM should not have a significant impact.

akappa
Okay, but what tools would I actually use to this end?
Alex Reynolds
Pick a profiler and use it. I've used jprofiler: it is good, but it costs money.
akappa
I've also used Eclipse's TPTP, but its capabilities are poor compared to what JProfiler has to offer.
akappa
TPTP is very hard to set up correctly. I've tried more than once to use it with Eclipse 3.4.x and failed. JProfiler "just works".
quant_dev
+1  A: 

If your algorithm is CPU-heavy, you may want to consider taking advantage of parallelisation. You may be able to sort in multiple threads and merge the results back later.

This is however not a decision to be taken lightly, as writing concurrent code is hard.
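
A minimal sketch of the "sort in multiple threads and merge" idea, assuming plain int keys and just two worker threads. TwoWayParallelSort is a made-up name, and this is not a tuned implementation:

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TwoWayParallelSort {
    // Sorts each half of the input in its own thread, then merges the halves.
    static int[] sort(int[] data) {
        int mid = data.length / 2;
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<int[]> left = pool.submit(sortedRange(data, 0, mid));
            Future<int[]> right = pool.submit(sortedRange(data, mid, data.length));
            return merge(left.get(), right.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Task that copies data[from, to) and sorts the copy.
    private static Callable<int[]> sortedRange(final int[] data, final int from, final int to) {
        return new Callable<int[]>() {
            public int[] call() {
                int[] part = Arrays.copyOfRange(data, from, to);
                Arrays.sort(part);
                return part;
            }
        };
    }

    // Standard merge of two sorted arrays into a new sorted array.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) {
            out[k++] = a[i++];
        }
        while (j < b.length) {
            out[k++] = b[j++];
        }
        return out;
    }
}
```

For small inputs the thread start-up and copying will likely cost more than they save, so measure before keeping something like this.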

Simon Nickerson
+1  A: 

Can't you use the sort functions that are included in the Java library?

You could at least look at the speed difference between the two sorting functions.

Peter Stuifzand
The comparator I am using is customized to deal with ordering a genomic and positional data structure.
Alex Reynolds
@Alex If you make your data objects implement [Comparable](http://java.sun.com/javase/6/docs/api/java/lang/Comparable.html) you can still use the library sort.
Hank Gay
@Hank: why can't @Alex use the overloaded sort method that takes a Comparator?
Hemal Pandya
@Hemal The code is cleaner when you use the natural sort: no need to create a comparator and fewer arguments to pass. Of course, if Alex's sort criteria don't make sense as the natural order, the Comparator version is the way to go.
Hank Gay
+2  A: 

jvisualvm ships with JDK 6 now - that's the reason the link cited above doesn't work. Just type "jvisualvm <pid>", where <pid> is the ID of the process you want to track. You'll get to see how the heap is being used, but you won't see what's filling it up.

If it's a long-running process, you can turn on the -server option when you run. There are a lot of tuning options available to you; that's just one.

duffymo
+3  A: 

Also try tweaking the runtime arguments of the VM - the latest release of the VM for example includes the following flag which can improve performance in certain scenarios.

-XX:+DoEscapeAnalysis
Rich
A: 

Is your sorting code executing only once, e.g. in a commandline utility that just sorts, or multiple times, e.g. a webapp that sorts in response to some user input?

Chances are that performance would increase significantly after the code has been executed a few times because the HotSpot VM may optimize aggressively if it decides your code is a hotspot.

This is a big advantage compared to C/C++.

The VM, at runtime, optimizes code that is used often, and it does that quite well. Performance can actually rise beyond that of C/C++ because of this. Really. ;)
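
You can watch the warm-up effect with a crude timing loop like the sketch below. WarmupTiming is a made-up name, and it times the library sort on an int[] only as a stand-in for your own sort:

```java
import java.util.Arrays;

class WarmupTiming {
    // Sorts a fresh copy of the same data `rounds` times and returns
    // the elapsed nanoseconds of each round.
    static long[] timeRounds(int[] data, int rounds) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            int[] copy = data.clone(); // same unsorted input every round
            long start = System.nanoTime();
            Arrays.sort(copy);
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }
}
```

Call timeRounds with a large array and, say, 10 rounds, and print the results: if the later rounds are clearly faster than the first, a run-once command-line tool is mostly paying the un-optimized price.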

Your custom Comparator could be a place for optimization, though.

Try to check inexpensive stuff first (e.g. int comparison) before more expensive stuff (e.g. String comparison). I'm not sure if those tips apply because I don't know your Comparator.

Use either Collections.sort(list, comparator) or Arrays.sort(array, comparator). The array variant will be a bit faster, see the respective documentation.
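
Since I don't know your Comparator, here is only a guess at its shape: a sketch with a hypothetical Feature class where the int fields are compared first and the String comparison runs only to break ties.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical data class; the field names are assumptions.
final class Feature {
    final int chromosomeIndex;
    final int start;
    final String name;

    Feature(int chromosomeIndex, int start, String name) {
        this.chromosomeIndex = chromosomeIndex;
        this.start = start;
        this.name = name;
    }
}

class FeatureOrder implements Comparator<Feature> {
    public int compare(Feature a, Feature b) {
        // Cheap int comparisons first ...
        if (a.chromosomeIndex != b.chromosomeIndex) {
            return a.chromosomeIndex < b.chromosomeIndex ? -1 : 1;
        }
        if (a.start != b.start) {
            return a.start < b.start ? -1 : 1;
        }
        // ... the String comparison only runs when both ints are equal.
        return a.name.compareTo(b.name);
    }

    // Sorts the array in place using the array variant of the library sort.
    static void sortInPlace(Feature[] features) {
        Arrays.sort(features, new FeatureOrder());
    }
}
```

This only works if that key order is the order you actually want; don't reorder the keys just because one of them is cheaper to compare.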

As Andreas said before: don't try to outsmart the VM.

Huxi
+2  A: 

Don't optimize prematurely.

Measure performance, then optimize.

Use final variables whenever possible. It will not only allow the JVM to optimize more, but also make your code easier to read and maintain.

If you make your objects immutable, you don't have to clone them.

Optimize by changing the algorithm first, then by changing the implementation.

Sometimes you need to resort to old-style techniques, like loop unrolling or caching precalculated values. Keep them in mind: even if they don't look nice, they can be useful.

quant_dev
A: 

Perhaps there are routes to performance enhancement other than micro-optimization of code. How about a different algorithm to achieve what you want your program to do? Maybe a different data structure?

Or trade some disk/ram space for speed, or if you can give up some time upfront during the loading of your program, you can precompute lookup tables instead of doing calculations - that way, the processing is fast. I.e., make some trade-offs of other resources available.
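
As an illustration of the lookup-table idea, here is a sketch that maps nucleotide characters to small integer codes through a table built once at class-load time. The encoding and the BaseCodes name are my own assumptions, not something from the question:

```java
import java.util.Arrays;

class BaseCodes {
    // Table indexed by ASCII code; -1 marks anything that is not A, C, G or T.
    private static final byte[] CODE = new byte[128];

    static {
        Arrays.fill(CODE, (byte) -1);
        CODE['A'] = 0; CODE['a'] = 0;
        CODE['C'] = 1; CODE['c'] = 1;
        CODE['G'] = 2; CODE['g'] = 2;
        CODE['T'] = 3; CODE['t'] = 3;
    }

    // One array read per character instead of a chain of comparisons.
    static int encode(char base) {
        return base < 128 ? CODE[base] : -1;
    }
}
```

The table costs 128 bytes and a little start-up time, which is the kind of space-for-speed trade-off meant here.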

Chii
A: 

Here's what I would do, in any language. If samples show that your sort-comparison routine is active a large percentage of the time, you might find a way to simplify it. But maybe the time is going elsewhere. Diagnose first, to see what's broken, before you fix anything. Chances are, if you fix the biggest thing, then something else will be the biggest thing, and so on, until you've really gotten a pretty good speedup.

Mike Dunlavey