views: 180
answers: 9
I'm working on a Sudoku solver at school and we're having a little performance contest. Right now, my algorithm is pretty fast on the first run (about 2.5 ms), but even faster when I solve the same puzzle 10,000 times (about 0.5 ms per run). Those timings, of course, depend on the puzzle being solved. I know the JVM does some optimization when a method is called multiple times, and I suspect that's what is happening here.

I don't think I can further optimize the algorithm itself (though I'll keep looking), so I was wondering if I could replicate some of the optimizations done by the JVM.

Note: compiling to native code is not an option.

Thanks!

Edit: All those VM options are nice, but they're not really "legal" in an algorithm contest, since everyone could use them and get the same performance boost. I'm looking for code optimizations.

+1  A: 

You might try reducing the available heap memory so the garbage collector warms up faster. It takes a while before memory management is running under "normal" conditions, and reducing the heap size can speed that up.
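For example, a fixed small heap can be requested on the command line. The flags below are standard HotSpot options, but the sizes are purely illustrative and the class name is a placeholder; tune by measurement:

```shell
# Pin the heap to a small fixed size so the GC settles into its
# steady state quickly. -Xms sets the initial heap, -Xmx the maximum;
# com.example.SudokuSolver is a placeholder class name.
java -Xms16m -Xmx16m com.example.SudokuSolver
```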

Peter Tillemans
I've tried setting the initial heap size using -Xms. The lowest I can go is 5 MB, and according to the Runtime class, my solver uses about 500 KB. It did not change the amount of time the first run takes.
Subb
It is especially -Xmx that matters. And indeed it will probably not affect the first run, but hopefully reaching terminal velocity no longer requires 10,000 runs.
Peter Tillemans
A: 

You can force garbage collection before "heavy" actions, like the main loop, by calling `System.gc();`
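A minimal sketch of that idea, with a dummy workload standing in for the solver's hot loop (the real solver is not shown here):

```java
public class GcBeforeBenchmark {
    // Stand-in for the solver's hot loop.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        // Hint to the VM that now is a good moment to collect garbage,
        // so a GC pause is less likely to land inside the timed section.
        // Note: the VM is free to ignore this hint entirely.
        System.gc();

        long start = System.nanoTime();
        long result = work();
        long elapsedNs = System.nanoTime() - start;
        System.out.println("result=" + result + " elapsedNs=" + elapsedNs);
    }
}
```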

Little Bobby Tables
Note that this is only a hint, which the GC may freely disregard.
Péter Török
Calling System.gc() may also totally wreck your performance. At least in the current Sun JVM it triggers a full GC, and done often enough per second it will spend more time in the GC than anywhere else (and since the OP said a run takes about 2.5 ms, this would be such a case).
Durandal
+2  A: 

If you can have a "warmup" game to get the JIT to compile the game classes, you can add the options

-XX:CompileThreshold=20

to be sure that most hot spots are compiled on your first run. Subsequent runs will then need little or no additional compilation.
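A warmup can be as simple as running the hot path in a throwaway loop before the timed run. The solve() below is only a placeholder for the real solver:

```java
public class Warmup {
    // Placeholder for the contest's real solve() method.
    static int solve(int seed) {
        int x = seed;
        for (int i = 0; i < 1000; i++) x += i;
        return x;
    }

    public static void main(String[] args) {
        // Call the hot path enough times to cross the compile threshold
        // (e.g. with -XX:CompileThreshold=20), so the timed run below
        // executes already-compiled code.
        for (int i = 0; i < 100; i++) solve(i);

        long start = System.nanoTime();
        int result = solve(42);
        long elapsedNs = System.nanoTime() - start;
        System.out.println("warm run: " + elapsedNs + " ns, result=" + result);
    }
}
```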

See

mdma
Interesting link. Thanks!
Subb
Interesting behavior too. This actually increased the timing when running the solver once, since it takes time to compile those methods. I found a sweet spot around 500, which decreases the time by about 0.5 ms, but this is probably specific to this exact puzzle.
Subb
A: 

The JVM works with a JIT (Just-In-Time) compiler: it compiles Java bytecode to native machine code on the fly. It does a lot of sophisticated analysis to determine when and what to compile to native code. One of the things it looks at is code that runs more than once: if you run some piece of code 10,000 times, as you're doing, the JVM most likely decides that it's advantageous to compile it to native code, and then runs that native code every time the code is executed again.

As far as I know, there is no way you can control when the JVM does this. (edit: might not be true, as mdma points out in his answer - there are some advanced -XX options).

Sun has two versions of the JVM: the client JVM, which is tuned to start up fast but do less aggressive optimizations, and the server JVM, which has a slower startup time but which does more aggressive optimizations.

You could try running your program with the server JVM:

java -server com.mypackage.MyProgram

Note that, as far as I know, only the 64-bit version of Sun Java for Windows includes the server JVM.

Jesper
+3  A: 

There are a bunch of micro-optimizations that you can apply to your code. This JavaWorld article gives an overview. More ideas can be found on this page.

I wouldn't expect any large gains from them, though. If you can't optimize your algorithms any more, try to optimize your data structures for easy access and to exploit cache locality.

Durandal
Most of these optimization links are very out of date. The JavaWorld article is from 1997 - it's almost irrelevant given the present state of java static and dynamic compilation.
mdma
Yes, some are out of date. However, general optimizations like loop-invariant code motion will never be outdated, especially since Java relies on the JIT, which is more constrained in how much code analysis is practical compared to static compilers.
Durandal
A: 

You should inspect the bytecode generated by the compiler and check whether there are areas where it isn't able to optimize as well as it should. Otherwise it will be a hit-and-miss operation.

I do recommend double-checking your algorithm before searching too intensively for other optimizations ;)

ponzao
A: 

I know it's not what you're looking for, but I would dispute the methodology of the contest (assuming that's what it is). All major benchmarks (SPEC, etc..) clearly factor in a warmup period (often several minutes) to ensure that the workload itself is being measured, rather than temporary effects like JIT compilation, etc...

Also, note that something that runs in 2.5 ms is meaningless in a benchmark - many JVMs on OSes like Windows don't measure time that well using things like System.currentTimeMillis(). You're going to need to repeat it many times to be meaningful... I would suggest a total run in the 30 s to 60 s range to become reasonably statistically relevant.

As to the question as asked - you could go down that path, but it's simply not worth it. FYI your biggest wins will be inlining, devirtualization, and escape analysis. All remove significant path length and potentially memory bandwidth challenges.
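A common pattern is to time a large batch and report the average, rather than trusting a single 2.5 ms measurement. The workload below is only a stand-in for the solver:

```java
public class BatchTimer {
    // Stand-in workload; substitute the real solve() call here.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        final int runs = 10_000;
        long checksum = 0;

        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            checksum += work(); // keep the result live so the JIT can't drop the loop
        }
        long totalNs = System.nanoTime() - start;

        // Report the average per run in microseconds.
        System.out.printf("avg per run: %.3f us (checksum=%d)%n",
                totalNs / (double) runs / 1_000.0, checksum);
    }
}
```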

Trent Gray-Donald
I'm using System.nanoTime(). Is it better?
Subb
Yes, nanoTime() is certainly better, but the point remains: 2.5 ms is too short. Imagine for a moment you get an ill-timed context switch on the box that preempts the worker thread... your numbers will be way off and not statistically stable. Or on a JVM that does async JIT compilation (e.g. the IBM JVM), you may end up with a race where the JIT sometimes does and sometimes does not deliver the perf boost in time...
Trent Gray-Donald
A: 

Well, some tricks you could try are:

1) Strength reduction, e.g. instead of dividing by 2, you can use a right shift

int i = 10 / 2;  // division by 2
i     = 10 >> 1; // same result (for non-negative values; the two differ for negative odd numbers)

2) Use primitive types instead of reference types, e.g. int over Integer

3) Prefer arrays to Collections

4) Use compound assignment operators, e.g. i += 2 instead of i = i + 2

5) Count loops down instead of up, e.g.

for (int i = 10; i >= 0; i--) {
    System.out.println(i);
}

(only useful if you can compare against zero)

6) Use the final keyword on instance variables if possible
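Tips 2, 3, 4 and 5 combined in one small sketch (purely illustrative; measure before assuming any of these help on a modern JIT):

```java
public class MicroOpts {
    static int sumWithMicroOpts() {
        int[] values = new int[10];     // tip 3: int[] instead of a List<Integer>

        // Tip 5: count down so the loop test compares against zero.
        for (int i = values.length - 1; i >= 0; i--) {
            values[i] = i;
        }

        int sum = 0;                    // tip 2: primitive int, not boxed Integer
        for (int v : values) {
            sum += v;                   // tip 4: compound assignment
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithMicroOpts()); // prints 45 (0 + 1 + ... + 9)
    }
}
```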

Helper Method
+1  A: 

tl;dr: no, most optimizations done by the JVM can't be expressed in Java byte code.

By its very nature, Java byte code defines your code at a very high level.

That is by design and is one of the reasons why the JVM can do all those optimizations: byte code describes the operations at a relatively high level and leaves the actual execution details to the JVM.

Due to that fact most optimizations can't be expressed in byte code itself.

For example, the JVM spec specifies that every array access that exceeds the bounds of the array must throw an ArrayIndexOutOfBoundsException (see VM Spec 2.5.14).

However, a smart JVM can optimize away many of those checks if it knows that they already happened (for example, when iterating over an array, the JVM can execute the bounds check once before the loop and skip it during each iteration).

This optimization simply can't be expressed with byte code, because the bounds-check is implicit and can't be avoided in the byte-codes representing array access.
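To illustrate: the loop below compiles to byte code carrying an implicit bounds check on every `a[i]` (in the `iaload` instruction); whether the per-iteration check is eliminated is entirely up to the JIT, and nothing in the source or byte code lets you drop it yourself. The mandated exception, by contrast, is always observable:

```java
public class BoundsCheck {
    static int sum(int[] a) {
        int sum = 0;
        // Each a[i] access carries an implicit bounds check in byte code.
        // A JIT that proves 0 <= i < a.length may hoist or drop it.
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(sum(data)); // prints 10

        // The spec-mandated check in action: out-of-range access always throws.
        try {
            System.out.println(data[4]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: index 4 is out of bounds");
        }
    }
}
```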

Joachim Sauer
Thanks. That actually answers the original question :)
Subb