ansaurus

Question

First time a Java loop is run SLOW, why? [Sun HotSpot 1.5, sparc]

Answer 1

+1 A:

That's an interesting question. I'd suspect the JIT compiler, but these are my numbers:

First                | 1  | 2399.233 ms 
Second               | 1  | 2322.359 ms 
Third                | 1  | 2408.342 ms

Possibly Solaris is doing something funny with threads; have you tried with nThreads = 10 or so?

Michael Myers 2009-05-13 20:34:22

This is a stripped down version of the code, though it exhibits the same behavior. I just wanted to post the simplest possible case here. In other words, I'm not trying to make this specific benchmark as fast as possible, but rather understand what went wrong here so it won't impact my real benchmarking.

Adam Morrison 2009-05-14 09:51:49

Answer 2

A:

It's the hotspot compiler at work. AFAIK, the first time it runs the function, runs "interpreted" and the execution path is analyzed, then the JIT compiler can optimize the subsequent function calls.

GClaramunt 2009-05-13 20:34:59

It should replace the code on the stack midway through (that was introduced in 2000). Perhaps there is something in this example that is preventing it.

Tom Hawtin - tackline 2009-05-13 20:50:13

Answer 3

+2 A:

Add class loading in as a suspect. Classes are loaded lazily on first reference. So the first time the code runs, you're probably referencing some classes for the first time.

John M 2009-05-13 20:35:55

Answer 4

+1 A:

I suggest you make nThread = Runtime.getRuntime().availableProcessors() This will give you the optimal number of threads to use all the cores in your system.

You can try turning off the JIT to see what difference it makes.

Peter Lawrey 2009-05-13 20:43:31

Thanks, I'm not trying to make this specific code fast, but rather trying to understand what is going on behind this scene to make it exhibit this behavior.

Adam Morrison 2009-05-14 09:52:31

When Java first runs code it does so in an interperated mode. It does this as most code is only executed once and its not worth coding code in this situations. When code is used repeatedly it is optimised and can get further optimised based on changes in how the code is used. By default, full optimisation waits until something is called 10,000 times. It does this to collect statistics on how the code is used before optimises the code. e.g. Reducing this number can lead to poorer performance in the long run.

Peter Lawrey 2009-05-14 20:34:07

Answer 5

A:

It's most certainly the hotspot compiler. If you're running on 64 bit solaris it defaults to the server VM and hotspot just start optimizing on first execution. On the client VM the code may need to run a few times before hotspot kicks in. (i believe solaris only has the server vm but I may be wrong)

krosenvold 2009-05-13 20:47:03

Answer 6

+1 A:

You can get the VM to log information about classloading and compilation, try the following VM args: -XX:+PrintCompilation -XX:+TraceClassLoading This might give some further clues as to what's happening under the hood.

EDIT: I'm not sure those options work in java 1.5 (I've used them in 1.6). I'll try to check... EDIT again: It works in java 1.5 (note you need +, not -, or you turn the option off...)

bm212 2009-05-13 20:54:12

Answer 7

A:

See http://java.sun.com/javase/6/docs/technotes/guides/vm/server-class.html for how the launcher selects between client and server VM, and what is supported on the different processors and OSes.

stili 2009-05-13 21:01:38

Answer 8

+2 A:

The best way to verify if the JIT compiler is the reason for the speedup in later iterations is to run the benchmark with the JIT compiler turned off. To do this, specify the system property java.compiler=NONE (the word "none" must be in upper case).

Time spent doing class loading can also cause the benchmarked code to run slower the first time. Finally, there is a nondeterministic delay between calling Thread.start() and the Thread's run() method being called.

You might want to consider finding a benchmark framework. A good framework will "warm up" the code by running several iterations, then do multiple timings with a different number of iterations. See Java theory and practice: Anatomy of a flawed microbenchmark.

NamshubWriter 2009-05-13 21:04:27

Answer 9

+1 A:

I believe you can also use the non-standard option for the java command of -Xint to disable HotSpot and have your code interpreted only. This could at least take HotSpot out of the equation for interpreting your timing.

monceaux 2009-05-13 21:11:31

I tried that, but the code ran for so long I gave up.

Adam Morrison 2009-05-14 09:54:02

Answer 10

+3 A:

Some ugly, unrealistic code (the stuff of microbenchmarks):

                while (counter < 10000000) {
                        // work
                        for (int j = 0; j < 100; j++)
                                counter++;
                        counter -= 99;
                }

So what is this doing and how fast should it run.

The inner loop increments counter 100 times, then the counter is decremented by 99. So an increment of 1. Note counter is a member variable of an outer class, so some overhead there. This is then run 10,000,000 times. So the inner loop is run 1,000,000,000 times.

A loop using to accessor methods, call it 25 cycles. 1,000,000,000 times at 1 GHz, gives 25s.

Hey, we predicted the SLOW time. The slow time is fast. The fast times are after the benchmark has been broken in some way - 2.5 cycles an iteration? Use -server and you might find it gets even more silly.

Tom Hawtin - tackline 2009-05-13 21:17:19

accesor methods can be inlined, eliminating the overhead. loops can be unrooled, reducing the overhead and modern processors have branch prediction and may be able to do several operations at once, so an average of 2,5 cycles per iteration is normal.

ggf31416 2009-05-13 22:13:59

and loops can be removed - easier than inlining virtual methods. so an average of 0.0 cycles per iteration is normal.

Tom Hawtin - tackline 2009-05-13 23:17:06

Tom, thanks. Following your hint I also benchmarked this code in C (verifying the assembly code) and it took on the order of 20 seconds as well. So I attached a debugger to the Java process. Turns out that after the first measure() call, the JIT compiler optimizes out the memory accesses in the inner loop and makes the loop work on registers.

Adam Morrison 2009-05-14 11:31:11

I wouldn't be surprised even if JIT would eventually optimize that code into "count = 10000000;"

Esko Luontola 2009-05-14 12:34:29

Yeah, it's the sort of thing server HotSpot might do. Might well do it if you assign counter to zero. Currently it would have to do something like if (this$0.count < 10000000) this$0.count = 10000000;. Not the best benchmark.

Tom Hawtin - tackline 2009-05-14 13:46:09

Answer 11

+4 A:

Esko Luontola 2009-05-13 21:31:26

For 20 seconds??

Tom Hawtin - tackline 2009-05-13 23:18:43

True, it would be unusual for that to take so long, unless the system was something seriously broken. Tom and Adam found out the real reason.

Esko Luontola 2009-05-14 12:38:47

ansaurus

tags:

views:

answers:

First time a Java loop is run SLOW, why? [Sun HotSpot 1.5, sparc]

related questions