views:

197

answers:

4

I have an interpreter written in Java. I am trying to test the performance results of various optimisations in the interpreter. To do this I parse the code and then repeatedly run the interpreter over the code, this continues until I get 5 runs which differ by a very small margin (0.1s in the times below), the mean is taken and printed. No I/O or randomness happens in the interpreter. If I run the interpreter again I am getting different run times:

91.8s
95.7s
93.8s
97.6s
94.6s
94.6s
107.4s

I have tried to no avail the server and client VM, the serial and parallel gc, large tables and windows and linux. These are on 1.6.0_14 JVM. The computer has no processes running in the background. So I asking what may be causing these large variations or how can I find out what is?


The actualy issue was caused because the program had to iterate to a fixed point solution and the values were stored in a hashset. The hashed values differed between runs, resulting in a different ordering which in turn led to a change in the amount of iterations needed to reach the solution.

+16  A: 

"Wall clock time" is rarely a good measurement for benchmarking. A modern OS is extremely unlikely to "[have] no processes running in the background" -- for all you know, it could be writing dirty block buffers to disk, because it's decided that there's no other contention.

Instead, I recommend using ThreadMXBean to track actual CPU consumption.

kdgregory
+1 JMX Is the way to go.
Tom
I may have been slightly exaggerating with "no processes". However, any process that is running is unlikely to interfere and yet allow 5 runs in a row that only differ by 0.1s. The machine does have two cores.
Molehill
In terms of abstraction, I'd say Java is way too far from the hardware to expect consistent results... you just don't have enough control.
Dolph
+1  A: 

The garbage collection may be responsible. Even though your logic is the same, it may be that the GC logic is being scheduled on external clock/events.

But I don't know that much about JVMs GC implementation.

Aiden Bell
add -verbose:gc to the command line args if you want to see the output
akf
+4  A: 

Your variations don't look that large. It's simply the nature of the beast that there are other things running outside of your direct control, both in the OS and the JVM, and you're not likely to get exact results.

Things that could affect runtime:

  • if your test runs are creating objects (may be invisible to you, within library calls, etc) then your repeats may trigger a GC

  • Different GC algorithms, specifications will react differently, different thresholds for incremental gc. You could try to run a System.gc() before every run, although the JVM is not guaranteed to GC when you call that (although it always has when I've played with it).T Depending on the size of your test, and how many iterations you're running, this may be an unpleasantly (and nearly uselessly) slow thing to wait for.

  • Are you doing any sort of randomization within your tests? e.g. if you're testing integers, values < |128| may be handled slightly differently in memory.

Ultimately I don't think it's possible to get an exact figure, probably the best you can do is an average figure around the cluster of results.

Steve B.
Before each run I execute "System.runFinalization(); System.gc();" and continue until free memory stays static and there are no pending objects.
Molehill
+1  A: 

This seems like a significant variation to me, I would try running with -verbosegc.

You should be able to get the variation to much less than a second if your process has no IO, output or network of any significance.

I suggest profiling your application, there is highly likely to be significant saving if you haven't done this already.

Peter Lawrey