While looking at a micro-optimization question that I asked yesterday (here), I found something strange: an or statement in Java is running slightly faster than looking up a boolean value in an array of booleans.

In my tests, running the algorithms below on long values from 0 to 1 billion, alg1 is about 2% faster. (I have altered the order in which the algorithms are tested, and I get the same results.) My question is: why is alg1 faster? I would have expected alg2 to be slightly faster since it uses a lookup table, whereas alg1 has to execute 4 comparisons and 3 or operations for 75% of inputs.

private final static boolean alg1(long n)
{
  int h = (int)(n & 0xF);
  if(h == 0 || h == 1 || h == 4 || h == 9)
  {
    long tst = (long)Math.sqrt(n);
    return tst*tst == n;
  }
  return false;
}

private final static boolean[] lookup = new boolean[16];
static
{
  lookup[0] = lookup[1] = lookup[4] = lookup[9] = true;
}
private final static boolean alg2(long n)
{
  if(lookup[(int)(n & 0xF)])
  {
    long tst = (long)Math.sqrt(n);
    return tst*tst == n;
  }
  else
    return false;
}

If you're curious, this code is testing if a number is a perfect square, and utilizes the fact that perfect squares must end in 0, 1, 4, or 9 in hex.
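The claim about the last hex digit can be checked directly: since (k + 16)^2 leaves the same remainder mod 16 as k^2, it is enough to square k = 0..15. A minimal sketch (the class and method names here are illustrative, not from the original post):

```java
import java.util.TreeSet;

class SquareResidues {
    // Collects (k * k) mod 16 for k = 0..15. Because (k + 16)^2 = k^2 + 32k + 256,
    // these 16 cases cover the low hex digit of every perfect square.
    static TreeSet<Integer> residues() {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int k = 0; k < 16; k++) {
            seen.add((k * k) & 0xF);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(residues()); // prints [0, 1, 4, 9]
    }
}
```

Only 4 of the 16 possible low digits survive, which is why both algorithms can return false immediately for 75% of inputs.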

+1  A: 

According to this article, accessing array elements is "2 or 3 times as expensive as accessing non-array elements". Your test shows that the difference may be even bigger.

asalamon74
+3  A: 

I would guess that the issue is range checking for the array, and possibly the array lookup being implemented as a method call. Either would certainly overshadow 4 straight int compares. Have you looked at the byte code?

plinth
+4  A: 

Loading some random piece of data is generally slower than a little non-branching code.

It all depends upon processor architecture, of course. Your first if statement could be implemented as four instructions. The second may potentially need null pointer checking and bounds checking as well as the load and compare. Also, more code means more compile time, and more chance for the optimisation to be impeded in some manner.

Tom Hawtin - tackline
A simple lookup at the processor level is generally quite a bit faster than even a small string of calculation. I'm thinking it's the bounds checking/other overhead that Java adds to manage the process.
Brian Knoblauch
bounds checking FTL!
John Gardner
A: 

It's an interesting piece of code, but 2% is a really small difference. I don't think you can conclude very much from that.

Mike Dunlavey
yeah, it's not significant enough that i'm going to change the way i write code or anything... i was just curious, on an intellectual level, why this might be the case.
Kip
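One way to judge whether a 2% gap is more than noise is to repeat the timing over several rounds, so that JIT warm-up and run-to-run variation become visible. The following is only a rough sketch: the class name, the helper methods, the round count, and the reduced limit of 10 million are all assumptions made for illustration, and a dedicated harness such as JMH would give more trustworthy numbers.

```java
class TimingSketch {
    private static final boolean[] lookup = new boolean[16];
    static {
        lookup[0] = lookup[1] = lookup[4] = lookup[9] = true;
    }

    // Same logic as alg1 in the question: chained comparisons.
    static boolean alg1(long n) {
        int h = (int) (n & 0xF);
        if (h == 0 || h == 1 || h == 4 || h == 9) {
            long tst = (long) Math.sqrt(n);
            return tst * tst == n;
        }
        return false;
    }

    // Same logic as alg2 in the question: boolean lookup table.
    static boolean alg2(long n) {
        if (lookup[(int) (n & 0xF)]) {
            long tst = (long) Math.sqrt(n);
            return tst * tst == n;
        }
        return false;
    }

    // Hypothetical helpers: count perfect squares in [0, limit).
    // Returning the count keeps the JIT from discarding the loop as dead code.
    static long countWithAlg1(long limit) {
        long count = 0;
        for (long n = 0; n < limit; n++) {
            if (alg1(n)) count++;
        }
        return count;
    }

    static long countWithAlg2(long limit) {
        long count = 0;
        for (long n = 0; n < limit; n++) {
            if (alg2(n)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long limit = 10_000_000L; // assumption: far smaller than the 1 billion used in the question
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long c1 = countWithAlg1(limit);
            long t1 = System.nanoTime();
            long c2 = countWithAlg2(limit);
            long t2 = System.nanoTime();
            System.out.printf("round %d: alg1 %d ms, alg2 %d ms (squares found: %d / %d)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, c1, c2);
        }
    }
}
```

If the per-round numbers vary by more than the gap between the two algorithms, the 2% figure probably cannot be distinguished from noise.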
A: 

In the current example, I agree that bounds checking is probably what's getting you. Why the JVM doesn't optimize this out is beyond me: the index is masked with 0xF and the array has 16 elements, so the sample code can deterministically be shown never to go out of bounds.

Another possibility (especially with bigger lookup tables) is cache latency. It depends on the size of the processor's registers and how the JVM chooses to use them, but if the lookup array isn't kept entirely on the processor, then you'll see a performance hit compared to a simple OR, as the array is pulled onto the CPU for each check.

Kevin Day
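If the array access (with its possible null and bounds checks and memory load) is indeed the cost, one variant that keeps the table idea but avoids the array is to pack the 16 flags into an int constant and test a bit. This is a sketch of a hypothetical third variant, not something proposed in the thread; the names alg3 and SQUARE_RESIDUE_MASK are made up here, and whether it is actually faster than alg1 or alg2 would have to be measured.

```java
class BitmaskLookup {
    // Bits 0, 1, 4 and 9 set: 1 + 2 + 16 + 512 = 0x213.
    private static final int SQUARE_RESIDUE_MASK = 0x213;

    // Hypothetical variant: the "lookup table" lives in a constant,
    // so there is no array reference, bounds check, or memory load.
    static boolean alg3(long n) {
        int h = (int) (n & 0xF);
        if (((SQUARE_RESIDUE_MASK >>> h) & 1) != 0) {
            long tst = (long) Math.sqrt(n);
            return tst * tst == n;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(alg3(144)); // prints true  (12 * 12)
        System.out.println(alg3(145)); // prints false (low digit 1 passes the filter, but 145 is not a square)
        System.out.println(alg3(146)); // prints false (low digit 2 is rejected by the mask)
    }
}
```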