ansaurus

Question

Which of these pieces of code is faster in Java?

Answer 1

+21 A:

On many compilers, the machine instructions emitted for a loop going backwards, are more efficient, because testing for zero (and therefore zero'ing a register) is faster than a load immediate of a constant value.

On the other hand, a good optimising compiler should be able to inspect the loop inner and determine that going backwards won't cause any side effects...

BTW, that is a terrible interview question in my opinion. Unless you are talking about a loop which runs 10 millions of times AND you have ascertained that the slight gain is not outweighed by many instances of recreating the forward loop value (n - i), any performance gain will be minimal.

As always, don't micro-optimise without performance benchmarking and at the expense of harder to understand code.

Mitch Wheat 2009-11-01 05:59:21

Yeah, this sort of micro-optimization might have a tiny bit of validity for C or C++, but not for Java.

Michael Myers 2009-11-01 06:05:55

While this is true, the performance gain is so marginal that it's not worth the effort. If anyone told me that I should be using a decrementing for loop due to performance gains then they are trying too hard, therefore I agree this is a terrible interview question.

Brett Ryan 2009-11-01 07:20:21

Answer 2

A:

The answer is a (as you probably found out on the website)

I think the reason is that the i > 0 condition for terminating the loop is faster to test.

pavium 2009-11-01 06:00:15

Answer 3

A:

The bottom line is that for any non-performance critical application, the difference is probably irrelevant. As others have pointed out there are times when using ++i instead of i++ could be faster, however, especially in for loops any modern compiler should optimize that distinction away.

That said, the difference probably has to do with the underlying instructions that get generated for the comparison. Testing if a value is equal to 0 is simply a ~~NAND~~ NOR gate. Whereas testing if a value is equal to an arbitrary constant requires loading that constant into a register, and then comparing the two registers. (This probably would require an extra gate delay or two.) That said, with pipelining and modern ALUs I'd be surprised if the distinction was significant to begin with.

Rob Rolnick 2009-11-01 06:08:21

"Testing if a value is equal to 0 is simply a NAND gate." - One NAND gate is certainly not enough! The fact is that test-for-zero is hardwired into most processors; on x86 any arithmetic instruction sets the zero flag if the result of the operation is zero, which means no comparison instruction is necessary.

Artelius 2009-11-01 06:29:26

Sorry, I meant NOR not NAND. (You're right.) That said, why would one NOR gate (given sufficient inputs) be insufficient? NOR returns 1 if all the inputs are 0, right?

Rob Rolnick 2009-11-01 06:42:52

I don't think 32-input NOR gates are practical. Probably some kind of chaining would be used for a hard-wired system. But then, on modern processors this would probably be done using microcode anyway...

Artelius 2009-11-01 07:22:26

I see, thanks. The courses I took in College didn't go into that kind of detail.

Rob Rolnick 2009-11-01 10:32:57

Answer 4

+17 A:

These kinds of questions are largely an irrelevant distraction that some people get obsessed with it. Call it the Cult of Micro-optimization or whatever you like but is it faster to loop up or down? Seriously? You use whichever is appropriate for what you're doing. You don't write your code around saving two clock cycles or whatever it is.

Let the compiler do what it's for and make you intent clear (both to the compiler and the reader). Another common Java pessimization is:

public final static String BLAH = new StringBuilder().append("This is ").append(3).append(' text").toString();

because excessive concatenation does result in memory fragmentation but for a constant the compiler can (and will) optimize this:

public final static String BLAH = "This is a " + 3 + " test";

where it won't optimize the first and the second is easier to read.

And how about (a>b)?a:b vs Math.max(a,b)? I know I'd rather read the second so I don't really care that the first doesn't incur a function call overhead.

There are a couple of useful things in this list like knowing that a finally block isn't called on System.exit() is potentially useful. Knowing that dividing a float by 0.0 doesn't throw an exception is useful.

But don't bother second-guessing the compiler unless it really matters (and I bet you that 99.99% of the time it doesn't).

cletus 2009-11-01 06:11:03

...but on Gentoo, I have a USE flag to magically reverse all the app's `for` loops and it gains me 218 ips per GHz, baby

Jed Smith 2009-11-01 07:31:05

Are you sure about the Math.max(..) thing? IIRC, JVM's usually optimize away a lot of the Math* - turn things into direct code, instead of method calls, etc - since its non-user-changeable...i.e. Math.max() is - IIRC - actually implemented identically, in any decent JVM/javac combination.

Adam 2009-11-01 14:54:53

@Adam: if you look at the linked site it claims Math.max() is slower. This would be because of function call overhead, boxing/unboxing (although there are versions of max() for the primitive types so I'm not sure if this would actually be the case) or both. Whatever the case, it's micro-optimization.

cletus 2009-11-01 14:56:19

Answer 5

+1 A:

The loops are identical, except for one critical part:

i > 0; and i < 100001;

The greater than zero check is done by checking the NZP (Commonly known as condition code or Negative Zero or Positive bit) bit of the computer.

The NZP bit is set whenever operation such as load, AND, addition ect. are performed.

The greater than check cannot directly utilize this bit (and therefore takes a bit longer...) The general solution is to make one of the values negative (by doing a bitwise NOT and then adding 1) and then adding it to the compared value. If the result is zero, then they're equal. Positive, then the second value (not the neg) is greater. Negative, then the first value (neg) is greater. This check takes a slightly longer than the direct nzp check.

I'm not 100% certain that this is the reason behind it though, but it seems like a possible reason...

Cobalt 2009-11-01 06:27:28

Answer 6

+40 A:

When you get down to the lowest level (machine code but I'll use assembly since it maps one-to-one mostly), the difference between an empty loop decrementing to 0 and one incrementing to 50 (for example) is often along the lines of:

      ld  a,50                ld  a,0
loop: dec a             loop: inc a
      jnz loop                cmp a,50
                              jnz loop

That's because the zero flag im most sane CPUs is set by the decrement instruction when you reach zero. The same can't usually be said for the increment instruction when it reaches 50 (since there's nothing special about that value, unlike zero). So you need to compare the register with 50 to set the zero flag.

However, asking which of the two loops:

for(int i = 100000; i > 0; i--) {}
for(int i = 1; i < 100001; i++) {}

is faster is useless since neither of them does anything useful. The fastest version of both those loops no loop at all. I challenge anyone to come up with a faster version than that :-)

They'll only become useful when you start doing some useful work inside the braces and, at that point, the work will dictate which order you should use.

For example if you need to count from 1 to 100,000, you should use the second loop. That's because the advantage of counting down (if any) is likely to be swamped by the fact that you have to evaluate 100000-i inside the loop every time you need to use it. In assembly terms, that would be the difference between:

     ld  b,100000             dsw a
     sub b,a
     dsw b

(dsw is, of course, the infamous do something with assembler mnemonic).

Since you'll only be taking the hit for an incrementing loop once per iteration, and you'll be taking the hit for the subtraction at least once per iteration (assuming you'll be using i, otherwise there's little need for the loop at all), you should just go with the more natural version.

If you need to count up, count up. If you need to count down, count down.

paxdiablo 2009-11-01 06:27:40

I +1'd as soon as I read `dsw`

Jed Smith 2009-11-01 07:29:28

Drew Hall 2010-01-05 11:55:19

-1 for not answering the question asked at all. The question specifically says, "in Java". What happens in machine code is irrelevant, given how many layers of VM are sitting in between.

Kevin Bourrillion 2010-01-06 18:23:48

Answer 7

+2 A:

With regards for testing for zero in the JVM: it can apparently be done with ifeq whereas testing for anything else requires if_icmpeq which also involves putting an extra value on the stack.

Testing for > 0, as in the question, might be done with ifgt, whereas testing for < 100001 would need if_icmplt.

Artelius 2009-11-01 06:46:29

This is only appropriate while the JVM is interpreting the byte code, once it is optimised to native code, it makes no difference and in the case of an empty loop might to replaces with nothing.

Peter Lawrey 2009-11-01 09:02:48

Even in native code most(?) architectures have an instruction comparing with zero and one or two other ways of comparing with everything else which is a tick or two slower. In theory it will probably be a difference even if I'd say the difference is isn't worth counting and chances are you will have to do other stupid "tricks" inside the loop to just because you're counting the wrong way. Typical micro optimization.

Fredrik 2009-11-01 09:36:33

@Fredrik: Most architectures can test for zero while performing the increment/decrement. So you don't need a compare instruction at all. x86 updates the "zero flag" (among others) as part of any arithmetic instruction, while ARM lets you specify whether you want a particular arithmetic instruction to update the flags. However, this has a much smaller effect than it used to, due to better pipelining and superscalar operation.

Artelius 2009-11-01 22:57:35

@Artelius: I know (even if I disagree it is valid for "most architectures" but I guess that depends on where you draw the line when counting). However, just testing the zero flag is almost always faster than doing that and something else. The fact that you can do both in one instruction doesn't really matter as not all instructions execute in an equal number of clock ticks. Still, it is rather irrelevant and doesn't make much difference in reality.

Fredrik 2009-11-02 12:53:57

Answer 8

+10 A:

On a modern Java implementation this is not true. Summing up the numbers up to one billion as a benchmark:

Java(TM) SE Runtime Environment 1.6.0_05-b13
Java HotSpot(TM) Server VM 10.0-b19
up 1000000000: 1817ms 1.817ns/iteration (sum 499999999500000000)
up 1000000000: 1786ms 1.786ns/iteration (sum 499999999500000000)
up 1000000000: 1778ms 1.778ns/iteration (sum 499999999500000000)
up 1000000000: 1769ms 1.769ns/iteration (sum 499999999500000000)
up 1000000000: 1769ms 1.769ns/iteration (sum 499999999500000000)
up 1000000000: 1766ms 1.766ns/iteration (sum 499999999500000000)
up 1000000000: 1776ms 1.776ns/iteration (sum 499999999500000000)
up 1000000000: 1768ms 1.768ns/iteration (sum 499999999500000000)
up 1000000000: 1771ms 1.771ns/iteration (sum 499999999500000000)
up 1000000000: 1768ms 1.768ns/iteration (sum 499999999500000000)
down 1000000000: 1847ms 1.847ns/iteration (sum 499999999500000000)
down 1000000000: 1842ms 1.842ns/iteration (sum 499999999500000000)
down 1000000000: 1838ms 1.838ns/iteration (sum 499999999500000000)
down 1000000000: 1832ms 1.832ns/iteration (sum 499999999500000000)
down 1000000000: 1842ms 1.842ns/iteration (sum 499999999500000000)
down 1000000000: 1838ms 1.838ns/iteration (sum 499999999500000000)
down 1000000000: 1838ms 1.838ns/iteration (sum 499999999500000000)
down 1000000000: 1847ms 1.847ns/iteration (sum 499999999500000000)
down 1000000000: 1839ms 1.839ns/iteration (sum 499999999500000000)
down 1000000000: 1838ms 1.838ns/iteration (sum 499999999500000000)

Note that the time differences are brittle, small changes somewhere near the loops can turn them around.

Edit: The benchmark loops are

        long sum = 0;
        for (int i = 0; i < limit; i++)
        {
            sum += i;
        }

and

        long sum = 0;
        for (int i = limit - 1; i >= 0; i--)
        {
            sum += i;
        }

Using a sum of type int is about three times faster, but then sum overflows. With BigInteger it is more than 50 times slower:

BigInteger up 1000000000: 105943ms 105.943ns/iteration (sum 499999999500000000)

starblue 2009-11-01 08:08:06

+1. for running a benchmark.

Mitch Wheat 2009-11-01 09:49:21

+1. Nice counter.

pst 2009-11-01 19:35:12

So, to calculate "sum 499999999500000000", did you use longs or BigIntegers? The latter in particular have so much overhead that it will swamp the different loops. Consider that starting at the upper end of the range makes the numbers get really big very early, and since the speed of adding BigIntegers depends on their size, this would make it a very unfair test. Note, I'm not arguing either way about the performance, I'm just saying that benchmarks aren't useful unless you detail your methods, so others can scrutinise them for bias and reproduce the results for themselves.

Artelius 2009-11-01 23:13:11

Answer 9

+2 A:

This is about the dumbest question I have ever seen. The loop body is empty. If the compiler is any good it will just emit no code at all. It doesn't do anything, can't throw an exception and doesn't modify anything outside of its scope.

Assuming your compiler isn't that smart, or that you actually didn't have an empty loop body: The "backwards loop counter" argument makes sense for some assembly languages (it may make sense to the java byte code too, I don't know it specifically). However, the compiler will very often have the ability to transform your loop to use decrementing counters. Unless you have loop body in which the value of i is explicitly used, the compiler can do this transformation. So again you often see no difference.

Paul Hsieh 2009-11-01 08:29:23

Answer 10

+11 A:

A better question is;

Which is easier to understand/work with?

This is far more important than a notional difference in performance. Personally, I would point out that performance shouldn't be the criteria for determining the difference here. If they didn't like me challenging their assumption on this, I wouldn't be unhappy about not getting the job. ;)

Peter Lawrey 2009-11-01 09:06:32

This is the best answer.

Sam152 2009-11-01 09:41:13

Answer 11

+2 A:

Are you sure that the interviewer who asks such a question expects a straight answer ("number one is faster" or "number two is faster"), or if this question is asked to provoke a discussion, as is happening in the answers people are giving here?

In general, it's impossible to say which one is faster, because it heavily depends on the Java compiler, JRE, CPU and other factors. Using one or the other in your program just because you think that one of the two is faster without understanding the details to the lowest level is superstitious programming. And even if one version is faster than the other on your particular environment, then the difference is most likely so small that it's irrelevant.

Write clear code instead of trying to be clever.

Jesper 2009-11-01 09:34:04

In the cited page, the author says that the second is faster, and doesn't provide a reason. Hence, the question.

rball 2010-05-12 23:26:42

Answer 12

+5 A:

Typically real code will run faster counting upwards. There are a few reasons for this:

Processors are optimised for reading memory forwards.
HotSpot (and presumably other bytecode->native compilers) heavily optimise forward loops, but don't bother with backward loops because they happen so infrequently.
Upwards is usually more obvious, and cleaner code is often faster.

So happily doing the right thing will usually be faster. Unnecessary micro-optimisation is evil. I haven't purposefully written backward loops since programming 6502 assembler.

Tom Hawtin - tackline 2009-11-01 12:24:57

Answer 13

+2 A:

Such questions have their base on old best-practice recommendations. It's all about comparison: comparing to 0 is known to be faster. Years ago this might have been seen as quite important. Nowadays, especially with Java, I'd rather let the compiler and the VM do their job and I'd focus on writing code that is easies to maintain and understand.

Unless there are reasons for doing it otherwise. Remember that Java apps don't always run on HotSpot and/or fast hardware.

Marius Burz 2009-11-01 14:45:30

Answer 14

+5 A:

There are really only two ways to answer this question.

To tell you that it really, really doesn't matter, and you're wasting your time even wondering.
To tell you that the only way to know is to run a trustworthy benchmark on your actual production hardware, OS and JRE installation that you care about.

So, I made you a runnable benchmark you could use to try that out here:

http://code.google.com/p/caliper/source/browse/trunk/test/examples/LoopingBackwardsBenchmark.java

This Caliper framework is not really ready for prime time yet, so it may not be totally obvious what to do with this, but if you really care enough you can figure it out. Here are the results it gave on my linux box:

     max benchmark        ns
       2  Forwards         4
       2 Backwards         3
      20  Forwards         9
      20 Backwards        20
    2000  Forwards      1007
    2000 Backwards      1011
20000000  Forwards   9757363
20000000 Backwards  10303707

Does looping backwards look like a win to anyone?

Kevin Bourrillion 2010-01-06 18:14:54

Well totally, what happens if you only loop 2 times?! If you had like 3 of those suckers then you'd save 3ns. 3 freakin nano seconds man! You just are hardcore enough I guess. And yes I'm kidding.

rball 2010-05-12 23:25:42

Link is broken now.

Brian Harris 2010-07-28 16:01:09

fixed. 88888888

Kevin Bourrillion 2010-07-29 07:37:06

ansaurus

tags:

views:

answers:

Which of these pieces of code is faster in Java?

related questions