Our computer science teacher once said that, for some reason, it is more efficient to count down than to count up. For example, if you need to use a FOR loop and the loop index is not used anywhere (like printing a line of N * characters to the screen), I mean code like this:

for (i=N; i>=0; i--)  
  putchar('*');  

is better than:

for (i=0; i<N; i++)  
  putchar('*');  

Is it really true? And if so, does anyone know why?

+20  A: 

Here's what might happen on some hardware depending on what the compiler can deduce about the range of the numbers you're using: with the incrementing loop you have to test i<N each time round the loop. For the decrementing version, the carry flag (set as a side effect of the subtraction) may automatically tell you if i>=0. That saves a test per time round the loop.

In reality, on modern pipelined processor hardware, this stuff is almost certainly irrelevant as there isn't a simple 1-1 mapping from instructions to clock cycles. (Though I could imagine it coming up if you were doing things like generating precisely timed video signals from a microcontroller. But then you'd write in assembly language anyway.)
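To make the flag trick concrete, here is a minimal sketch (C++) of both loop shapes, with the kind of x86 sequence a compiler might emit in the comments. The assembly is illustrative, from memory, not verified output from any particular compiler.

    #include <cstdio>

    // Counting down: the decrement itself sets the sign flag, so the
    // loop can typically close with just:  dec eax / jns loop
    // (Note: like the question's version, this prints n+1 stars.)
    void stars_down(int n) {
        for (int i = n; i >= 0; i--)
            std::putchar('*');
    }

    // Counting up to a variable bound usually needs an explicit compare:
    //                                      inc eax / cmp eax, edx / jl loop
    void stars_up(int n) {
        for (int i = 0; i < n; i++)
            std::putchar('*');
    }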

wouldn't that be the zero flag and not the carry flag?
Bob
@Bob In this case you might want to reach zero, print a result, decrement further, and then find you've gone one below zero causing a carry (or borrow). But written slightly differently a decrementing loop might use the zero flag instead.
Just to be perfectly pedantic, not all modern hardware is pipelined. Embedded processors will have much more relevance to this sort of microoptimization.
Paul Nathan
@Paul As I have some experience with Atmel AVRs I didn't forget to mention microcontrollers...
+1  A: 

No, that's not really true. One situation where it could be faster is when you would otherwise be calling a function to check the bounds during every iteration of a loop.

for(int i=myCollection.size() - 1; i >= 0; i--)
{
   ...
}

But if it's less clear to do it that way, it's not worthwhile. In modern languages, you should use a foreach loop when possible, anyway. You specifically mention the case where you should use a foreach loop -- when you don't need the index.

Jonathon
To be clear *and* efficient you should be in the habit of at least `for(int i=0, siz=myCollection.size(); i<siz; i++)`.
Software Monkey
+3  A: 

Counting down is faster in a case like this:

for (i = someObject.getAllObjects.size() - 1; i >= 0; i--) {…}

because someObject.getAllObjects.size() executes once at the beginning.


Sure, similar behaviour can be achieved by calling size() out of the loop, as Peter mentioned:

size = someObject.getAllObjects.size();
for (i = 0; i < size; i++) {…}
Bar
It's not "definitely faster". In many cases that size() call could be hoisted out of the loop when counting up, so it would still only get called once. Obviously this is language and compiler dependent (and code dependent; eg. in C++ it won't get hoisted if size() is virtual), but it's far from definite either way.
Peter
@Peter: Only if the compiler knows for certain that size() is idempotent across the loop. That's probably nearly always **not** the case, unless the loop is *very* simple.
Software Monkey
+1  A: 

Regardless of the direction, always use the prefix form (++i instead of i++)!

for (i=N; i>=0; --i)  

or

for (i=0; i<N; ++i) 

Explanation: http://www.eskimo.com/~scs/cclass/notes/sx7b.html

Furthermore you can write

for (i=N; i; --i)  

But I would expect modern compilers to be able to do exactly these optimizations.
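For class types (iterators and the like) the difference is real, as the canonical operator++ pair shows. This is a minimal C++ sketch; BigIterator is a made-up name for illustration.

    struct BigIterator {
        int pos;
        int scratch[64];               // imagine expensive-to-copy state

        BigIterator& operator++() {    // prefix: modify in place, no copy
            ++pos;
            return *this;
        }

        BigIterator operator++(int) {  // postfix: must return the OLD value,
            BigIterator old = *this;   // which forces a full copy of *this
            ++pos;
            return old;
        }
    };

For a plain int loop counter, the two forms generate identical code when the value of the expression is unused.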

RSabet
Never seen people complain about that before. But after reading the link it actually makes sense :) Thank you.
Tommy Jakobsen
Um, why should he always use the prefix form? If there's no assignment going on, they are identical, and the article you linked to even says that postfix form is more common.
bobDevil
Why should one always use the prefix form? In this instance, it's semantically identical.
quixoto
The postfix form can potentially create an unnecessary copy of the object, although if the value is never being used, the compiler will probably optimize it to the prefix form anyway.
Nick Lewis
Out of force of habit, I always do --i and i++ because when I learned C computers usually had a register predecrement and postincrement, but not vice versa. Thus, *p++ and *--p were faster than *++p and *p-- because the former two could be done in one 68000 machine code instruction.
JeremyP
+5  A: 

Yes!

Counting from N down to 0 is slightly faster than counting from 0 to N, in terms of how the hardware handles the comparison.

Note the comparison in each loop:

i>=0
i<N

Most processors have a compare-with-zero instruction, so the first one will be translated into machine code as:

  1. Load i
  2. Compare and jump if less than or equal to zero

But the second one needs to load N from memory each time:

  1. Load i
  2. Load N
  3. Sub i and N
  4. Compare and jump if less than or equal to zero

So it is not because of counting down or up, but because of how your code will be translated into machine code.

So counting from 10 to 100 is the same as counting from 100 to 10.
But counting from i=100 to 0 is faster than from i=0 to 100 - in most cases.
And counting from i=N to 0 is faster than from i=0 to N.

  • Note that nowadays compilers may do this optimization for you (if they are smart enough)
  • Note also that pipelining can cause a Belady's-anomaly-like effect (we cannot be sure which will be better)
  • Lastly: please note that the two for loops you have presented are not equivalent; the first prints one more *

Related: http://stackoverflow.com/questions/2884762/does-n-executes-faster-than-n1

Betamoo
Noted that the loops are not the same; I forgot to put the "=" sign in one of them.
Bob
So what you're saying is it's not faster to count down; it's just faster to compare to zero than to any other value. Meaning counting from 10 to 100 and counting down from 100 to 10 would be the same?
Bob
Yes, it is not a matter of "counting down or up", but a matter of "comparing to what".
Betamoo
So I guess the same holds for counting from -10 to 0 being "faster" (on some ancient system) than counting from 0 to -10, right?
nico
While this is true at the assembler level, two things combine to make it untrue in reality: modern hardware with long pipelines and speculative execution will sneak in the "Sub i and N" without incurring an extra cycle, and even the crudest compiler will optimise the "Sub i and N" out of existence.
James Anderson
@nico: yes, it is..
Betamoo
+13  A: 

In the Intel x86 instruction set, building a loop to count down to zero can usually be done with fewer instructions than a loop that counts up to a non-zero exit condition. Specifically, the ECX register is traditionally used as a loop counter in x86 asm, and the Intel instruction set has a special jcxz jump instruction that tests the ECX register for zero and jumps based on the result of the test.

However, the performance difference will be negligible unless your loop is already very sensitive to clock cycle counts. Counting down to zero might shave 4 or 5 clock cycles off each iteration of the loop compared to counting up, so it's really more of a novelty than a useful technique.

Also, a good optimizing compiler these days should be able to convert your count up loop source code into count down to zero machine code (depending on how you use the loop index variable) so there really isn't any reason to write your loops in strange ways just to squeeze a cycle or two here and there.
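For illustration, the classic ECX countdown shape looks roughly like this; the NASM-style mnemonics in the comment are from memory, so treat them as a sketch rather than verified output.

    // Typical x86 countdown idiom:
    //
    //          mov  ecx, n    ; loop count
    //          jecxz done     ; skip the loop entirely if the count is zero
    //   top:   ...body...
    //          loop top       ; decrement ECX and jump if nonzero, one instruction
    //   done:
    //
    // A C++ loop of the same shape:
    void repeat(unsigned n) {
        for (; n != 0; --n) {
            // body
        }
    }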

dthorpe
I've seen Microsoft's C++ compiler from a few years back make that optimization. It's able to see that the loop index isn't used, so it rearranges it to the fastest form.
Mark Ransom
@Mark: The Delphi compiler as well, starting in 1996.
dthorpe
+1  A: 

On some older CPUs there are/were instructions like DJNZ == "decrement and jump if not zero". This allowed for efficient loops where you loaded an initial count value into a register and then you could effectively manage a decrementing loop with one instruction. We're talking 1980s ISAs here though - your teacher is seriously out of touch if he thinks this "rule of thumb" still applies with modern CPUs.
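For the curious, a DJNZ loop looked roughly like this (Z80-style mnemonics from memory, purely illustrative), next to the nearest C/C++ idiom:

    #include <cstdio>

    //          ld   b, n     ; load the count into B
    //   loop:  ...body...
    //          djnz loop     ; decrement B and jump if not zero - one instruction
    void stars(unsigned char n) {
        while (n--)           // the closest high-level shape
            std::putchar('*');
    }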

Paul R
+96  A: 

Is it really true? And if so, does anyone know why?

In ancient days, when computers were still chipped out of fused silica by hand, when 8-bit microcontrollers roamed the Earth, and when your teacher was young (or your teacher's teacher was young), there was a common machine instruction called decrement and skip if zero (DSZ). Hotshot assembly programmers used this instruction to implement loops. Later machines got fancier instructions, but there were still quite a few processors on which it was cheaper to compare something with zero than to compare with anything else. (It's true even on some modern RISC machines, like PPC or SPARC, which reserve a whole register to be always zero.)

So, if you rig your loops to compare with zero instead of N, what might happen?

  • You might save a register
  • You might get a compare instruction with a smaller binary encoding
  • If a previous instruction happens to set a flag (likely only on x86 family machines), you might not even need an explicit compare instruction

Are these differences likely to result in any measurable improvement on real programs on a modern out-of-order processor? Highly unlikely. In fact, I'd be impressed if you could show a measurable improvement even on a microbenchmark.

Summary: I smack your teacher upside the head! You shouldn't be learning obsolete pseudo-facts about how to organize loops. You should be learning that the most important thing about loops is to be sure that they terminate, produce correct answers, and are easy to read. I wish your teacher would focus on the important stuff and not mythology.

Norman Ramsey
++ And besides, the `putchar` takes many orders of magnitude longer than the loop overhead.
Mike Dunlavey
+1 My CompSci teacher told me similar stuff. I tested it in Java and C on different platforms and it was hard to actually measure a difference.
Helper Method
It's not strictly mythology: if he is doing some sort of uber-optimized real-time system, it would come in handy. But that sort of hacker would probably know all this already and certainly wouldn't be confusing entry-level CS students with arcana.
Paul Nathan
+1 Also, if this were profitable, it's likely the compiler will optimize it anyway, so the programmer is free to write the loop for readability.
Jay Conrod
@Jay, sorry, it wouldn't. The optimization tends to change program behavior detectably and the compiler can't perform the proof of correctness that this optimization is safe.
Joshua
@Joshua: In what way would this optimization be detectable? As the questioner said, the loop index isn't used in the loop itself, so provided the number of iterations is the same there is no change in behavior. In terms of a proof of correctness, making the variable substitution `j=N-i` shows that the two loops are equivalent.
psmears
+1: we do work on embedded systems with crappy 8-bit microprocessors and we wouldn't even bother with this kind of thing in 99.999% of the code we write.
GrahamS
@psmears most of the time where this optimization is useful it's detectable. You're right in this case it's not detectable.
Joshua
+1 for the Summary. Don't sweat it because on modern hardware it makes virtually no difference. It made virtually no difference 20 years ago either. If you think you have to care, time it both ways, see no clear difference, and **go back to writing the code clearly and correctly**.
Donal Fellows
+2  A: 

Is it faster to count down than up?

Maybe. But far more than 99% of the time it won't matter, so you should use the most 'sensible' test for terminating the loop, and by sensible, I mean that it takes the least amount of thought by a reader to figure out what the loop is doing (including what makes it stop). Make your code match the mental (or documented) model of what the code is doing.

If the loop is working its way up through an array (or list, or whatever), an incrementing counter will often match up better with how the reader might be thinking of what the loop is doing - code your loop this way.

But if you're working through a container that has N items, and are removing the items as you go, it might make more cognitive sense to work the counter down.

A bit more detail on the 'maybe' in the answer:

It's true that on most architectures, testing for a calculation resulting in zero (or going from zero to negative) requires no explicit test instruction - the result can be checked directly. If you want to test whether a calculation results in some other number, the instruction stream will generally have to have an explicit instruction to test for that value. However, especially with modern CPUs, this test will usually add less than noise-level additional time to a looping construct. Particularly if that loop is performing I/O.

On the other hand, if you count down to zero, and use the counter as an array index, for example, you might find the code working against the memory architecture of the system - memory reads will often cause a cache to 'look ahead' several memory locations past the current one in anticipation of a sequential read. If you're working backwards through memory, the caching system might not anticipate reads of a memory location at a lower memory address. In this case, it's possible that looping 'backwards' might hurt performance. However, I'd still probably code the loop this way (as long as performance didn't become an issue) because correctness is paramount, and making the code match a model is a great way to help ensure correctness. Incorrect code is as unoptimized as you can get.
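If you want to test that 'maybe' yourself, here is a quick and unscientific sketch (C++). Expect noisy results; on much modern hardware the prefetcher detects descending strides too, so you may well see no difference at all.

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> a(1 << 24, 1);   // 16M ints, far larger than cache
        using clk = std::chrono::steady_clock;

        long long sum = 0;
        auto t0 = clk::now();
        for (std::size_t i = 0; i < a.size(); i++) sum += a[i];   // ascending
        auto t1 = clk::now();
        for (std::size_t i = a.size(); i-- > 0; ) sum += a[i];    // descending
        auto t2 = clk::now();

        using us = std::chrono::microseconds;
        std::printf("up: %lld us, down: %lld us (sum=%lld)\n",
            (long long)std::chrono::duration_cast<us>(t1 - t0).count(),
            (long long)std::chrono::duration_cast<us>(t2 - t1).count(),
            sum);
    }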

So I would tend to forget the professor's advice (of course, not on his test though - you should still be pragmatic as far as the classroom goes), unless and until the performance of the code really mattered.

Michael Burr
A: 

Bob,

Not until you are doing microoptimizations, at which point you will have the manual for your CPU to hand. Further, if you were doing that sort of thing, you probably wouldn't be needing to ask this question anyway. :-) But, your teacher evidently doesn't subscribe to that idea....

There are 4 things to consider in your loop example:

for (i = N;
     i >= 0;         // thing 1
     i--)            // thing 2
{
    putchar('*');    // thing 3
}
  • Comparison

Comparison is (as others have indicated) relevant to particular processor architectures. There are more types of processors than those that run Windows. In particular, there might be an instruction that simplifies and speeds up comparisons with 0.

  • Adjustment

In some cases, it is faster to adjust up or down. Typically a good compiler will figure it out and redo the loop if it can. Not all compilers are good though.

  • Loop Body

You are (indirectly) making a syscall with putchar. That is massively slow. Plus, you are rendering onto the screen (indirectly). That is even slower. Think a 1000:1 ratio or more. In this situation, the loop body totally and utterly outweighs the cost of the loop adjustment/comparison.

  • Caches

A cache and memory layout can have a large effect on performance. In this situation, it doesn't matter. However, if you were accessing an array and needed optimal performance, it would behoove you to investigate how your compiler and your processor laid out memory accesses and to tune your software to make the most of that. The stock example is the one given in relation to matrix multiplication (see the sketch below).
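Here is a minimal sketch of that stock example (C++, row-major n-by-n matrices): the same triple loop in two orders, where only the memory access pattern differs.

    // ijk order: the inner loop strides through b one whole row (n doubles)
    // per step, which is cache-unfriendly for large n.
    void matmul_ijk(const double* a, const double* b, double* c, int n) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                for (int k = 0; k < n; k++)
                    s += a[i*n + k] * b[k*n + j];
                c[i*n + j] = s;
            }
    }

    // ikj order: b and c are both walked sequentially in the inner loop,
    // so caching and prefetching help. Same arithmetic, same result.
    void matmul_ikj(const double* a, const double* b, double* c, int n) {
        for (int i = 0; i < n*n; i++) c[i] = 0.0;
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++) {
                double aik = a[i*n + k];
                for (int j = 0; j < n; j++)
                    c[i*n + j] += aik * b[k*n + j];
            }
    }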

Paul Nathan
+1  A: 

In C to pseudo-assembly:

for (i = 0; i < 10; i++) {
    foo(i);
}

turns into

    clear i
top_of_loop:
    call foo
    increment i
    compare 10, i
    jump_less top_of_loop

while:

for (i = 10; i >= 0; i--) {
    foo(i);
}

turns into

    load i, 10
top_of_loop:
    call foo
    decrement i
    jump_not_neg top_of_loop

Note the lack of the compare in the second pseudo-assembly. On many architectures there are flags that are set by arithmetic operations (add, subtract, multiply, divide, increment, decrement) which you can use for jumps. These often give you what is essentially a comparison of the result of the operation with 0 for free. In fact, on many architectures

x = x - 0

is semantically the same as

compare x, 0

Also, the compare against 10 in my example could result in worse code. The 10 may have to live in a register, so if registers are in short supply that costs, and it may result in extra code to move things around or to reload the 10 every time through the loop.

Compilers can sometimes rearrange the code to take advantage of this, but it is often difficult because they cannot always be sure that reversing the direction through the loop is semantically equivalent.
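A sketch of that distinction with two hypothetical functions: in the first the index is only an index and the result is order-independent, so a compiler may legally count down; in the second the iteration order is observable, so it may not.

    #include <cstdio>

    int sum(const int* a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)    // reversible: i is used only as an
            s += a[i];                 // index and the sum is order-independent
        return s;
    }

    void print_indices(int n) {
        for (int i = 0; i < n; i++)    // not reversible: the output order
            std::printf("%d\n", i);    // would visibly change
    }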

nategoose
A: 

Now, I think you've had enough assembly lectures :) I would like to present another reason for the top->down approach.

The reason to go from the top is very simple. In the body of the loop, you might accidentally change the boundary, which might end in incorrect behaviour or even a non-terminating loop.

Look at this small portion of Java code (I guess the language does not matter for this point):

    System.out.println("top->down");
    int n = 999;
    for (int i = n; i >= 0; i--) {
        n++;
        System.out.println("i = " + i + "\t n = " + n);
    }
    System.out.println("bottom->up");
    n = 1;
    for (int i = 0; i < n; i++) {
        n++;
        System.out.println("i = " + i + "\t n = " + n);
    }

So my point is that you should consider preferring counting from the top down, or having a constant as the boundary.

Gabriel Ščerbák
Huh?!! Your failing example is really counter-intuitive, which is to say, a straw-man argument - no one would ever write this. One would write `for (int i=0; i < 999; i++) {`.
Software Monkey
@Software Monkey imagine n being a result of some computation... e.g. you might want to iterate over some collection and its size is the boundary, but as some side effect, you add new elements to the collection in the loop body.
Gabriel Ščerbák
@Gabriel: If that's what you intended to communicate, then that's what your example should illustrate: `for(int xa=0; xa<collection.size(); xa++) { collection.add(SomeObject); ... }`
Software Monkey
@Software Monkey I wanted to be more general than talking specifically about collections, because what I am reasoning about has nothing to do with collections
Gabriel Ščerbák
@Gabriel: Yes, but if you are going to reason by example, your examples need to be credible and illustrative of the point.
Software Monkey
A: 

The point is that when counting down you don't need to check i >= 0 separately from decrementing i. Observe:

for (i = 5; i--;) {
  alert(i);  // alert boxes showing 4, 3, 2, 1, 0
}

Both the comparison and decrementing i can be done in the one expression.

See other answers for why this boils down to fewer x86 instructions.

As to whether it makes a meaningful difference in your application, well I guess that depends on how many loops you have and how deeply nested they are. But to me, it's just as readable to do it this way, so I do it anyway.

thomasrutter
A: 

It is an interesting question, but as a practical matter I don't think it's important, and it does not make one loop any better than the other.

According to the Wikipedia page Leap second, "...the solar day becomes 1.7 ms longer every century due mainly to tidal friction." But if you are counting days until your birthday, do you really care about this tiny difference in time?

It's more important that the source code is easy to read and understand. Those two loops are a good example of why readability is important -- they don't loop the same number of times.

I would bet that most programmers read (i = 0; i < N; i++) and understand immediately that this loops N times. A loop of (i = 1; i <= N; i++), for me anyway, is a little less clear, and with (i = N; i > 0; i--) I have to think about it for a moment. It's best if the intent of the code goes directly into the brain without any thinking required.

Jim Flood
+1  A: 

Strangely, it appears that there IS a difference. At least in PHP. Consider the following benchmark:

<?php

print "<br>".PHP_VERSION;
$iter = 100000000;
$i=$t1=$t2=0;

$t1 = microtime(true);
for($i=0;$i<$iter;$i++){}
$t2 = microtime(true);
print '<br>$i++ : '.($t2-$t1);

$t1 = microtime(true);
for($i=$iter;$i>0;$i--){}
$t2 = microtime(true);
print '<br>$i-- : '.($t2-$t1);

$t1 = microtime(true);
for($i=0;$i<$iter;++$i){}
$t2 = microtime(true);
print '<br>++$i : '.($t2-$t1);

$t1 = microtime(true);
for($i=$iter;$i>0;--$i){}
$t2 = microtime(true);
print '<br>--$i : '.($t2-$t1);

Results are interesting:

PHP 5.2.13
$i++ : 8.8842368125916
$i-- : 8.1797409057617
++$i : 8.0271911621094
--$i : 7.1027431488037


PHP 5.3.1
$i++ : 8.9625310897827
$i-- : 8.5790238380432
++$i : 5.9647901058197
--$i : 5.4021768569946

If someone knows why, it would be nice to know :)

EDIT: The results are the same even if you start counting not from 0 but from some other arbitrary value. So it is probably not only the comparison to zero that makes a difference?

ts
The reason it's slower is that the postfix operator must store a temporary, while the prefix operator need not. Consider $foo = $i++; Three things happen: $i is stored to a temporary, $i is incremented, and then $foo is assigned that temporary's value. In the case of $i++; a smart compiler could realize the temporary is unnecessary. PHP just doesn't. C++ and Java compilers are smart enough to make this simple optimization.
Conspicuous Compiler
And why is $i-- faster than $i++?
ts
How many iterations of your benchmark did you run? Did you clip outliers and take an average for each result? Was your computer doing anything else during the benchmarks? That ~0.5 difference could just be the result of other CPU activity, or pipeline utilisation, or... or... well, you get the idea.
Jonners
Yes, I am giving averages here. The benchmark was run on different machines, and the difference is not accidental.
ts
@Conspicuous Compiler: Do you know that, or do you suppose?
ts