views: 858
answers: 12

I believe (from some reading I've done) that counting down in for-loops is actually more efficient and faster at runtime. My full software code is in C++.

I currently have this:

for (i=0; i<domain; ++i) {

my 'i' is an unsigned register int, and 'domain' is an unsigned int

in the for-loop i is used for going through an array, e.g.

array[i] = do stuff

converting this to count down messes up the expected/correct output of my routine.

I can imagine the answer being quite trivial, but I can't get my head round it.

UPDATE: 'do stuff' does not depend on previous or later iterations. The calculations within the for-loop are independent for that iteration of i. (I hope that makes sense.)

UPDATE: To achieve a runtime speedup with my for-loop, should I count down, and if so, should I remove the unsigned part when declaring my int? Or is there some other method?

Please help.

A: 
for (i=domain - 1; i >= 0; i--)
Nuoji
unfortunately, with i unsigned, it will ALWAYS be >= 0 ...!
Alex Martelli
This might explain why counting down messes the result... ;)
Daniel Brückner
Very true, didn't see the 'unsigned' in his description there. Nicely spotted.
Nuoji
Easily fixed: i = domain; i != 0; i-- and then use i - 1 within the loop.
Michael
+3  A: 

If you have a decent compiler, it will optimize "counting up" just as effectively as "counting down". Just try a few benchmarks and you'll see.

Alex Martelli
A lot of the time the compiler can't make this optimization if it can't determine that counting down will not change the observable behavior of your code.
Michael
Ah, this is interesting
ohit
Yes, but this optimization is 100% unnecessary for 99.99999% of programs. And seeing that the asker is not experienced enough to convert a simple for loop to its reverse, it would be an understatement to say that loop optimization should be the least of his worries.
foljs
True, I wasn't trying to imply that you should go ahead and do it since the compiler can't.
Michael
+10  A: 

Checking to see if a number is zero can be quicker or more efficient than a comparison. But this is the sort of micro-optimization you really shouldn't worry about - a few clock cycles will be greatly dwarfed by just about any other perf issue.

On x86:

dec eax
jnz Foo

Instead of:

inc eax
cmp eax, 15
jl Foo
Michael
+1  A: 

Hard to say with information given but... reverse your array, and count down?

patjbs
+7  A: 

I'm guessing your backward for loop looks like this:

for (i = domain - 1; i >= 0; --i) {

In that case, because i is unsigned, it will always be greater than or equal to zero. When you decrement an unsigned variable that is equal to zero, it will wrap around to a very large number. The solution is either to make i signed, or change the condition in the for loop like this:

for (i = domain - 1; i >= 0 && i < domain; --i) {

Or count from domain to 1 rather than from domain - 1 to 0:

for (i = domain; i >= 1; --i) {
    array[i - 1] = ...; // notice you have to subtract 1 from i inside the loop now
}
yjerem
+1 for catching the overflow. It would be wise to add that this type of (premature) micro optimization is the "root of all evil" - Donald Knuth
lothar
When Jeremy uses the unsigned value, he has a redundant condition, does he not? You only need to check i < domain, NOT i >= 0. As stated, i is always >= 0. Checking for i < domain will fail once it wraps around. Correct?
Tom
That is correct, but in real code I would check for both conditions just in case the counter variable was changed to a signed variable which would break the loop if it only checked the one condition. That's why I left it in... I wasn't thinking about micro optimization.
yjerem
+11  A: 

This is not an answer to your problem, because you don't seem to have a problem.

This kind of optimization is completely irrelevant and should be left to the compiler (if done at all).

Have you profiled your program to check that your for-loop is a bottleneck? If not, then you do not need to spend time worrying about this. Even more so, having "i" as a "register" int, as you write, makes no real sense from a performance standpoint.

Even without knowing your problem domain, I can guarantee you that both the reverse-looping technique and the "register" int counter will have negligible impact on your program's performance. Remember, "Premature optimization is the root of all evil".

That said, optimization time would be better spent thinking about the overall program structure, the data structures and algorithms used, resource utilization, etc.

foljs
Thank you. Why does 'register' int make no sense?
ohit
I believe most modern compilers ignore the "register" hint anyway.
Michael
If the compiler completely respected the "register" hint, it would probably result in a perf degradation. On x86, there's only a handful of registers. By demanding that one register be used for your loop counter, you lower the number of registers the compiler can use when compiling your code.
Michael
There are several issues. First of all, it is only a suggestion to the compiler, one that can be ignored. In general we should not make assumptions about the machine and its register use, because we don't have full control over the architectures on which our programs will execute, and modern CPUs have complex execution paths and optimization algorithms already in place. The other reason is that the impact of this will be negligible, dwarfed (perhaps hundreds of thousands of times over) by other bottlenecks in a program, such as memory allocation, IO, synchronization, bad choice of algorithms, etc.
foljs
+2  A: 

So you "read" that counting down is more efficient? I find this very difficult to believe unless you show me some profiler results and the code. I can buy it under some circumstances, but in the general case, no. This seems to me like a classic case of premature optimization.

Your comment about "register int i" is also very telling. Nowadays, the compiler always knows better than you how to allocate registers. Don't bother using the register keyword unless you have profiled your code.

Brian Neal
+3  A: 

When you're looping through data structures of any sort, cache misses have a far bigger impact than the direction you're going. Concern yourself with the bigger picture of memory layout and algorithm structure instead of trivial micro-optimisations.

Andrew
Well, compared to algorithm structure memory layout is also a micro-optimization, isn't it?
foljs
@foljs: No--memory layout can have a huge (1000x or more) effect because the memory hierarchy has such a huge disparity in access speeds. All of the fast linear algebra libraries (LAPACK, etc) use block structuring for this reason.
Drew Hall
@drewhall is right, @foljs. Cache misses -- especially with the latest Intel chips -- are incredibly expensive. My understanding is that it's easy to burn up to 30% of your potential CPU throughput with poor memory layout.
Andrew
+10  A: 

There is only one correct method of looping backwards using an unsigned counter:

for( i = n; i-- > 0; )
{
    // Use i as normal here
}

There's a trick here: on the last loop iteration you have i = 1 at the top of the loop; i-- > 0 passes because 1 > 0, and then i = 0 in the loop body. On the next iteration, i-- > 0 fails because i == 0, so it doesn't matter that the postfix decrement wrapped the counter around.

Very non-obvious, I know.

Don Neufeld
Didn't know this one. Nice!
mafutrct
But CAREFUL: the temp copy needed for i-- can dwarf any performance that might be gained from counting down. I.e., don't use this as a performance optimization; use it when you actually need to count an unsigned value down to 0 algorithmically.
MadKeithV
+3  A: 

It has nothing to do with counting up or down. What can be faster is counting toward zero. Michael's answer shows why — x86 gives you a comparison with zero as an implicit side effect of many instructions, so after you adjust your counter, you just branch based on the result instead of doing an explicit comparison. (Maybe other architectures do that, too; I don't know.)

Borland's Pascal compilers are notorious for performing that optimization. The compiler transforms this code:

for i := x to y do
  foo(i);

into an internal representation more akin to this:

tmp := Succ(y - x);
i := x;
while tmp > 0 do begin
  foo(i);
  Inc(i);
  Dec(tmp);
end;

(I say notorious not because the optimization affects the outcome of the loop, but because the debugger displays the counter variable incorrectly. When the programmer inspects i, the debugger may display the value of tmp instead, causing no end of confusion and panic for programmers who think their loops are running backward.)

The idea is that even with the extra Inc or Dec instruction, it's still a net win, in terms of running time, over doing an explicit comparison. Whether you can actually notice that difference is up for debate.

But note that the conversion is something the compiler would do automatically, based on whether it deemed the transformation worthwhile. The compiler is usually better at optimizing code than you are, so don't spend too much effort competing with it.

Anyway, you asked about C++, not Pascal. C++ "for" loops aren't quite as easy to apply that optimization to as Pascal "for" loops are because the bounds of Pascal's loops are always fully calculated before the loop runs, whereas C++ loops sometimes depend on the stopping condition and the loop contents. C++ compilers need to do some amount of static analysis to determine whether any given loop could fit the requirements for the kind of transformation Pascal loops qualify for unconditionally. If the C++ compiler does the analysis, then it could do a similar transformation.

There's nothing stopping you from writing your loops that way on your own:

for (unsigned i = 0, tmp = domain; tmp > 0; ++i, --tmp)
  array[i] = do stuff

Doing that might make your code run faster. Like I said before, though, you probably won't notice. The bigger cost you pay by manually arranging your loops like that is that your code no longer follows established idioms. Your loop is a perfectly ordinary "for" loop, but it no longer looks like one — it has two variables, they're counting in opposite directions, and one of them isn't even used in the loop body — so anyone reading your code (including you, a week, a month, or a year from now when you've forgotten the "optimization" you were hoping to achieve) will need to spend extra effort proving to himself or herself that the loop is indeed an ordinary loop in disguise.

(Did you notice that my code above used unsigned variables with no danger of wrapping around at zero? Using two separate variables allows that.)

Three things to take away from all this:

  1. Let the optimizer do its job; on the whole it's better at it than you are.
  2. Make ordinary code look ordinary so that the special code doesn't have to compete to get attention from people reviewing, debugging, or maintaining it.
  3. Don't do anything fancy in the name of performance until testing and profiling show it to be necessary.
Rob Kennedy
A: 

Jeremy Ruten rightly pointed out that using an unsigned loop counter is dangerous. It's also unnecessary, as far as I can tell.

Others have also pointed out the dangers of premature optimization. They're absolutely right.

With that said, here is a style I used when programming embedded systems many years ago, when every byte and every cycle did count for something. These forms were useful for me on the particular CPUs and compilers that I was using, but your mileage may vary.

// Start out pointing to the last elem in array
pointer_to_array_elem_type p = array + (domain - 1);
for (int i = domain; --i >= 0 ; ) {
     *p-- = (... whatever ...)
}

This form takes advantage of the condition flag that is set on some processors after arithmetical operations -- on some architectures, the decrement and testing for the branch condition can be combined into a single instruction. Note that using predecrement (--i) is the key here -- using postdecrement (i--) would not have worked as well.

Alternatively,

// Start out pointing *beyond* the last elem in array
for (pointer_to_array_elem_type p = array + domain; p - array > 0 ; ) {
     *(--p) = (... whatever ...)
}

This second form takes advantage of pointer (address) arithmetic. I rarely see the form (pointer - int) these days (for good reason), but the language guarantees that when you subtract an int from a pointer, the pointer is decremented by (int * sizeof (*pointer)).

I'll emphasize again that whether these forms are a win for you depends on the CPU and compiler that you're using. They served me well on Motorola 6809 and 68000 architectures.

Dan Breslau
A: 

In some later ARM cores, decrementing and comparing takes only a single instruction. This makes decrementing loops more efficient than incrementing ones.

I don't know why there isn't an increment-and-compare instruction as well.

I'm surprised that this post was voted -1 when it's a true issue.

piotr
But the compiler will take advantage of it if it is available. I've seen compilers take code that used an increment in C++ and generate a decrement in assembler to take advantage of that.
Brian Neal
So there is no need to jump through hoops in your C++ code by changing a standard incrementing for loop to a non-intuitive loop when you have a good optimizing compiler.
Brian Neal