views: 916
answers: 13

Bit of a weird one: I was told a while ago by a friend that rearranging this example for loop from:

for(int i = 0; i < constant; ++i) {
    // code...
}

to:

for(int i = 0; constant > i; ++i) {
    // code...
}

would slightly increase performance in C++. I don't see how comparing a constant value to a variable is faster than vice-versa, and some rudimentary tests I ran didn't show any difference in speed between the two implementations. The same was also true of testing this Python while loop:

while i < constant:
    # code...
    i += 1

vs:

while constant > i:
    # code...
    i += 1

Am I wrong? Are my simple tests not enough to determine the speed variation? Is this true of other languages? Or is this just a new best practice?

+3  A: 

It is more of a best practice not to go out of your way for optimization tweaks like this, which will give you negligible benefit (assuming it is even a tweak at all).

TheTXI
+17  A: 

Probably not, but even if it did, the compiler would probably make the optimization for you automatically anyway. So do it whichever way makes your code most readable.

Eric Petroelje
Absolutely, any modern compiler should handle micro-optimizations at this level automatically.
MadKeithV
+11  A: 

My suspicion is your friend is 100% wrong. But I wouldn't trust my opinion any more than I would trust your friend's. In fact, if there is a performance problem, there is only one person you should trust.

The Profiler

This is the only way you can ever claim with any authority that one way is or is not faster than another.

JaredPar
And, of course, a profiler's results are only applicable to the compiler and environment tested.
Eddie
anthropomorphizing the profiler again, are we?
rmeador
@rmeador, I actually had to look that word up :). But yes, it appears i did
JaredPar
Anthropomorphizers unite! The only thing we have to lose is Mr. Dignity.
chaos
Mrs. Amusement was happy to see Mr. Dignity's fall -- again!
S.Lott
+43  A: 

It's more in the line of C++ folklore, hand micro-optimizations that worked once on a particular version of a particular compiler and get passed down ever after as some kind of lore distinguishing the possessor from the common herd. It's rubbish. Profiling is truth.

chaos
Profiling is truth. Can we get that emblazoned somewhere on the site? Possibly as the rollover for the SO logo
Thanks, I hoped as much!
Daz
I need a "Profiling Is Truth!" poster.
mch
Except that profiling is not truth, but only a close approximation thereof.
Paul Biggar
@mch, my exact thoughts. This post makes for an epic poster.
GMan
Someone needs to bury this post. My reputation as an system optimizer depends on it. Job security through (obscurity|naivety). :o
280Z28
A: 

This is absolutely a case of micro-optimization and really doesn't need to be done.

It is true that (especially in C++) there is a small performance difference between a post-increment operation and a pre-increment operation, but with today's compilers that difference is generally negligible. The reason for changing the order of the conditional is due to the change from post- to pre-increment.

Scott Dorman
>> "and really doesn't need to (be) done" - most of the time. Performance still matters, only ,ost of the time, micro optimizations don't.
peterchen
Wrong: there's a significant difference between pre- and post-increment in C++, because post-increment must return a temporary, which in some cases can be optimized away if it is not used. The change in the conditional is entirely unrelated to this.
MadKeithV
There is no indication in the example of a change from post to pre increment - both the same as pre increment.
Greg Domjan
+5  A: 

Assuming short-circuit evaluation, the only time this should make much of a difference is if you have a call to a slow function in your loop. For example, if you had a function that queried a value from your database, and returned it, then this:

while (bContinue && QueryStatusFromDatabase() == 1) {
}  //while

Would be much faster than:

while (QueryStatusFromDatabase() == 1 && bContinue) {
}  //while

Even though they are logically identical.

That's because the first one can stop as soon as a simple boolean is FALSE - the query only has to run when the boolean is TRUE, but the second one will always run the query.

Unless you have a need to squeeze every possible CPU cycle out of your loop, then those extreme cases are probably the only ones worth spending your time on. Think of it this way: To make up the time you spent asking this question would probably take several billion iterations of your loop.

Worst of all is when you have a function as a condition, and that function has side-effects that are secretly expected by some other place in the code. So when you make your little optimization, the side effects only happen some of the time, and your code breaks in weird ways. But that's a bit of a tangent. The short answer to your question is "Sometimes, but it usually doesn't matter."

JosephStyons
Actually, "much faster than" depends on how many times the test is true. If very often, the first time you enter the loop, bContinue is false, then yes it will be much faster. If you typically loop hundreds of times, then one more DB query may not matter. This is why you must profile.
Eddie
A: 

Today, on a good compiler, not at all.

First, the operand order doesn't make any difference on the instruction sets I've seen. Second, even if it did, any decent optimizer would pick the better one.

We should not blindly dismiss performance, though. Responsiveness still matters, as does calculation time. Especially when writing library code you don't know when you will be called two million times in a row.

Also, not all platforms are created equal. Embedded platforms often suffer from sub-standard optimizers on top of low(er) processing power and real time processing requirements.

On Desktop/Server platforms, the weight has shifted towards well-encapsulated complexity that implements better scaling algorithms.

Micro-optimizations are bad only when they hurt something else, like readability, complexity, or maintainability. When everything else is equal, why not pick the faster one?


There was a time when ending the loop at zero (e.g. by counting down) on x86 actually could give notable improvements to tight loops, as a DEC CX / JNZ pair was faster (it still potentially can, as it saves a register / memory access for the comparand; compiler optimizations are usually beyond that now). What your friend heard might be a mangled version of that.

peterchen
I guess that'll teach me for not clicking the 'load new comments' banner when it popped up...
Kylotan
+8  A: 

The examples you gave should have absolutely no performance difference in C++, and I doubt they would differ in Python either.

Perhaps you're confusing it with a different optimisation:

for (int i = 0; i < variable; ++i)

// ...vs...

for (int i = variable; i; --i)

The latter is faster in some architectures because the act of decrementing the variable will set the zero flag, which can then be checked in a jump-if-not-zero instruction, giving you the loop iteration and the conditional in one go. The former example needs to perform an explicit comparison or a subtraction to set a flag, and then jump based on that.

However, most of the time the compiler can optimise the first case into the second (especially if it sees that the variable is effectively a constant), and on some compiler/architecture combinations instructions may be generated that make the first method more like the second. Things like this are only worth trying if you have a tight inner loop that your profiler is telling you is expensive, but you'll never notice the difference otherwise, if there even is one.

Kylotan
I wasn't confusing it with a different optimisation, but thanks for the example and explanation
Daz
+4  A: 

While profiling is best, it isn't the only way.

You could compare the assembly each option generates, which should not be out of the question for micro-optimizations like this. A little research into your hardware platform's instructions could give you a decent idea of whether this change makes any difference at all and how it may perform differently. I assume you'll be counting the number of move and compare instructions in your example.

If your debugger lets you switch between source and a disassembled view while stepping this should be pretty easy.

Aardvark
Why did I not think of this (bangs head on desk)?! Assembly for my test code is identical on Cygwin, but I'll repeat on other platforms as and when I can. Thanks!
Daz
Maybe, but looking at assembly skips weird effects involving caches, alignment, etc.
Paul Biggar
@Paul: If nothing else, you can at least check if the assembly is identical. If so, there will be no speed difference.
Brian
A: 

I humbly suggest that on some compilers, on certain architectures, the following could compile down more efficiently than the variants above:

i = constant + 1;
while (--i) {
}

to get constant iterations (the pre-decrement happens before the test, so the count has to start one above the target).

As many of the comments have suggested, a compiler will do a good job of optimizing a loop for you (compiler optimization people have spent lots and lots of time thinking about it). Legible code is probably more valuable, but YMMV!

If you really want to optimize beyond what you think a compiler may be able to do, I suggest looking at the assembly that the high level language generates, and consider further optimizations from there.

On a high level, you may also be able to get significantly greater performance by using OpenMP or on a lower level by way of a vector instruction set (e.g. MMX) to do multiple computations in a single instruction. That's a bit beyond the scope of the question, and you'd have to give a lot more information about what the loop's doing for useful advice on that.

Hope that helps & cheers.

Brian M. Hunt
+2  A: 

Any sane compiler will implement both the same way. If one is faster than another on some architecture, the compiler will optimize it that way.

Zifre
A: 

The supplied optimization would only be an optimization for a given compiler (maybe). Abstractly, it should generate the same code.

If you are doing micro-optimizations - presuming the requirements to do micro-optimizations are met - your first step should be to look at the assembly generated, and then the assembly manuals for your architecture.

For instance, i++ may be faster than i+1. It depends. On naive CPUs, a test for equality to 0 is much faster than a less-than comparison. If your compiler/CPU doesn't support instruction reordering, you may find that interspersing assignments with computations speeds your code up (certain computations can cause pipeline stalls). But that's something you'll have to determine specifically for your compiler/architecture combination.

Frankly, I wouldn't bother doing this level of optimization unless I absolutely needed every last cycle from my processor. Traditionally, graphics or scientific computation is where you need this kind of stuff[*].

*I know of a program that, after months of optimization, and on modern machines, would still take many months to process the data. Runtimes for a single data set are in the week range. There are quite a few data sets to work through....

Paul Nathan
A: 

Comparing against 0 is very fast, so this would actually be slightly faster:

for (int i = constant; i > 0; --i)
{ 
  //yo
}

I think it is better to use != in any case, as it makes off-by-one errors easier to detect and is the only way to use iterators with non-contiguous data structures, like linked lists.

Hooked