Consider an example like this:

if (flag)
  for (condition)
    do_something();
else
  for (condition)
    do_something_else();

If flag doesn't change inside the for loops, this should be semantically equivalent to:

for (condition)
  if (flag)
    do_something();
  else
    do_something_else();

Only in the first case, the code might be much longer (e.g. if several for loops are used or if do_something() is a code block that is mostly identical to do_something_else()), while in the second case, the flag gets checked many times.

I'm curious whether current C++ compilers (most importantly, g++) would be able to optimize the second example to get rid of the repeated tests inside the for loop. If so, under what conditions is this possible?

+8  A: 

Yes, if it is determined that flag doesn't change and can't be changed by do_something or do_something_else, it can be pulled outside the loop. I've heard this called loop hoisting, but Wikipedia has an entry called "loop invariant code motion".

If flag is a local variable, the compiler should be able to do this optimization, since the change is guaranteed to have no effect on the observable behavior of the generated code.

If flag is a global variable and you call functions inside your loop, the compiler might not perform the optimization - it may not be able to determine whether those functions modify the global.

This can also be affected by the sort of optimization you do - optimizing for size would favor the non-hoisted version while optimizing for speed would probably favor the hoisted version.

In general, this isn't the sort of thing that you should worry about, unless profiling tells you that the function is a hotspot and you see that less than efficient code is actually being generated by going over the assembly the compiler outputs. Micro-optimizations like this you should always just leave to the compiler unless you absolutely have to.

Michael
Thanks for the answer. From your Wikipedia link I've found the page for "loop unswitching", which seems to be an even more precise term for this kind of optimization.
generic-identity
Although the terms aren't really cut and dried, loop-hoisting and loop-invariant-code-motion are not really used to describe this optimization. They're more for single instructions.
Paul Biggar
+1  A: 

I'm sure that if the compiler can determine that the flag will remain constant, it can do some shuffling:

const bool flag = /* ... */;
for (..; ..; ..)
{
    if (flag)
    {
        // ...
    }
    else
    {
        // ...
    }
}

If the flag is not const, the compiler cannot necessarily optimize the loop, because it can't be sure flag won't change. It can if it does static analysis, but not all compilers do, I think. const is the sure-fire way of telling the compiler that the flag won't change; after that it's up to the compiler.

As usual, profile and find out if it's really a problem.

GMan
'const' is a compiler-checked condition, but has no effect on optimization.
peterchen
The scope of the variable is probably more important, but constness can have a bearing on optimization. It is undefined behaviour to change an object that is `const` (this is different from using a `const_cast` to change an object that isn't a `const` object but is known through a `const` reference), so a compiler *can* use this information to cache its value.
Charles Bailey
peter, `const` has everything to do with optimization.
GMan
Probably more important is the object's scope. If flag is a local object and no references or pointers to it are passed to external functions then the compiler can be very free with its actual implementation, even going so far as to eliminate any actual storage for the object. As there is no way for conforming code to know that the compiler has done this - the code works 'as if' the object exists - the optimization is legal.
Charles Bailey
I don't think const-ness would generate different code for this particular case, since the compiler can clearly see that flag is read once and never written to.
Michael
@peter, this isn't quite true. A conformant C++ implementation is at liberty to assume that an object that is inherently `const` will never change, since the Standard defines any attempt to do so (e.g. via `const_cast`) as U.B. So it can certainly take into account `const` when determining whether to optimize.
Pavel Minaev
@Pavel: IIRC from a discussion (on GotW maybe?), modifying a const value through const_cast is U.B. only because constant values can be placed in a read-only segment. In all other cases, the immutability evaluation would end up the same without affecting results. But I can't dig it up anymore, so that's all speculation.
peterchen
@peterchen, I think you have cause and effect confused. A compiler can place a `const` object in RO memory precisely *because* it is undefined behaviour to modify a `const` object. Consequently, it can also make the same optimization decisions about a `const` object whether or not it actually does place it in RO memory.
Charles Bailey
found this - http://www.parashift.com/c++-faq-lite/const-correctness.html#faq-18.14 - probably the origin of my memory, though I think it was more elaborate.
peterchen
I'm with Peter on this - const has no effect on optimization. In this case, the compiler can optimize it because it's local. However, if the const object was a reference (via a real reference or by dereferencing a pointer-to-const) and the compiler couldn't determine that it was aliased (assuming there's a function call inside the for loop that allows the aliasing to have an effect), there would be no additional optimization opportunities due to the const qualifier. It's the fact that the object is local that enables the optimization - not that there's a const qualifier on it.
Michael Burr
If we're taking sides on this, I'm with GMan and Pavel. `:)` Look at this: `{const int k=5; for(int i=0; i<k; ++i) g();}` Since `k` is `const` and since modifying `k` inside `g()` would be U.B., the compiler could apply certain optimizations it couldn't apply if it had to assume `k` could change.
sbi
@sbi: I'll modify my position on this - the const qualifier is far less useful as an optimization tool than is often thought - particularly for non-POD objects. See Sutter's GotW article for some additional details: http://www.gotw.ca/gotw/081.htm
Michael Burr
A: 

It's called a loop invariant and the optimization is called loop invariant code motion and also code hoisting. The fact that it's in a conditional will definitely make the code analysis more complex and the compiler may or may not invert the loop and the conditional depending on how clever the optimizer is.

There is a general answer for any specific case of this kind of question, and that's to compile your program and look at the generated code.

DigitalRoss
A: 

I would be wary to say that it will. Can the compiler guarantee that the value won't be modified by this or another thread?

That said, the second version of the code is generally more readable and it would probably be the last thing to optimize in a block of code.

patros
@patros - If it's a local variable, multithreading doesn't have to come into play. Even if the variable is visible to other threads, the compiler could still perform the optimization - unless you mark it volatile, the compiler is not required to refetch it at every access, and the codegen is valid.
Michael
@Michael - Agreed, but we don't see the definition of flag in this snippet, so it may not be local. I also agree that the compiler can legally optimize this, just not that it always will. Whether it will is probably a function of variable scope and compiler flags.
patros
+1  A: 

Generally, yes. But there is no guarantee, and the places where the compiler will do it are probably rare.

What most compilers do without a problem is hoisting immutable evaluations out of the loop, e.g. if your condition is

if (a<b) ....

when a and b are not affected by the loop, the comparison will be made once before the loop.

This means that if the compiler can determine the condition does not change, the test is cheap and the jump is well predicted. This in turn means the test itself costs one cycle or no cycles at all (really).

In which cases would splitting the loop be beneficial?

a) a very tight loop where the 1 cycle is a significant cost
b) the entire loop with both parts does not fit in the code cache

Now, the compiler can only make assumptions about the code cache, and can usually order the code so that one branch will fit in the cache.

Without any testing, I'd expect a) to be the only case where such an optimization would be applied, because it's not always the better choice:

In which cases would splitting the loop be bad?

When splitting the loop increases code size beyond the code cache, you will take a significant hit. Now, that only affects you if the loop itself is called within another loop, but that's something the compiler usually can't determine.

[edit]
I couldn't get VC9 to split the following loop (one of the few cases where it might actually be beneficial)

extern volatile int vflag = 0;

int foo(int count)
{
   int sum = 0;
   int flag = vflag;
   for(int i=0; i<count; ++i)
   {
      if (flag)
         sum += i;
      else
         sum -= i;
   }

   return sum;
}

[edit 2]
note that with int flag = true; the second branch does get optimized away. (and no, const doesn't make a difference here ;))

What does that mean? Either it doesn't support that, it doesn't matter, or my analysis is wrong ;-)

Generally, I'd assume this is an optimization that is valuable only in very few cases, and can be done by hand easily in most scenarios.

peterchen
Peter, what flags did you use to compile your code snippet?
Michael
no debug info, always inline (/Ob2), /Ox with either /Os, /Ot or none of the latter two. (I've looked at the compilers output only, but /GL shouldn't affect this)
peterchen
+1  A: 

As many have said: it depends.

If you want to be sure, you should try to force a compile-time decision. Templates often come in handy for this:

for (condition)
  do_it<flag>();
sbi
+6  A: 

Tried with GCC and -O3:

void foo();
void bar();

int main()
{
    bool doesnt_change = true;
    for (int i = 0; i != 3; ++i) {
        if (doesnt_change) {
            foo();
        }
        else {
            bar();
        }
    }
}

Result for main:

_main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
call ___main
call __Z3foov
call __Z3foov
call __Z3foov
xorl %eax, %eax
leave
ret

So it does optimize away the choice (and unrolls smaller loops).

This optimization is not done if doesnt_change is global.

UncleBens
+1: For performing the test and showing the generated code.
Clifford
could you try the snippet I used below? In your snippet the compiler is simply removing the second branch because it will never execute. (the trick is to initialize `doesnt_change` from an extern volatile, so the compiler can't determine which value it will have)
peterchen