Consider an example like this:

if (flag)
  for (condition)
    do_something();
else
  for (condition)
    do_something_else();

If flag doesn't change inside the for loops, this should be semantically equivalent to:

for (condition)
  if (flag)
    do_something();
  else
    do_something_else();

Only in the first case, the code might be much longer (e.g. if several for loops are used or if do_something() is a code block that is mostly identical to do_something_else()), while in the second case, the flag gets checked many times.

I'm curious whether current C++ compilers (most importantly, g++) would be able to optimize the second example to get rid of the repeated tests inside the for loop. If so, under what conditions is this possible?

+8  A: 

Yes, if it is determined that flag doesn't change and can't be changed by do_something or do_something_else, it can be pulled outside the loop. I've heard this called loop hoisting, but Wikipedia has an entry called "loop invariant code motion".

If flag is a local variable, the compiler should be able to do this optimization, since the change is guaranteed to have no effect on the observable behavior of the generated code.

If flag is a global variable and you call functions inside your loop, the compiler might not perform the optimization - it may not be able to determine whether those functions modify the global.

This can also be affected by the sort of optimization you do - optimizing for size would favor the non-hoisted version while optimizing for speed would probably favor the hoisted version.

In general, this isn't the sort of thing that you should worry about, unless profiling tells you that the function is a hotspot and you see that less than efficient code is actually being generated by going over the assembly the compiler outputs. Micro-optimizations like this you should always just leave to the compiler unless you absolutely have to.

Michael
Thanks for the answer. From your Wikipedia link I've found the page for "loop unswitching", which seems to be an even more precise term for this kind of optimization.
generic-identity
Although the terms aren't really cut and dried, loop-hoisting and loop-invariant-code-motion are not really used to describe this optimization. They're more for single instructions.
Paul Biggar
+1  A: 

I'm sure that if the compiler can determine that the flag will remain constant, it can do some shuffling:

const bool flag = /* ... */;
for (..; ..; ..)
{
    if (flag)
    {
        // ...
    }
    else
    {
        // ...
    }
}

If the flag is not const, the compiler cannot necessarily optimize the loop, because it can't be sure flag won't change. It can if it does static analysis, but not all compilers do, I think. const is the sure-fire way of telling the compiler that the flag won't change; after that it's up to the compiler.

As usual, profile and find out if it's really a problem.

GMan
'const' is a compiler-checked condition, but has no effect on optimization.
peterchen
The scope of the variable is probably more important, but constness can have a bearing on optimization. It is undefined behaviour to change an object that is `const` (this is different from using a `const_cast` to change an object that isn't a `const` object but is known through a `const` reference), so a compiler *can* use this information to cache its value.
Charles Bailey
peter, `const` has everything to do with optimization.
GMan
Probably more important is the object's scope. If flag is a local object and no references or pointers to it are passed to external functions then the compiler can be very free with its actual implementation, even going so far as to eliminate any actual storage for the object. As there is no way for conforming code to know that the compiler has done this - the code works 'as if' the object exists - the optimization is legal.
Charles Bailey
I don't think const-ness would generate different code for this particular case, since the compiler can clearly see that flag is read once and never written to.
Michael
@peter, this isn't quite true. A conformant C++ implementation is at liberty to assume that an object that is inherently `const` will never change, since the Standard defines any attempt to do so (e.g. via `const_cast`) as U.B. So it can certainly take into account `const` when determining whether to optimize.
Pavel Minaev
@Pavel: IIRC from a discussion (on GotW maybe?), modifying a const value through const_cast is U.B. only because constant values can be placed in a read-only segment. In all other cases, the immutability evaluation would end up the same without affecting results. But I can't dig it up anymore, so that's all speculation.
peterchen
@peterchen, I think you have cause and effect confused. A compiler can place a `const` object in RO memory precisely *because* it is undefined behaviour to modify a `const` object. Consequently, it can also make the same optimization decisions about a `const` object whether or not it actually does place it in RO memory.
Charles Bailey
found this - http://www.parashift.com/c++-faq-lite/const-correctness.html#faq-18.14 - probably the origin of my memory, though I think it was more elaborate.
peterchen
I'm with Peter on this - const has no effect on optimization. In this case, the compiler can optimize it because it's local. However, if the const object was a reference (via a real reference or by dereferencing a pointer-to-const) and the compiler couldn't determine that it was aliased (assuming there's a function call inside the for loop that allows the aliasing to have an effect), there would be no additional optimization opportunities due to the const qualifier. It's the fact that the object is local that enables the optimization - not that there's a const qualifier on it.
Michael Burr
If we're taking sides on this, I'm with GMan and Pavel. `:)` Look at this: `{const int k=5; for(int i=0; i<k; ++i) g();}` Since `k` is `const` and since modifying `k` inside `g()` would be U.B., the compiler could apply certain optimizations it couldn't apply if it had to assume `k` could change.
sbi
@sbi: I'll modify my position on this - the const qualifier is far less useful as an optimization tool than is often thought - particularly for non-POD objects. See Sutter's GotW article for some additional details: http://www.gotw.ca/gotw/081.htm
Michael Burr
A: 

It's called a loop invariant and the optimization is called loop invariant code motion and also code hoisting. The fact that it's in a conditional will definitely make the code analysis more complex and the compiler may or may not invert the loop and the conditional depending on how clever the optimizer is.

There is a general answer for any specific case of this kind of question, and that's to compile your program and look at the generated code.

DigitalRoss
A: 

I would be wary to say that it will. Can the compiler guarantee that the value won't be modified by this or another thread?

That said, the second version of the code is generally more readable and it would probably be the last thing to optimize in a block of code.

patros
@patros - If it's a local variable, multithreading doesn't have to come into play. Even if the variable is visible to other threads, the compiler could still perform the optimization - unless you mark it volatile, the compiler is not required to refetch it at every access, and the codegen is valid.
Michael
@Michael - Agreed, but we don't see the definition of flag in this snippet, so it may not be local. I also agree that the compiler can legally optimize this, just not that it always will. Whether it will is probably a function of variable scope and compiler flags.
patros
+1  A: 

Generally, yes. But there is no guarantee, and the places where the compiler will do it are probably rare.

What most compilers do without a problem is hoisting immutable evaluations out of the loop, e.g. if your condition is

if (a<b) ....

when a and b are not affected by the loop, the comparison will be made once before the loop.

This means that if the compiler can determine the condition does not change, the test is cheap and the jump is well predicted. This in turn means the test itself costs one cycle or no cycles at all (really).

In which cases would splitting the loop be beneficial?

a) a very tight loop where the 1 cycle is a significant cost
b) the entire loop with both parts does not fit in the code cache

Now, the compiler can only make assumptions about the code cache, and can usually order the code so that one branch will fit in the cache.

Without any testing, I'd expect a) to be the only case where such an optimization would be applied, because it's not always the better choice:

In which cases would splitting the loop be bad?

When splitting the loop increases code size beyond the code cache, you will take a significant hit. Now, that only affects you if the loop itself is called within another loop, but that's something the compiler usually can't determine.

[edit]
I couldn't get VC9 to split the following loop (one of the few cases where it might actually be beneficial)

extern volatile int vflag = 0;

int foo(int count)
{
   int sum = 0;
   int flag = vflag;
   for(int i=0; i<count; ++i)
   {
      if (flag)
         sum += i;
      else
         sum -= i;
   }

   return sum;
}

[edit 2]
note that with int flag = true; the second branch does get optimized away. (and no, const doesn't make a difference here ;))

What does that mean? Either it doesn't support that, it doesn't matter, or my analysis is wrong ;-)

Generally, I'd assume this is an optimization that is valuable only in very few cases, and can be done by hand easily in most scenarios.

peterchen
Peter, what flags did you use to compile your code snippet?
Michael
no debug info, always inline (/Ob2), /Ox with either /Os, /Ot or none of the latter two. (I've looked at the compilers output only, but /GL shouldn't affect this)
peterchen
+1  A: 

As many have said: it depends.

If you want to be sure, you should try to force a compile-time decision. Templates often come in handy for this:

for (condition)
  do_it<flag>();
sbi
+6  A: 

Tried with GCC and -O3:

void foo();
void bar();

int main()
{
    bool doesnt_change = true;
    for (int i = 0; i != 3; ++i) {
        if (doesnt_change) {
            foo();
        }
        else {
            bar();
        }
    }
}

Result for main:

_main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
call ___main
call __Z3foov
call __Z3foov
call __Z3foov
xorl %eax, %eax
leave
ret

So it does optimize away the choice (and unrolls smaller loops).

This optimization is not done if doesnt_change is global.

UncleBens
+1: For performing the test and showing the generated code.
Clifford
could you try the snippet I used below? In your snippet the compiler is simply removing the second branch because it will never execute. (the trick is to initialize `doesnt_change` from an extern volatile, so the compiler can't determine which value it will have)
peterchen