Firstly, processors have a capability called branch prediction. After a few iterations of the loop, the processor will notice that your if statement always goes one way. (It can even detect regular patterns, such as true, false, true, false.) It will then speculatively execute that branch, and as long as it predicts correctly, the extra cost of the if statement is pretty much eliminated. If you think the user is more likely to choose true than false, you can even tell gcc this via __builtin_expect (a gcc-specific extension).
However, you did mention in one of your comments that you have a "much more complicated sequence of bools". It is possible that the processor's branch-prediction hardware doesn't have enough history to pattern-match all those jumps -- by the time it comes back to the first if statement, the record of which way that jump went has been evicted. But we can help it here...
The compiler can transform loops and if-statements into what it thinks are more optimal forms. For example, it could transform your code into the form given by schnaader; this is known as loop unswitching. You can help it along by doing Profile-Guided Optimization (PGO), letting the compiler know where the hotspots are. (Note: in gcc, -funswitch-loops is only turned on at -O3.)
You should profile your code at the instruction level (VTune would be a good tool for this) to see whether the if-statements are really the bottleneck. If they are, and if by looking at the generated assembly you think the compiler has got it wrong despite PGO, you can try hoisting the if-statement out yourself. Templated code can make this convenient:
template<bool B> void innerLoop() {
    for (int i = 0; i < 10000; i++) {
        // B is a compile-time constant, so each instantiation
        // contains only one of the two branches.
        if (B) {
            // some stuff..
        } else {
            // some other stuff..
        }
    }
}

// The runtime test now happens once, outside the loop:
if (user_set_flag) innerLoop<true>();
else innerLoop<false>();