Sometimes a loop where the CPU spends most of the time has some branch prediction miss (misprediction) very often (near .5 probability.) I've seen a few techniques on very isolated threads but never a list. The ones I know already fix situations where the condition can be turned to a bool and that 0/1 is used in some way to change. Are there other conditional branches that can be avoided?
e.g. (pseudocode)
loop () {
if (in[i] < C )
out[o++] = in[i++]
...
}
Can be rewritten, arguably losing some readability, with something like this:
loop() {
out[o] = in[i] // copy anyway, just don't increment
inc = in[i] < C // increment counters? (0 or 1)
o += inc
i += inc
}
Also I've seen techniques in the wild changing &&
to &
in the conditional in certain contexts escaping my mind right now. I'm a rookie at this level of optimization but it sure feels like there's got to be more.