I'm just wondering how well the MSVC++ compiler can optimize code (with code examples), what it can't optimize, and why.
For example, I used the SSE intrinsics with something like this (var is an __m128 value; it was for a frustum-culling test):
if( var.m128_f32[0] > 0.0f && var.m128_f32[1] > 0.0f && var.m128_f32[2] > 0.0f && var.m128_f32[3] > 0.0f ) {
...
}
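For comparison, here is a minimal sketch of a branch-free way to express the same "all four lanes > 0" test with plain SSE1 intrinsics, assuming an x86/x64 target: `_mm_cmpgt_ps` compares all four lanes at once and `_mm_movemask_ps` (MOVMSKPS) packs the four lane sign bits into an int. The function name `AllPositive` is hypothetical, and this uses MOVMSKPS rather than PTEST. It also avoids `m128_f32`, which is MSVC-specific.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE1: _mm_cmpgt_ps, _mm_movemask_ps, _mm_setzero_ps

// Hypothetical helper: true if all four lanes of v are strictly greater than 0.0f.
// Matches the four chained `var.m128_f32[i] > 0.0f` comparisons (NaN lanes
// compare false in both), but uses one packed compare and no per-lane branches.
inline bool AllPositive(__m128 v) {
    // Each lane becomes all-ones where v > 0, all-zeros otherwise.
    const __m128 gt = _mm_cmpgt_ps(v, _mm_setzero_ps());
    // MOVMSKPS gathers the 4 sign bits; 0xF means every lane passed.
    return _mm_movemask_ps(gt) == 0xF;
}
```

Used as `if (AllPositive(var)) { ... }`, this typically leaves a single conditional jump on the combined result instead of one per lane.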
When I looked at the asm output, I saw that it compiled to an ugly, very jumpy version (and I know CPUs hate lots of short, hard-to-predict jumps). I also know I can optimize it with the SSE4.1 PTEST instruction, but why didn't the compiler do that, even though the compiler writers defined the PTEST intrinsic and therefore knew about the instruction?
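A sketch of what a hand-written PTEST variant might look like, assuming the target CPU supports SSE4.1: the packed compare still produces a lane mask, and `_mm_testc_si128` (PTEST) returns 1 when `(~a & b) == 0`, i.e. when every bit of the mask is set. `AllPositivePtest` and the `TARGET_SSE41` macro are hypothetical names; the macro only exists so GCC/Clang accept the SSE4.1 intrinsic without a global `-msse4.1` flag, and it expands to nothing under MSVC.

```cpp
#include <cassert>
#include <immintrin.h>  // SSE/SSE2/SSE4.1 intrinsics (includes smmintrin.h)

// Hypothetical portability macro: GCC/Clang only allow SSE4.1 intrinsics in
// functions compiled for SSE4.1; MSVC needs no attribute.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

// Hypothetical helper: true if all four lanes of v are > 0.0f.
// Requires SSE4.1 support at run time.
TARGET_SSE41 inline bool AllPositivePtest(__m128 v) {
    // Lanes become 0xFFFFFFFF where v > 0, else 0.
    const __m128i mask = _mm_castps_si128(_mm_cmpgt_ps(v, _mm_setzero_ps()));
    const __m128i ones = _mm_set1_epi32(-1);
    // PTEST sets CF when (~mask & ones) == 0, i.e. when every mask bit is set.
    return _mm_testc_si128(mask, ones) != 0;
}
```

The compare is still needed before PTEST, so any gain over a MOVMSKPS-based check is likely small; it is worth measuring before committing to it.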
What other optimizations can't it do (so far)?
Does this mean that, with today's technology, I'm forced to use intrinsics, inline ASM and separately linked ASM functions? And will compilers ever find optimizations like this (I don't think so)?
Where can I read more about how well the MSVC++ compiler optimizes?
(Edit 1): I used the /arch:SSE2 and /fp:fast switches.