I'm trying to optimize my exercise application in VS2010. The core loop contains several calls to sqrt, pow and memset. More specifically, this is what I do:
    // in a cpp file ...
    #include <cmath>
    #pragma intrinsic(sqrt, pow, memset)

    void Simulator::calculate()
    {
        for( int i=0; i<NUM; i++ )
        {
            ...
            float len = std::sqrt(lenSq);
            distrib[0] = std::pow(baseVal, expVal);
            ...
            clearQuad(i); // invokes memset
        }
    }
After building, the disassembly shows that the sqrt call, for example, still compiles to "call _CIsqrt(0x####)". Nothing changes whether the /Oi flag is enabled or not.
Can anybody kindly explain how I can enable the intrinsic versions, and how I can verify this in the disassembly? (I have also enabled /O2 in the project settings.)
Thank you
Edit: Problem solved by adding /fp:fast. Taking sqrt as an example, the intrinsic version uses a single "fsqrt" instruction in place of the std version's "call __CIsqrt()". Sadly, in my case, the intrinsic version is 5% slower.
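For anyone who wants to reproduce the check, here is a minimal sketch of how it could be done in isolation. The file name and the three function names are made up for illustration and are not part of my project. The idea is to compile with something like "cl /c /O2 /Oi /fp:fast /FAs verify_intrinsic.cpp" and open the generated .asm listing: with the intrinsic in effect, lengthFromSquare should contain "fsqrt" rather than "call __CIsqrt". The exact instruction may differ with other /arch settings, so treat this as a rough guide rather than a guarantee.

```cpp
// verify_intrinsic.cpp (hypothetical file name, for illustration only)
// Assumed command line: cl /c /O2 /Oi /fp:fast /FAs verify_intrinsic.cpp
// /FAs writes an assembly listing with the source lines interleaved.
#include <cmath>
#include <ctime>

#ifdef _MSC_VER
#pragma intrinsic(sqrt, pow)   // MSVC-only pragma, guarded for other compilers
#endif

// Small standalone functions so each call is easy to find in the listing.
float lengthFromSquare(float lenSq)
{
    return std::sqrt(lenSq);
}

float power(float baseVal, float expVal)
{
    return std::pow(baseVal, expVal);
}

// Rough timing helper (hypothetical): runs sqrt in a loop and returns the
// elapsed CPU time in seconds. Build once with and once without /fp:fast
// and compare the results.
double secondsForSqrtLoop(int iterations)
{
    volatile float sink = 0.0f;   // volatile keeps the stores from being removed
    std::clock_t start = std::clock();
    for (int i = 0; i < iterations; ++i)
    {
        sink = std::sqrt(static_cast<float>(i));
    }
    return static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}
```

A micro-benchmark like secondsForSqrtLoop only measures the call in isolation, so the numbers may not match what happens inside the real loop, where the surrounding code affects scheduling and register use.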
Many thanks to Zan Lynx and mch.