I'm trying to optimize my exercise application in VS2010. The core loop makes several sqrt, pow, and memset calls. More specifically, this is what I do:

// in a cpp file ...
#include <cmath>

#pragma intrinsic(sqrt, pow, memset)
void Simulator::calculate() 
{
  for( int i=0; i<NUM; i++ )
  {
    ...
    float len = std::sqrt(lenSq);
    distrib[0] = std::pow(baseVal, expVal);
    ...
    clearQuad(i); // invokes memset
  }
}

After building, the disassembly shows that the sqrt call, for example, still compiles to "call _CIsqrt(0x####)". Nothing changes regardless of whether the /Oi flag is enabled.

Can anybody explain how I can enable the intrinsic versions, and how I can verify this in the disassembly? (I have also enabled /O2 in the project settings.)

Thank you

Edit: Problem solved by adding /fp:fast. Taking sqrt as an example, the intrinsic version replaces the "call __CIsqrt()" of the std version with a single "fsqrt" instruction. Sadly, in my case, the intrinsic version is 5% slower.

Many thanks to Zan Lynx and mch.

+2  A: 

The use of the C++ std namespace might be causing the compiler not to use the intrinsics. Try removing std:: from your sqrt, pow, and memset calls.

The MSDN Library documentation for #pragma intrinsic offers an example for testing whether the intrinsic truly is being used: compile with the -FAs flag and look at the resulting .asm file.

Looking at the disassembly in the debugger, as you seem to already be doing, should also show the intrinsic rather than a call.
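As a rough sketch of that check, a small standalone file can be compiled by hand and its listing inspected. The function name vecLength is made up for illustration, and the command line in the comment is only one plausible combination of the flags discussed in this thread:

```cpp
// sqrt_check.cpp: minimal file for inspecting how sqrt is compiled.
// Suggested (assumed) command line for an x86 build:
//   cl /c /O2 /Oi /fp:fast /FAs sqrt_check.cpp
// /c compiles without linking and /FAs writes sqrt_check.asm with the
// source interleaved. In the listing, look for an inline "fsqrt"
// instruction versus a "call __CIsqrt" into the runtime.
#include <cmath>

#ifdef _MSC_VER
// MSVC-specific; reportedly unnecessary once /Oi is set.
#pragma intrinsic(sqrt)
#endif

// Hypothetical helper standing in for the sqrt call in the core loop.
float vecLength(float lenSq)
{
    return std::sqrt(lenSq);
}
```

Compiling the same file with and without /fp:fast and comparing the two .asm files should show whether the flag is what switches the call to the inline instruction.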

mch
1) I removed the std:: and still got the same result. 2) If sqrt, pow, and memset are provided by the compiler, do I have to include <cmath> at the beginning?
Veg
I think cl is pretty permissive about CRT functions always working whether you include a header for them or not. It probably doesn't matter, but is this x86 or x64? Have you tried compiling the file by hand on the command line?
mch
The code arch is x86. I haven't tried compiling by hand on the command line. Also, I have just tested it without the #include: compilation failed. The output says sqrt needs a definition.
Veg
Edit: the /fp:fast option solves the problem. +1 vote for your help. So I don't even need to write the line "#pragma intrinsic(sqrt, pow, memset)" to enable intrinsic functions? More surprisingly, the intrinsic version is actually a little slower than std::sqrt in my case.
Veg
Huh, I thought /fp:fast was included in /O2. As I understand it, if you use /Oi, you don't need #pragma intrinsic.
mch
+1  A: 

You are compiling to machine code and not to .NET CLR. Right?

If you compile to .NET then the code won't be optimized until it is run through JIT. At that point .NET has its own intrinsics and other things that will happen.

If you are compiling to native machine code, you might want to play with the /arch option and the /fp:fast option.
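One way to experiment with those options is to isolate the pow and memset calls in a small file and compare the generated listings across flag combinations. This is only a sketch: the Quad layout and the function signatures below are assumptions, since the original code does not show them, and the result for each flag set has to be read from the listing rather than taken for granted.

```cpp
// flags_check.cpp: isolates the pow and memset calls from the question.
// Assumed command lines to compare (each writes flags_check.asm):
//   cl /c /O2 /Oi /FAs flags_check.cpp
//   cl /c /O2 /Oi /fp:fast /FAs flags_check.cpp
//   cl /c /O2 /Oi /fp:fast /arch:SSE2 /FAs flags_check.cpp
#include <cmath>
#include <cstring>

// Hypothetical data layout; the real Quad type is not shown in the question.
struct Quad
{
    float v[4];
};

// Stand-in for clearQuad(i), which the question says invokes memset.
void clearQuad(Quad& q)
{
    std::memset(&q, 0, sizeof q);
}

// Stand-in for the distrib[0] = std::pow(baseVal, expVal) line.
float distribValue(float baseVal, float expVal)
{
    return std::pow(baseVal, expVal);
}
```

Diffing the .asm output between builds shows which calls became inline code. Since the intrinsic sqrt turned out slower in the question's case, it is worth timing each variant rather than assuming the inlined form wins.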

Zan Lynx
Tested, positive result. The /fp:fast solves it. Thank you
Veg