ansaurus

Question

weird performance in C++ (VC 2010)

Answer 1

+3 A:

A few things to check out:

You need to check that it actually is the same code. As in, are your inline assembly statements exactly the same as those generated by the compiler? I can see three potential differences (potential because they may be optimised out). The first is the initial setting of val to zero, the second is the extra variable val1 (unlikely to matter, since it will most likely just change the constant subtracted from the stack pointer), and the third is that your inline assembly version may not put the interim results back into val.
You need to make sure your sample space is large. You didn't mention whether you'd done only one run of each version or a hundred runs but, the more runs, the better, so as to remove the effect of "noise" in your statistics.
An even better measurement would be CPU time rather than elapsed time. Elapsed time is subject to environmental changes (like your virus checker or one of your services deciding to do something at the time you're testing). The large sample space will alleviate, but not necessarily solve, this.
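The last two points can be sketched as a small harness that repeats a workload and reports the median of both processor time and elapsed time. This is only an illustration under stated assumptions: `median_ms`, `Timing` and `time_runs` are hypothetical names, and the code needs a C++11 compiler (`<chrono>` and lambdas for the caller), which is newer than VC 2010. Note also that `std::clock()` returns processor time on most platforms, but Microsoft's CRT documents it as wall-clock time, so on Windows a call such as `GetProcessTimes` would be needed for true CPU time.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical helper: median of a set of timing samples (milliseconds).
// The median is less sensitive than the mean to a single noisy run.
double median_ms(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : 0.5 * (samples[mid - 1] + samples[mid]);
}

struct Timing {
    double cpu_ms;   // median std::clock() time per run
    double wall_ms;  // median elapsed (wall-clock) time per run
};

// Runs `work` `runs` times and records both clocks around each run.
template <typename Work>
Timing time_runs(Work work, int runs) {
    assert(runs > 0);
    std::vector<double> cpu, wall;
    for (int i = 0; i < runs; ++i) {
        const std::clock_t c0 = std::clock();
        const auto w0 = std::chrono::steady_clock::now();
        work();
        const auto w1 = std::chrono::steady_clock::now();
        const std::clock_t c1 = std::clock();
        cpu.push_back(1000.0 * (c1 - c0) / CLOCKS_PER_SEC);
        wall.push_back(
            std::chrono::duration<double, std::milli>(w1 - w0).count());
    }
    return {median_ms(cpu), median_ms(wall)};
}
```

A caller would pass each version of the loop as a lambda, for example `time_runs([&] { run_version_one(); }, 100)`, where `run_version_one` stands in for the code under test. A large gap between the two medians suggests the process was waiting on something other than its own computation.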

paxdiablo 2010-05-26 02:24:36

Answer 2

+5 A:

I suggest you try the different floating-point calculation models supported by the compiler - precise, strict or fast (see the /fp option) - with your original code before drawing any conclusions. I suspect that your original code was compiled with an overly restrictive floating-point model (one not followed by your assembly in the second version of the code), which is why the original is much slower.

In other words, if the original model was indeed too restrictive, then you were simply comparing apples to oranges. The two versions didn't really do the same thing, even though it might seem so at first sight.

Note, for example, that in the first version of the code the intermediate sum is accumulated in a float value. If it was compiled with the precise model, the intermediate results would have to be rounded to the precision of the float type, even if the variable val was optimized away and an internal FPU register was used instead. Your assembly code doesn't bother to round the accumulated result, which could have contributed to its better performance.
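To illustrate why rounding at every step is not the same computation as rounding once at the end, here is a minimal sketch. It is not the asker's code (the question's listing isn't reproduced here), and the function names are made up for the example. The `volatile` qualifier is only a portable way to force each partial sum to be stored as a float, imitating what a strict model demands. The double accumulator stands in for a wider FPU register.

```cpp
#include <cassert>
#include <cstddef>

// Every partial sum is rounded to float precision before the next addition.
float sum_rounded_each_step(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    volatile float acc = 0.0f;  // volatile: force a float store on each step
    for (std::size_t i = 0; i < n; ++i) {
        acc = acc + data[i];
    }
    return acc;
}

// Partial sums are kept in a wider type and rounded to float only once.
float sum_rounded_once(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return static_cast<float>(acc);
}
```

With IEEE 754 single precision and the input `{16777216.0f, 1.0f, 1.0f}`, the first function returns 16777216 (each added 1 is lost to rounding, since 2^24 + 1 is not representable as a float), while the second returns 16777218. The two results differ, and the per-step rounding is extra work the compiler must emit under a stricter model.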

I'd suggest you compile both versions of the code in /fp:fast mode and see how their performances compare in that case.

AndreyT 2010-05-26 02:34:33

Thanks! I've run my original code in Fast mode and it now runs in 80ms, while the 2nd version still runs at 150ms in Fast mode, so I guess the compiler still knows better :)

I've found these #pragma's for MSVC to toggle the float precision per function (doesn't work inside functions):

    #pragma float_control(precise, off, push)
    ... code here ...
    #pragma float_control(pop)

But more specifically: http://msdn.microsoft.com/en-us/library/45ec64h6(VS.80).aspx
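For reference, a fuller sketch of how the pragma pair from this comment might wrap a single function. The function names and bodies are hypothetical. The `_MSC_VER` guard is an addition so that other compilers simply skip the pragma, and whether relaxing the model actually speeds up a given loop has to be measured.

```cpp
#include <cassert>
#include <cstddef>

// Relax the floating-point model for the function below only.
// The pragma is placed at file scope, outside any function body.
#ifdef _MSC_VER
#pragma float_control(precise, off, push)
#endif
float sum_relaxed(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}
#ifdef _MSC_VER
#pragma float_control(pop)  // restore the previously active setting
#endif

// Compiled under whatever /fp model the command line selects.
float sum_default(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}
```

Timing `sum_relaxed` against `sum_default` in the same build is one way to see what the setting changes for a particular loop, without switching the /fp option for the whole project.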

raicuandi 2010-05-26 03:37:21
