Views: 171 · Answers: 5
I recently read this post: http://stackoverflow.com/questions/2550281/floating-point-vs-integer-calculations-on-modern-hardware and was curious about my own processor's performance on this quasi-benchmark, so I put together two versions of the code, one in C# and one in C++ (Visual Studio 2010 Express), and compiled both with optimizations to see what would fall out. The output from my C# version is fairly reasonable:

int add/sub: 350ms
int div/mul: 3469ms
float add/sub: 1007ms
float div/mul: 67493ms
double add/sub: 1914ms
double div/mul: 2766ms

When I compiled and ran the C++ version something completely different shook out:

int add/sub: 210.653ms
int div/mul: 2946.58ms
float add/sub: 3022.58ms
float div/mul: 172931ms
double add/sub: 1007.63ms
double div/mul: 74171.9ms

I expected some performance differences, but not this large! I don't understand why division/multiplication in C++ is so much slower than addition/subtraction, whereas the managed C# version is much closer to my expectations. The code for the C++ version of the function is as follows:

template <typename T> void GenericTest(const char *typestring)
{
    T v = 0;
    T v0 = (T)((rand() % 256) / 16) + 1;
    T v1 = (T)((rand() % 256) / 16) + 1;
    T v2 = (T)((rand() % 256) / 16) + 1;
    T v3 = (T)((rand() % 256) / 16) + 1;
    T v4 = (T)((rand() % 256) / 16) + 1;
    T v5 = (T)((rand() % 256) / 16) + 1;
    T v6 = (T)((rand() % 256) / 16) + 1;
    T v7 = (T)((rand() % 256) / 16) + 1;
    T v8 = (T)((rand() % 256) / 16) + 1;
    T v9 = (T)((rand() % 256) / 16) + 1;

    HTimer tmr = HTimer();
    tmr.Start();
    for (int i = 0 ; i < 100000000 ; ++i)
    {
        v += v0;
        v -= v1;
        v += v2;
        v -= v3;
        v += v4;
        v -= v5;
        v += v6;
        v -= v7;
        v += v8;
        v -= v9;
    }
    tmr.Stop();

      // I removed the bracketed values from the table above; they just make the compiler
      // assume I am using the value for something, so it doesn't optimize it out.
    cout << typestring << " add/sub: " << tmr.Elapsed() * 1000 << "ms [" << (int)v << "]" << endl;

    tmr.Start();
    for (int i = 0 ; i < 100000000 ; ++i)
    {
        v /= v0;
        v *= v1;
        v /= v2;
        v *= v3;
        v /= v4;
        v *= v5;
        v /= v6;
        v *= v7;
        v /= v8;
        v *= v9;
    }
    tmr.Stop();

    cout << typestring << " div/mul: " << tmr.Elapsed() * 1000 << "ms [" << (int)v << "]" << endl;
}

The C# tests are not generic; the double version is implemented like this:

static double DoubleTest()
{
    Random rnd = new Random();
    Stopwatch sw = new Stopwatch();

    double v = 0;
    double v0 = (double)rnd.Next(1, int.MaxValue);
    double v1 = (double)rnd.Next(1, int.MaxValue);
    double v2 = (double)rnd.Next(1, int.MaxValue);
    double v3 = (double)rnd.Next(1, int.MaxValue);
    double v4 = (double)rnd.Next(1, int.MaxValue);
    double v5 = (double)rnd.Next(1, int.MaxValue);
    double v6 = (double)rnd.Next(1, int.MaxValue);
    double v7 = (double)rnd.Next(1, int.MaxValue);
    double v8 = (double)rnd.Next(1, int.MaxValue);
    double v9 = (double)rnd.Next(1, int.MaxValue);

    sw.Start();
    for (int i = 0; i < 100000000; i++)
    {
        v += v0;
        v -= v1;
        v += v2;
        v -= v3;
        v += v4;
        v -= v5;
        v += v6;
        v -= v7;
        v += v8;
        v -= v9;
    }
    sw.Stop();

    Console.WriteLine("double add/sub: {0}", sw.ElapsedMilliseconds);
    sw.Reset();

    sw.Start();
    for (int i = 0; i < 100000000; i++)
    {
        v /= v0;
        v *= v1;
        v /= v2;
        v *= v3;
        v /= v4;
        v *= v5;
        v /= v6;
        v *= v7;
        v /= v8;
        v *= v9;
    }
    sw.Stop();

    Console.WriteLine("double div/mul: {0}", sw.ElapsedMilliseconds);
    sw.Reset();

    return v;
}

Any ideas here?

A: 

If you're interested in floating-point speed and possible optimizations, read Agner Fog's optimization manual: http://www.agner.org/optimize/optimizing_cpp.pdf

Also check this MSDN article: http://msdn.microsoft.com/en-us/library/aa289157%28VS.71%29.aspx

Your results can depend on things such as the JIT and the compilation flags: debug vs. release, what kind of FP optimizations are permitted, and the allowed instruction set.

Try setting these flags to maximum optimization, and change your program so that it definitely won't produce overflows or NaNs, because those affect computation speed. (Even something like "v += v1; v += v2; v -= v1; v -= v2;" is fine, because it won't be reduced under "strict" or "precise" floating-point modes.) Also try not to use more variables than you have FP registers.
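A minimal sketch of the pattern suggested above (my own illustration, not code from the answer): the adds and subtracts cancel exactly for small integer-valued operands, so the accumulator never drifts into overflow, NaN, or denormal territory, yet under "strict"/"precise" FP modes the compiler must still perform every operation.

```cpp
#include <cassert>

// Balanced benchmark loop: with small integer-valued doubles like 3.0 and
// 5.0, each add/subtract pair cancels exactly, so v stays in the normal
// range regardless of the iteration count. Under strict/precise FP modes
// the compiler may not reorder or cancel these operations itself.
double balanced_loop(double v, double v1, double v2, int iters)
{
    for (int i = 0; i < iters; ++i)
    {
        v += v1;
        v += v2;
        v -= v1;
        v -= v2;
    }
    return v;
}
```

Note the exact cancellation only holds for operands whose sums are exactly representable; with large or irrational-looking values, rounding would make the pairs inexact.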

ruslik
+1  A: 

Multiplication isn't bad. It is a few cycles slower than addition, but yes, division is very slow compared to the others. It takes significantly longer and, unlike the other three operations, it is not pipelined.

jalf
+3  A: 

It's possible that C# optimized the division by each vx into multiplication by 1 / vx, since it knows those values aren't modified during the loop and it can compute the inverses just once up front.

You can do this optimization yourself and time it in C++.

Mark B
+2  A: 

For the float div/mul tests, you're probably getting denormalized values, which are much slower to process than normal floating-point values. This isn't an issue for the int tests, and it would crop up much later for the double tests.

You should be able to add this to the start of the C++ version to flush denormals to zero (it's declared in <float.h>):

_controlfp(_DN_FLUSH, _MCW_DN);

I'm not sure how to do it in C# though (or if it's even possible).

Some more info here: http://stackoverflow.com/questions/2051534/floating-point-math-execution-time
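To see how the benchmark could land in the denormal range in the first place, here is a small portable sketch (my own, not part of the answer): repeatedly halving a float eventually produces a nonzero value below the smallest normal float, and on many CPUs arithmetic on such subnormal values falls back to a slow microcode path.

```cpp
#include <limits>

// Returns true if halving v the given number of times produces a subnormal
// (denormalized) result: nonzero, but smaller than the smallest normal float.
// std::numeric_limits<float>::min() is the smallest *normal* float (2^-126);
// subnormals extend below it down to about 2^-149.
bool becomes_subnormal(float v, int halvings)
{
    const float min_normal = std::numeric_limits<float>::min();
    for (int i = 0; i < halvings; ++i)
        v *= 0.5f;
    return v != 0.0f && v < min_normal;
}
```

In the original benchmark, a long chain of divisions by values greater than 1 can drive the accumulator into exactly this range, which is what flushing denormals to zero avoids.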

celion
This solved it. Adding that line to the GenericTest function made it execute much more reasonably: 1 sec for float add/sub, 1.3 sec for float mul/div, 1 sec for double add/sub, 1.6 sec for double mul/div.
Chris D.
I'm still not sure that it's not a flawed benchmark, but at least now it's *less* flawed :)
celion
A: 

I also decided that your C++ results looked incredibly slow, so I ran the code myself. Turns out that, actually, you're totally wrong.

I replaced your timer (I have no idea which timer you were using, and I don't have it handy) with the Windows high-performance timer. That thing can do nanoseconds or better. Guess what? Visual Studio says no. I didn't even tweak it for maximum performance. VS can see right through this sort of thing and elided all of the loops. That's why you should never, ever, ever use this sort of "profiling". Get a professional profiler and come back. Unless 2010 Express is different from 2010 Professional, which I doubt; they mainly differ in IDE features, not in raw code performance/optimization.
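For reference, the usual defence against the elision being described here (a sketch of my own, not DeadMG's code): sink the loop's result into a volatile object. The compiler must perform the volatile store, so it cannot delete the loop that produces the value.

```cpp
// A volatile sink: the store to g_sink cannot be optimized away, so the
// loop feeding it must actually run even under full optimization.
volatile double g_sink;

double timed_work(double v, double v0, int iters)
{
    for (int i = 0; i < iters; ++i)
        v += v0;
    g_sink = v; // forces the result to be materialized
    return v;
}
```

The OP's benchmark prints `(int)v`, which is a similar idea, but printing after *both* timed loops have modified `v` still leaves the compiler room to restructure the work in between.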

I'm not even going to bother running your C#.

Edit: This is DEBUG x64 (the previous screenshot is x86, but I thought I'd do x64 since I'm on x64), and I also fixed a minor bug that caused the time to come out negative rather than positive. So unless you want to tell me that your release FP on 32-bit is a hundred times slower, I think you've screwed up.

One thing I did find curious is that the x86 debug build never terminated on the second float test: if you ran float first, then double, it was the double div/mul that hung; if you ran double then float, the float div/mul hung. Must be a compiler glitch.

DeadMG
uhm, 0 nanoseconds? Are you using some NASA PC?
PoweRoy
@PoweRoy: Read the post. The point is that the compiler optimized all of it away.
DeadMG
Seriously, you posted a screenshot instead of actual code?
SoapBox
@SoapBox: The OP already posted my code. Frankly, I just couldn't be bothered to type the results out.
DeadMG
@DeadMG How come the first and second timings are different? The first one only shows 0 (and you don't explain why).
PoweRoy
@PoweRoy: ... Perhaps you could read the post. The second run was in DEBUG mode, i.e., all compiler optimizations disabled and additional debugging overhead, and it's still a hundred times faster than the OP's times.
DeadMG
I am using the high-performance timer (QueryPerformanceCounter) in both cases. I have never had the compiler optimize the loops away; what compiler options did you set?
Chris D.
@Chris D: All I did was change the configuration to Release. That was it. I didn't go through every possible optimization option or even set the overall optimization level.
DeadMG