ansaurus

Question

Why is floor() so slow?!

Answer 1

+2 A:

They do not do the same thing. floor() is a function, so using it incurs a function call: allocating a stack frame, copying the parameters and retrieving the result. A cast is not a function call, so it uses faster mechanisms (I believe it may process the values in registers).
Probably floor() is already optimized.
Can you squeeze more performance out of your algorithm? Maybe switching rows and columns would help? Can you cache common values? Are all your compiler's optimizations on? Can you switch to a different operating system or compiler? Jon Bentley's Programming Pearls has a great review of possible optimizations.

Yuval F 2009-05-05 10:00:55

Answer 2

+11 A:

A couple of things make floor slower than a cast and prevent vectorization.

The most important one:

floor can modify the global state. If you pass a value that is too huge to be represented as an integer in float format, the errno variable gets set to EDOM. Special handling for NaNs is done as well. All this behaviour is for applications that want to detect the overflow case and handle the situation somehow (don't ask me how).

Detecting these problematic conditions is not simple and makes up more than 90% of the execution time of floor. The actual rounding is cheap and could be inlined/vectorized. Also, it's a lot of code, so inlining the whole floor function would make your program run slower.

Some compilers have special flags that allow them to optimize away some of the rarely used C standard rules. For example, GCC can be told that you're not interested in errno at all. To do so, pass -fno-math-errno or -ffast-math. ICC and VC may have similar compiler flags.

Btw - You can roll your own floor-function using simple casts. You just have to handle the negative and positive cases differently. That may be a lot faster if you don't need the special handling of overflows and NaNs.
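One possible shape for such a hand-rolled floor is sketched below. The name floor_to_int is hypothetical, and the sketch assumes the input is finite and lies within the range of int; NaNs, infinities and out-of-range values are deliberately not handled, which is exactly the special handling being traded away.

```c
/* Cast-based floor, as a sketch only.
   Assumption: x is finite and fits in an int. Anything else is
   undefined behaviour here, unlike the real floor(). */
static inline int floor_to_int(double x)
{
    int i = (int)x;                /* the cast truncates toward zero */
    return i - (x < (double)i);    /* step down by one if truncation rounded up,
                                      which happens for negative non-integers */
}
```

Positive values and exact integers come out of the cast unchanged; only negative values with a fractional part need the extra step down.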

Nils Pipenbrinck 2009-05-05 10:17:43

Answer 3

A:

Yes, floor() is extremely slow on all platforms since it has to implement a lot of behaviour from the IEEE fp spec. You can't really use it in inner loops.

I sometimes use a macro to approximate floor():

#define PSEUDO_FLOOR( V ) ((V) >= 0 ? (int)(V) : (int)((V) - 1))

It does not behave exactly like floor(): for example, floor(-1) == -1 while PSEUDO_FLOOR(-1) == -2. Still, it's close enough for most uses.

jrgc 2009-05-05 17:04:27

Naive implementation. PSEUDO_FLOOR( x++ ) would break this, since the macro evaluates its argument more than once.

Charlie Somerville 2010-01-24 08:44:50
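To make the side-effect problem concrete, here is a small sketch that puts the PSEUDO_FLOOR macro from the answer next to a function form of the same approximation. The function name pseudo_floor is hypothetical. It keeps the macro's behaviour, including the off-by-one on exact negative integers, but evaluates its argument only once.

```c
/* The macro from the answer: V appears in the condition and again in
   whichever branch is taken, so an argument like x++ runs twice. */
#define PSEUDO_FLOOR( V ) ((V) >= 0 ? (int)(V) : (int)((V) - 1))

/* Hypothetical function form of the same approximation.
   The argument is evaluated once, at the call site. */
static inline int pseudo_floor(double v)
{
    return v >= 0 ? (int)v : (int)(v - 1);
}
```

Starting from x = 2.5, pseudo_floor(x++) yields 2 and leaves x at 3.5, whereas PSEUDO_FLOOR(x++) yields 3 and leaves x at 4.5, because the increment happens once in the comparison and once more in the selected branch.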
