
What is the fastest way you know to convert a floating-point number to an int on an x86 CPU? Preferably in C or assembly (that can be inlined in C), for any combination of the following:

  • 32/64/80-bit float -> 32/64-bit integer

I'm looking for a technique that is faster than just letting the compiler do it.

A: 

Switch from a Pentium 5 to a chip that does math right... (Man that makes me feel old...)

JBB
I'm rolling around on the ground. Dang -- it's too bad people down-modded you for that!
Kevin
It was worth it. :)
JBB
:) Is there actually a Pentium 5? And if there is, sorry: it does have SSE3 and is therefore perfectly all right, when used wisely (see the SSE3 and FISTTP comments).
akauppi
A: 
float foo = 0.0;
int bar = (int)foo;

Or is that what you meant by "let the compiler do it"?

Hank Gay
+8  A: 

Packed conversion using SSE is by far the fastest method, since you can convert multiple values in the same instruction. ffmpeg has a lot of assembly for this (mostly for converting the decoded output of audio to integer samples); check it for some examples.
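A minimal sketch of the packed idea with SSE2 intrinsics (not code from ffmpeg; the function names are made up for illustration, and n is assumed to be a multiple of 4). _mm_cvtps_epi32 (CVTPS2DQ) rounds according to the MXCSR rounding mode, round-to-nearest-even by default, while _mm_cvttps_epi32 (CVTTPS2DQ) always truncates like a C cast. Both convert four floats per instruction.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: n floats -> int32, rounding per MXCSR
   (round-to-nearest-even by default). n must be a multiple of 4. */
static void floats_to_ints_round(const float *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(in + i);           /* load 4 floats (unaligned ok) */
        __m128i r = _mm_cvtps_epi32(v);            /* CVTPS2DQ: 4 conversions at once */
        _mm_storeu_si128((__m128i *)(out + i), r); /* store 4 int32 results */
    }
}

/* Same loop, but truncating toward zero like (int)x (CVTTPS2DQ). */
static void floats_to_ints_trunc(const float *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(in + i);
        __m128i r = _mm_cvttps_epi32(v);
        _mm_storeu_si128((__m128i *)(out + i), r);
    }
}
```

A real loop would also handle the n % 4 leftover elements, for example with a scalar conversion.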

Dark Shikari
It is a good suggestion; however, I will caveat it by saying it assumes two things: that you have an x86 processor with SSE (newer than PII) or SSE2 (newer than PIII), and that you do in fact want a truncating, not a rounding, conversion.
Burly
+4  A: 

There is a single instruction for converting a floating-point value to an integer in assembly: FISTP. It pops the value off the floating-point stack, converts it to an integer (using the current rounding mode, round-to-nearest by default), and then stores it at the specified address. I don't think there would be a faster way (unless you use extended instruction sets like MMX or SSE, which I am not familiar with).

Another instruction, FIST, leaves the value on the FP stack, but it only supports 16-bit and 32-bit destinations; for a quad-word (64-bit) destination you have to use FISTP.
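For reference, a sketch of calling FISTP directly with GCC/Clang extended inline asm (the wrapper names fistp_int and fistp_int64 are hypothetical). The "t" constraint loads the argument into st(0), and because FISTP pops it, "st" must be listed as clobbered. The result follows the x87 rounding mode, which is round-to-nearest-even unless something has changed the control word; on non-x86 targets the sketch falls back to C99 lrintf/llrint.

```c
#include <math.h> /* lrintf/llrint: only used by the non-x86 fallback */

/* float -> int32 via FISTP (m32int form, "fistpl" in AT&T syntax). */
static inline int fistp_int(float x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result;
    __asm__("fistpl %0" : "=m"(result) : "t"(x) : "st");
    return result;
#else
    return (int)lrintf(x); /* portable equivalent: current rounding mode */
#endif
}

/* double -> int64: only the popping form has an m64int variant ("fistpll"). */
static inline long long fistp_int64(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    long long result;
    __asm__("fistpll %0" : "=m"(result) : "t"(x) : "st");
    return result;
#else
    return llrint(x);
#endif
}
```

Note that ties go to even in the default mode (2.5 becomes 2, 3.5 becomes 4), which differs from a C cast.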

dreamlax
+3  A: 

If you really care about the speed of this, make sure your compiler is generating the FIST instruction. In MSVC you can do this with /QIfist; see this MSDN overview.

You can also consider using SSE intrinsics to do the work for you, see this article from Intel: http://softwarecommunity.intel.com/articles/eng/2076.htm
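A sketch of the scalar SSE intrinsics approach, using the standard names from xmmintrin.h/emmintrin.h (this is not taken from the Intel article, and the wrapper names are hypothetical). The extra "t" in cvtt means truncation; without it, the conversion uses the MXCSR rounding mode. The int64 variant is only available in 64-bit mode.

```c
#include <xmmintrin.h> /* SSE:  _mm_set_ss, _mm_cvtss_si32, _mm_cvttss_si32 */
#include <emmintrin.h> /* SSE2: _mm_set_sd, _mm_cvttsd_si32, _mm_cvttsd_si64 */

/* float -> int32, truncating toward zero like (int)x (CVTTSS2SI). */
static inline int sse_trunc_f32(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* float -> int32, rounding per MXCSR, nearest-even by default (CVTSS2SI). */
static inline int sse_round_f32(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* double -> int32, truncating (CVTTSD2SI, needs SSE2). */
static inline int sse_trunc_f64(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

#if defined(__x86_64__)
/* double -> int64, truncating (64-bit mode only). */
static inline long long sse_trunc_f64_i64(double x)
{
    return _mm_cvttsd_si64(_mm_set_sd(x));
}
#endif
```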

Don Neufeld
+7  A: 

It depends on whether you want a truncating or a rounding conversion, and at what precision. By default, C performs a truncating conversion when you go from float to int. There are FPU instructions that do the conversion, but they are not an ANSI C conversion and there are significant caveats to using them (such as knowing the FPU rounding state). Since the answer to your problem is quite complex and depends on some variables you haven't specified, I recommend this article on the issue:

http://www.stereopsis.com/FPU.html

Burly
A: 

Generally, you can trust the compiler to be efficient and correct. There is usually nothing to be gained by rolling your own functions for something that already exists in the compiler.

You are simply incorrect. In this case rolling your own gives a demonstrable 10x speed improvement over the built-in functions, because when you do it yourself you can rely on the state of the FPU flags, which the built-in _ftol cannot do, or you can parallelize the work using SSE.
Don Neufeld
Or you can pass '-msse3' (gcc) and have the 'fixed' FISTTP do it right, seamlessly.
akauppi
+3  A: 
akauppi
+4  A: 

A commonly used trick for plain x86/x87 code is to force the mantissa part of the float to represent the int. A 32-bit version follows.

The 64-bit version is analogous. The Lua version posted above is faster, but it relies on truncating a double to a 32-bit result; it therefore requires the x87 unit to be set to double precision and cannot be adapted for double-to-64-bit-int conversion.

The nice thing about this code is that it is completely portable to all platforms conforming to IEEE 754; the only assumption made is that the floating-point rounding mode is set to round-to-nearest. Note: portable in the sense that it compiles and works. Platforms other than x86 usually do not benefit much from this technique, if at all.

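#include <assert.h>
#include <math.h>

/* 3 << 22 == 1.5 * 2^23: after the addition the exponent is fixed, so the
   low mantissa bits hold the rounded integer, offset by 2^22 (0x00400000). */
static const float Snapper = 3 << 22;

union UFloatInt {
 int i;
 float f;
};

/** by Vlad Kaipetsky
portable assuming FP24 set to nearest rounding mode
efficient on x86 platform
*/
static inline int toInt( float fval )
{
  assert( fabsf(fval) <= 0x003fffff ); // only 23-bit signed values handled
  union UFloatInt fi;                  // union punning is well-defined in C
  fi.f = fval + Snapper;               // rounds fval to the nearest integer
  return ( (fi.i) & 0x007fffff ) - 0x00400000;
}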
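Since the answer only says the 64-bit version is analogous, here is one way it might look, as a sketch (toInt64, Snapper64 and UDoubleInt64 are hypothetical names, not Vlad Kaipetsky's code). The snapper becomes 1.5 * 2^52 and the mask covers the 52-bit mantissa. It assumes double arithmetic is really done in double precision (SSE2, as on x86-64, or the x87 precision control set to double), because x87 extended-precision intermediates can round twice.

```c
#include <assert.h>
#include <stdint.h>

/* 1.5 * 2^52 (== 3 << 51): fixes the exponent so the low 52 mantissa bits
   hold the rounded integer, offset by 2^51. */
static const double Snapper64 = 6755399441055744.0;

union UDoubleInt64 {
    int64_t i;
    double d;
};

/* Valid for |x| <= 2^51 - 1; rounds to nearest (ties to even) in the default mode. */
static inline int64_t toInt64(double x)
{
    assert(x >= -2251799813685247.0 && x <= 2251799813685247.0);
    union UDoubleInt64 u;
    u.d = x + Snapper64; /* the rounding happens here */
    return (u.i & INT64_C(0x000fffffffffffff)) - (INT64_C(1) << 51);
}
```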
Suma
For unsigned integers it can be simpler (valid for 0 <= fval <= 2^23 - 1): inline uint32_t toInt( float fval ){ static float const snapper = 1<<23; fval += snapper; uint32_t bits; memcpy(&bits, &fval, sizeof bits); return bits & 0x007fffff; }
chmike
+4  A: 

If you can guarantee that the CPU running your code is SSE3-compatible (even the Pentium 5 is, JBB), you can allow the compiler to use the FISTTP instruction (e.g. -msse3 for gcc). It seems to do the conversion the way it should always have been done:

http://software.intel.com/en-us/articles/how-to-implement-the-fisttp-streaming-simd-extensions-3-instruction/

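Note that FISTTP is different from FISTP (which has its problems, causing the slowness): FISTTP always truncates, regardless of the rounding mode in the x87 control word, so the compiler no longer has to switch the control word around every conversion. It comes as part of SSE3 but is actually its only x87-side refinement.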
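In practice -msse3 lets gcc emit FISTTP for a plain (int) cast in 32-bit x87 code (on x86-64 it uses SSE's CVTTSS2SI instead), so hand-written asm is rarely needed. The sketch below (fisttp_int is a hypothetical name, GCC/Clang inline-asm syntax) just makes the instruction explicit; it assumes an SSE3-capable CPU at run time.

```c
/* float -> int32 via FISTTP: truncates toward zero and ignores the x87
   rounding-control bits, so no control-word save/restore is needed. */
static inline int fisttp_int(float x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result;
    __asm__("fisttpl %0" : "=m"(result) : "t"(x) : "st"); /* pops st(0) */
    return result;
#else
    return (int)x; /* C cast: the same truncation semantics */
#endif
}
```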

CPUs other than x86 would probably do the conversion just fine anyway. :)

Processors with SSE3 support

akauppi