views: 432

answers: 3
Searching online, I found the following routine for calculating the sign of a float in IEEE format. It could easily be extended to a double, too.

// returns 1.0f for positive floats, -1.0f for negative floats, 0.0f for zero
inline float fast_sign(float f) {
    if (((int&)f & 0x7FFFFFFF)==0) return 0.f; // test exponent & mantissa bits: is input zero?
    else {
        float r = 1.0f;
        (int&)r |= ((int&)f & 0x80000000); // mask sign bit in f, set it in r if necessary
        return r;
    }
}

(Source: "Fast sign for 32 bit floats", Peter Schoffhauzer)

I am wary of using this routine, though, because of the bitwise operations. I need my code to work on machines with different byte orders, and I am not sure how much of this the IEEE standard specifies, as I couldn't find the most recent version, published this year. Can someone tell me whether this will work regardless of the byte order of the machine?

Thanks, Patrick

+7  A: 

How do you think fabs() and fabsf() are implemented on your system, or for that matter comparisons with a constant 0? If it's not by bitwise ops, it's quite possibly because the compiler writers don't think that would be any faster.

The portability problems with this code are:

  1. float and int might not have the same endianness, or even the same size; hence the masks could also be wrong.
  2. float might not use IEEE representation.
  3. You break strict aliasing rules. The compiler is allowed to assume that a pointer/reference to a float and a pointer/reference to an int cannot point to the same memory location. So for example, the standard does not guarantee that r is initialized with 1.0 before it is modified in the following line. It could re-order the operations. This isn't idle speculation, and unlike (1) and (2) it's undefined, not implementation-defined, so you can't necessarily just look it up for your compiler. With enough optimisation, I have seen GCC skip the initialization of float variables which are referenced only through a type-punned pointer.
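Incidentally, the aliasing problem in (3) can be avoided while keeping the bit trick: memcpy is the sanctioned way to inspect an object's representation, and compilers optimise the copies away. A sketch (it still assumes 32-bit IEEE floats sharing int's byte order, so (1) and (2) still apply):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same bit trick as the original, but with memcpy doing the type punning.
// memcpy is the aliasing-safe way to reinterpret an object's bits, and
// compilers lower these copies to plain register moves.
inline float fast_sign_punned(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);        // read the bits of f
    if ((bits & 0x7FFFFFFFu) == 0) return 0.0f; // +0.0f or -0.0f
    float r = 1.0f;
    std::uint32_t rbits;
    std::memcpy(&rbits, &r, sizeof rbits);
    rbits |= bits & 0x80000000u;                // copy f's sign bit onto 1.0f
    std::memcpy(&r, &rbits, sizeof rbits);
    return r;
}
```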

I would first do the obvious thing and examine the emitted code. Only if that appears dodgy is it worth thinking about doing anything else. I don't have any particular reason to think that I know more about the bitwise representation of floats than my compiler does ;-)

inline float fast_sign(float f) {
    if (f > 0) return 1;
    return (f == 0) ? 0 : -1;
    // or some permutation of the order of the 3 cases
}

[Edit: actually, GCC does make something of a meal of that even with -O3. The emitted code isn't necessarily slow, but it does use floating point ops so it's not clear that it's fast. So the next step is to benchmark, test whether the alternative is faster on any compiler you can lay your hands on, and if so make it something that people porting your code can enable with a #define or whatever, according to the results of their own benchmark.]
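For what it's worth, the standard library already exposes the sign bit portably: C99's copysignf (std::copysign for float in C++11's &lt;cmath&gt;) typically compiles to exactly this kind of bit manipulation, without the aliasing worries. A sketch (sign_of is my name, not a standard one):

```cpp
#include <cmath>

// std::copysign(1.0f, f) yields +1.0f or -1.0f carrying f's sign bit;
// good implementations lower it to a single bitwise operation.
inline float sign_of(float f) {
    if (f == 0.0f) return 0.0f;    // the comparison treats -0.0f as zero too
    return std::copysign(1.0f, f);
}
```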

Steve Jessop
I am assuming IEEE representation, so that isn't a major problem. Other than that, thanks. I'll see if there is any more input, but if not, thanks!
Patrick Niedzielski
"float and int might not have the same endianness": on the same system? Can you give an example?
Rick Regan
@Rick: Is ARM soft float little-endian or natural-endian? I can't remember. Anyway the question says that the code can easily be extended to double, and the ARM FPA unit's doubles are neither little-endian *nor* big-endian, so questioner would definitely be out of luck if using an ABI with that double format. The point is just that the standard doesn't forbid it, this issue is tiny compared with the other portability considerations.
Steve Jessop
@Steve: So ARM doubles are mixed endian ("nested" endian, if you will)? Yup, that would mess things up. Thanks for the example!
Rick Regan
@Rick: They can be. ARM has several ABIs corresponding to different floating point hardware. Originally there was no hardware, it was software-only. Then FPA was around for a while with its funny endianness. VFP is a newer floating-point unit which I think is "proper" natural-endian.
Steve Jessop
A: 

So, portability concerns aside, is the speedup really worth it? This seems like premature optimization to me. You should probably focus on improving the asymptotic complexity of your algorithms rather than shaving off little bits of constant-time operations (unless you are invoking sign a very large number of times, have profiled your code, and know for certain that this is actually a major performance bottleneck).

Michael Aaron Safyan
Yes, it is worth it. This is for a 3D mathematics library, in which I use this function a good deal. I wanted to see if this would be faster than the standard C++ fabs() routine.
Patrick Niedzielski
@Patrick, ah ok. Makes sense, then. In that case, you could use a runtime check in your build system to select between the default implementation and the faster implementation, based on whether the fast one works or not for that platform.
Michael Aaron Safyan
@Michael, Yes, thank you. I'm using cmake, so a define in my config.h header should do the trick. I've also a profiler in the code itself, so that should help.
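A minimal sketch of what that config.h switch might look like (MATHLIB_HAVE_FAST_SIGN and fast_sign_bits.h are hypothetical names; the build system would define the macro after its own check):

```cpp
// Hypothetical: a cmake try_run check defines MATHLIB_HAVE_FAST_SIGN in
// config.h when the bitwise routine both works and benchmarks faster on
// the target platform.
#if defined(MATHLIB_HAVE_FAST_SIGN)
#  include "fast_sign_bits.h"    // hypothetical platform-specific version
#else
inline float sign(float f) {     // portable fallback
    if (f > 0.0f) return 1.0f;
    return (f == 0.0f) ? 0.0f : -1.0f;
}
#endif
```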
Patrick Niedzielski
+1  A: 

Don't forget that moving a floating point value from an FPU register to an integer register requires a write to RAM followed by a read.

With floating point code, you will always be better off looking at the bigger picture:

Some floating point code
Get sign of floating point value
Some more floating point code

In the above scenario, using the FPU to determine the sign would be quicker as there won't be a write/read overhead [1]. The Intel FPU can do:

FTST
FNSTSW AX
SAHF

which compares ST(0) with zero and then copies the FPU condition codes (set for > 0, < 0 and == 0) into EFLAGS, where they can be used with FCMOVcc.

Inlining the above into well written FPU code will beat any integer bit manipulations and won't lose precision [2].

Notes:

  1. The Intel IA32 does have a read-after-write optimisation where it won't wait for the data to be committed to RAM/cache but just use the value directly. It still invalidates the cache though so there's a knock-on effect.
  2. The Intel FPU is 80 bits wide internally, while floats are 32 bits and doubles 64, so storing to a float/double in order to reload it as an integer will lose some bits of precision. These are important bits, as you're looking for transitions around 0.
Skizz
And don't forget, compilers are rubbish at optimising floating point code. It's not hard to improve on the compiler's FP code and that will have a bigger impact on performance than doing bit-fiddling.
Skizz
Alright, thank you. This seems a bit more elegant than the solution I found online. Using a #define wrapper should help reduce any cross-platform problems.
Patrick Niedzielski
Not all Intel systems use the 80 bit x87 unit for floating-point code. On OS X, for example, floating-point is compiled to use the SSE unit by default (except for the `long double` type, which is codegen'd to x87 instructions); on a system that doesn't use the x87 unit for argument passing or returning results, a solution like this would impose considerable overhead vs. a solution that uses the core registers or SSE.
Stephen Canon
@Stephen: At first, I thought "Really?", but thinking about it, the SSE isn't stack based so the compiler has a much easier job of allocating registers, i.e. they stay put as opposed to the FPU where their position changes, e.g. FLDZ makes ST(0)->ST(1), ST(1)->ST(2) and so on. On the downside, the instruction set is different on the SSE, it's geared towards SIMD. Also, since Apple control the hardware so tightly, there's never a case where a CPU doesn't have SSE.
Skizz
@Skizz: Exactly. All Intel Macs are guaranteed to have SSE and SSE2, so the compiler can use it by default. In addition to the advantages you listed, SSE is quite a bit faster than x87.
Stephen Canon
@Stephen: Out of interest, do the Apple compilers have an option to use the FPU by default?
Skizz
@Skizz: 32 bit executables can be built with `-mno-sse` to codegen on the x87 unit. One has to be careful with 64 bit executables, because the 64-bit ABI requires that floating-point values are passed and returned in SSE registers.
Stephen Canon