ansaurus

Question

Is it still worth trying to create optimizations for sqrt() in C?

Answer 1

+4 A:

Sqrt is basically unchanged on most systems. It's a relatively slow operation, but the total system speeds have improved, so it may not be worth trying to use "tricks".

The decision to optimize it with approximations for the (minor) gains this can achieve are really up to you. Modern hardware has eliminated some of the need for these types of sacrifices (speed vs. precision), but in certain situations, this is still valuable.

I'd use profiling to determine whether this is "still useful".

Reed Copsey 2009-05-28 19:01:35

Answer 2

+3 A:

If you have proven that the call to sqrt() in your code is a bottleneck with a profiler then it may be worth trying to create an optimizated version. Otherwise it's a waste of time.

lothar 2009-05-28 22:10:17

And even then, it would probably be as effective to look at why sqrt() is the bottleneck and modify the algorithm as to modify the sqrt() function.

Jonathan Leffler 2009-05-29 01:48:26

Answer 3

+14 A:

Rule 1: profile before optimizing

Before investing any effort in the belief that you can beat the optimizer, you must profile everything and discover where the bottleneck really lies. In general, it is unlikely that sqrt() itself is your bottleneck.

Rule 2: replace the algorithm before replacing a standard function

Even if sqrt() is the bottleneck, then it is still reasonably likely that there are algorithmic approaches (such as sorting distances by length squared which is easily computed without a call to any math function) that can eliminate the need to call sqrt() in the first place.

What the compiler does for you if you do nothing else

Many modern C compilers are willing to inline CRT functions at higher optimization levels, making the natural expression including calls to sqrt() as fast as it needs to be.

In particular, I checked MinGW gcc v3.4.5 and it replaced a call to sqrt() with inline code that shuffled the FPU state and at the core used the FSQRT instruction. Thanks to the way that the C standard interacts with IEEE 754 floating point, it did have to follow the FSQRT with some code to check for exceptional conditions and a call to the real sqrt() function from the runtime library so that floating point exceptions can be handled by the library as required by the standard.

With sqrt() inline and used in the context of a larger all double expression, the result is as efficient as possible given the constraints of of standards compliance and preservation of full precision.

For this (very common) combination of compiler and target platform and given no knowledge of the use case, this result is pretty good, and the code is clear and maintainable.

In practice, any tricks will make the code less clear, and likely less maintainable. After all, would you rather maintain (-b + sqrt(b*b - 4.*a*c)) / (2*a) or an opaque block of inline assembly and tables?

Also, in practice, you can generally count on the compiler and library authors to take good advantage of your platform's capabilities, and usually to know more than you do about the subtleties of optimizations.

However, on rare occasions, it is possible to do better.

One such occasion is in calculations where you know how much precision you really need and also know that you aren't depending on the the C standard's floating point exception handling and can get along with what the hardware platform supplies instead.

Edit: I rearranged the text a bit to put emphasis on profiling and algorithms as suggested by Jonathan Leffler in comments. Thanks, Jonathan.

Edit2: Fixed precedence typo in the quadratic example spotted by kmm's sharp eyes.

RBerteig 2009-05-29 01:37:06

I'd be tempted to put the last two paragraphs first - otherwise, a good answer.

Jonathan Leffler 2009-05-29 01:50:26

@Jonathan, that is a good idea. I shuffled things a tad, thanks.

RBerteig 2009-05-29 02:07:22

+1, well put. I would add that, as an example of when to optimize this sort of thing, is performing vector normalizations in 3D graphics programs (particular games). In that case, you're very frequently calculating many inverse square roots, and you don't need maximum precision, so it can be well worth it to use a faster (but less accurate) sqrt implementation.

Adam Rosenfield 2009-06-06 16:25:56

+1. Very clear, thank you.You forgot parentheses around 2*a in your calculation of the discriminant.

kmm 2009-08-12 10:21:10

Well spotted, kmm.

RBerteig 2009-08-12 19:48:40

Answer 4

A:

Why not? You probably learn a lot!

peterchen 2009-05-29 10:19:13

Answer 5

A:

I find it very hard to believe that the sqrt function is your application's bottleneck because of the way modern computers are designed. Assuming this isn't a question in reference to some crazy low end processor, you take a tremendous speed hit to access memory outside of your CPU caches, so unless you're algorithm is doing math on a very few numbers (enough that they all basically fit within the L1 and L2 caches) you're not going to notice any speed up from optimizing any of your arithmetic.

Chris Harris 2009-06-06 16:05:50

Answer 6

+1 A:

It is generally safe to assume that the standard library developers are quite clever, and have written performant code. You're unlikely to be able to match them in general.

So the question becomes, do you know something that'll let you do a better job? I'm not asking about special algorithms for computing the square root (the standard library developers knows of these too, and if they were worthwhile in general, they'd have used them already), but do you have any specific information about your use case, that changes the situation?

Do you only need limited precision? If so, you can speed it up compared to the standard library version, which has to be accurate.

Or do you know that your application will always run on a specific type of CPU? Then you can look at how efficient that CPU's sqrt instruction is, and see if there are better alternatives. Of course, the downside to this is that if I run your app on another CPU, your code might turn out slower than the standard sqrt().

Can you make assumptions in your code, that the standard library developers couldn't?

You're unlikely to be able to come up with a better solution to the problem "implement an efficient replacement for the standard library sqrt".

But you might be able to come up with a solution to the problem "implement an efficient square root function for this specific situation".

jalf 2009-06-06 16:19:00

Answer 7

+1 A:

This probably is the fastest method of computing the square root:

float fastsqrt(float val)  {
        union
        {
                int tmp;
                float val;
        } u;
        u.val = val;
        u.tmp -= 1<<23; /* Remove last bit so 1.0 gives 1.0 */
        /* tmp is now an approximation to logbase2(val) */
        u.tmp >>= 1; /* divide by 2 */
        u.tmp += 1<<29; /* add 64 to exponent: (e+127)/2 =(e/2)+63, */
        /* that represents (e/2)-64 but we want e/2 */
        return u.val;
}

wikipedia article

This probably is the fastest method of computing the inverse square root. Assume at most 0.00175228 error.

float InvSqrt (float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    return x*(1.5f - xhalf*x*x);
}

This is (very roughly) about 4 times faster than (float)(1.0/sqrt(x))

wikipedia article

Zoli 2009-08-12 00:48:39

ansaurus

tags:

views:

answers:

Is it still worth trying to create optimizations for sqrt() in C?

Rule 1: profile before optimizing

Rule 2: replace the algorithm before replacing a standard function

What the compiler does for you if you do nothing else

related questions