
Okay, so I was bored and wondered how fast the math.h square root is compared to the one with the magic number in it (made famous by Quake, though it originated at SGI).

But this has ended up in a world of hurt for me.

I first tried this on the Mac, where the math.h version won hands down every time, then on Windows, where the magic number always won. But I think this is all down to my own noobness.

  1. Compiling on the Mac with "g++ -o sq_root sq_root_test.cpp", the program takes about 15 seconds to complete. But the VS2005 Release build finishes in a split second. (In fact, I had to compile in Debug just to get it to show some numbers.)

  2. Is my poor man's benchmarking really stupid? Because I get 0.01 for math.h and 0 for the magic number. (It can't be that fast, can it?)

I don't know if this matters, but the Mac is Intel and the PC is AMD. Is the Mac using hardware for the math.h sqrt?

I got the fast inverse square root algorithm from http://en.wikipedia.org/wiki/Fast_inverse_square_root

//sq_root_test.cpp

#include <iostream>
#include <math.h>
#include <ctime>


float invSqrt(float x)
{
    union {
        float f;
        int i;
    } tmp;
    tmp.f = x;
    tmp.i = 0x5f3759df - (tmp.i >> 1);
    float y = tmp.f;
    return y * (1.5f - 0.5f * x * y * y);
}

int main() {
    std::clock_t start;// = std::clock();
    std::clock_t end;
    float rootMe;

    int iterations = 999999999;

    // ---

    rootMe = 2.0f;
    start = std::clock();

    std::cout << "Math.h SqRoot: ";

    for (int m = 0; m < iterations; m++) {
        (float)(1.0/sqrt(rootMe));
        rootMe++;
    }

    end = std::clock();

    std::cout << (difftime(end, start)) << std::endl;

    // ---

    std::cout << "Quake SqRoot: ";

    rootMe = 2.0f;
    start = std::clock();

    for (int q = 0; q < iterations; q++) {
        invSqrt(rootMe);
        rootMe++;
    }

    end = std::clock();

    std::cout << (difftime(end, start)) << std::endl;   
}
+5  A: 

There are several problems with your benchmarks. First, your benchmark includes potentially expensive datatype conversions: `sqrt` takes a double, so each call converts your float up to double and the result back down to float. If you want to know what a square root costs, you should benchmark square roots, not datatype conversions.

Second, your entire benchmark can be (and is) optimized out by the compiler because it has no observable side effects. You don't use the returned value (or store it in a volatile memory location), so the compiler sees that it can skip the whole thing.

A clue here is that you had to disable optimizations. That means your benchmarking code is broken. Never, ever disable optimizations when benchmarking. You want to know which version runs fastest, so you should test it under the conditions it would actually be used in. If you were to use square roots in performance-sensitive code, you'd enable optimizations, so how it behaves without optimizations is completely irrelevant.

Also, you're not benchmarking the cost of computing a square root, but of the inverse square root. If you want to know which way of computing the square root is fastest, you have to move the 1.0/... division down to the Quake version. (And since division is a pretty expensive operation, this might make a big difference in your results.)

Finally, it might be worth pointing out that Carmack's little trick was designed to be fast on 12-year-old computers. Once you fix your benchmark, you'll probably find that it's no longer an optimization, because today's CPUs are much faster at computing "real" square roots.

jalf
Thanks. I should have mentioned that I'm doing this because I'm working with the PS2 at uni, which is almost as old.
PhilCK
The PS2 is a substantially different architecture from the PC, regardless of age.
dash-tom-bang