+7  A: 

Here's an explanation

sepp2k
Oh wow. At last I figured out why Carmack earned his credit as a programmer.
Pavel Shved
http://www.jakevoytko.com/blog/2008/01/28/quake-3s-fast-square-root-function/
Kenneth Reitz
+10  A: 

FYI. Carmack didn't write it. Terje Mathisen and Gary Tarolli both take partial (and very modest) credit for it, as well as crediting some other sources.

How the mythical constant was derived is something of a mystery.

To quote Gary Tarolli:

which actually is doing a floating point computation in integer - it took a long time to figure out how and why this works, and I can't remember the details anymore.

A slightly better constant, developed by an expert mathematician (Chris Lomont) trying to work out how the original algorithm worked is:

float InvSqrt(float x)
{
  float xhalf = 0.5f*x;
  int i = *(int*)&x; // get bits for floating value
  i = 0x5f375a86- (i>>1); // gives initial guess y0
  x = *(float*)&i; // convert bits back to float
  x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
  return x;
}

In spite of this, his initial attempt a mathematically 'superior' version of id's sqrt (which came to almost the same constant) proved inferior to the one initially developed by Gary despite being mathmetically much 'purer'. He couldn't explain why id's was so excellent iirc.

Rushyo
+1 thanks for the background info!
Alex
+4  A: 

According to this nice article written a while back...

The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it with iteration. Taking advantage of the nature of 32-bit x86 processors, i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast. i is then set to 0x5f3759df, minus itself shifted one bit to the right. The right shift drops the least significant bit of i, essentially halving it.

It's a really good read. This is only a tiny piece of it.

Dillie-O
Well there ya go... two posts in 10 seconds. Behold the power of the SO!!!
Dillie-O
+9  A: 

Of course these days, it turns out to be much slower than just using an FPU's sqrt (especially on 360/PS3), because swapping between float and int registers induces a load-hit-store, while the floating point unit can do reciprocal square root in hardware.

It just shows how optimizations have to evolve as the nature of underlying hardware changes.

Crashworks