views:

403

answers:

4

I recently wrote a Vector 3 class, and I submitted my normalize() function for reviewal to a friend. He said it was good, but that I should multiply by the reciprocal where possible because "multiplying is cheaper than dividing" in CPU time.

My question simply is, why is that?

+1  A: 

Here's a post on the Stack talking about this:

http://stackoverflow.com/questions/506237/why-is-float-division-slow/506241

CalebHC
A: 

The CPU operation for (float) division is much more complicated than multiplication. The CPU has to do more. I am far from knowledgeable about hardware, but you can find a lot of info about common division implementation (based on newton-raphson algorithms, for example).

I would also be careful about always using multiplication of the reciprocal instead of division to gain CPU performance: they may not give exactly the same results. This may or may not matter depending on your application.

David Cournapeau
+4  A: 

Think about it in terms of elementary operations that hardware can more easily implement -- add, subtract, shift, compare. Multiplication even in a trivial setup requires fewer such elementary steps -- plus, it afford advances algorithms that are even faster -- see here for example... but hardware generally doesn't take advantage of those (except maybe extremely specialized hardware). For example, as the wikipedia URL says, "Toom–Cook can do a size-N cubed multiplication for the cost of five size-N multiplications" -- that's pretty fast indeed for very large numbers (Fürer's algorithm, a pretty recent development, can do Θ(n ln(n) 2Θ(ln*(n))) -- again, see the wikipedia page and links therefrom).

Division's just intrisically slower, as -- again -- per wikipedia; even the best algorithms (some of which ARE implemented in HW, just because they're nowhere as sophisticated and complex as the very best algorithms for multiplication;-) can't hold a candle to the multiplication ones.

Just to quantify the issue with not-so-huge numbers, here are some results with gmpy, an easy-to-use Python wrapper around GMP, which tends to have pretty good implementations of arithmetic though not necessarily the latest-and-greatest wheezes. On a slow (first-generation;-) Macbook Pro:

$ python -mtimeit -s'import gmpy as g; a=g.mpf(198792823083408); b=g.mpf(7230824083); ib=1.0/b' 'a*ib'
1000000 loops, best of 3: 0.186 usec per loop
$ python -mtimeit -s'import gmpy as g; a=g.mpf(198792823083408); b=g.mpf(7230824083); ib=1.0/b' 'a/b'
1000000 loops, best of 3: 0.276 usec per loop

As you see, even at this small size (number of bits in the numbers), and with libraries optimized by exactly the same speed-obsessed people, multiplication by the reciprocal can save 1/3 of the time that division takes.

It may be only in rare situations that these few nanoseconds are a life-or-death issue, but, when they are, and of course IF you are repeatedly dividing by the same value (to amortize away the 1.0/b operation!), then this knowledge can be a life-saver.

(Much in the same vein -- x*x will often save time compared to x**2 [in languages that have a ** "raise to power" operator, like Python and Fortran] -- and Horner's scheme for polynomial computation is VASTLY preferable to repeated raise-to-power operations!-).

Alex Martelli
+5  A: 

If you think back to grade school, you'll recall that multiplication was harder than addition and division was harder than multiplication. Things aren't any different for the CPU.

Recall also that calculating the reciprocal involves a division, so unless you calculate the reciprocal once and use it three times, you won't see a speed up.

David Norman
+1 for the essential remark about the need to cache the reciprocal.
Thilo
As for grade school, I found (still do) subtraction harder to do than addition, but those two seem to be the same for a CPU. ;-)
Thilo
@Thilo: when you need to perform a subtraction, just negate the second operand and then you can do an "easier" addition instead. :-)
Laurence Gonsalves
just use 10's complement! Ex., 35 - 17; to negate 17, first find 9's complement of 17, which is ...99982 (filling in 9's in all the available MSD's); increment that gives 10's complement for -17, which is...99983; add 00035 + 99983, throw away the roll-over just like the CPU, and, voila! 18! Well, ok, the borrowing thing doesn't seem so bad after all. =P
JustJeff