views: 1510
answers: 7

What are the steps in the algorithm to do floating point division?

Why is it slower than, say, multiplication?

Is it done the same way we do division by hand? By repeatedly subtracting multiples of the divisor to obtain a remainder, aligning the digits again, and continuing until the remainder is less than the divisor?

Also, why do we gain on performance if instead of doing

a = b / c

we do

d = 1 / c
a = b * d

?

Edit: Basically I was asking because someone asked me to distribute a value among contenders based on the assignment of weights. I did all this in integers and was later asked to convert to float, which caused a slowdown in performance. I was just interested in knowing how C or C++ would perform these operations and what would cause the slowness.

+2  A: 

You won't gain performance by doing

d = 1 / c
a = b * d

You probably mean:

d = 1 / c
a1 = b1 * d
a2 = b2 * d

This way the division is done only once.
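
A quick C sketch of that idea (the variable names here are placeholders): when many values share one divisor, a single division plus cheap multiplications replaces many divisions. Note that the results can differ from direct division in the last bit, which is why compilers only do this on their own under flags like -ffast-math.

#include <stdio.h>

int main(void) {
    const double c = 7.0;                    /* shared divisor */
    const double b[4] = {1.0, 2.0, 3.0, 4.0};
    double a[4];

    double d = 1.0 / c;                      /* one division...              */
    for (int i = 0; i < 4; i++)
        a[i] = b[i] * d;                     /* ...then only multiplications */

    for (int i = 0; i < 4; i++)
        printf("%f\n", a[i]);
    return 0;
}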

Division is inherently slower than multiplication, though I don't know the exact details. The basic reason is that, much like functions such as sin or sqrt, it's simply more complex mathematically. IIRC, a multiplication takes about 10 cycles on an average CPU, while a division takes about 50 or more.

Edit: How it is actually done was nicely explained by John Mulder.

mafutrct
I've always wondered why, though; one is adding in a loop and the other is subtracting. Maybe it's the extra test for while (x >= y) in the loop, instead of a straight iteration over n... or maybe it's figuring out the decimal representation of the final remainder as a fraction?
Software Monkey
In the case where "c" is a constant, you can indeed save time by multiplying by the reciprocal.
Nosredna
@Software Monkey, multiplication in the ALU is done using shift-add, rather than adding n copies of the number being multiplied, so it only takes k iterations, where k is the number of bits in your number: http://en.wikipedia.org/wiki/Multiplication_ALU
Mike Houston
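
A minimal C sketch of the shift-add idea Mike Houston mentions (the function name is made up for illustration): each set bit of the multiplier contributes one shifted copy of the multiplicand, so at most k iterations are needed for k-bit operands.

#include <stdint.h>

/* Shift-and-add multiplication: at most 32 iterations for
 * 32-bit operands, one per bit of the multiplier b. */
uint64_t shift_add_mul(uint32_t a, uint32_t b) {
    uint64_t acc = 0;
    uint64_t shifted = a;                /* holds a << i at iteration i */
    while (b != 0) {
        if (b & 1)
            acc += shifted;              /* bit set: add the shifted copy */
        shifted <<= 1;
        b >>= 1;
    }
    return acc;
}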
+1  A: 

Think of the hardware involved, and you'll understand a lot better why it takes so much longer to divide than to multiply. Both operations are done down at the Floating Point Unit (FPU) level, and even in the world of integer ALUs, the division circuit is a far busier place than a multiplication circuit. I would suspect this is only more painful in the world of floating point, since the data isn't just ordered from least to most significant digit, but is instead laid out according to the IEEE 754 standard.

As for the round-off, it's really about where the signals traveling between the gates get tied to ground; where that happens, you lose digits. It's not so much rounding as truncation.

Or were you asking about simulating floating point arithmetic using just integers?

Ed Carrel
+5  A: 

As described in this Wikipedia article, there are two main approaches to division in computers:

Slow Division
Uses the following recurrence and finds one digit per iteration:
partialRemainder[j+1] = radix * partialRemainder[j] - quotientDigit[n-(j+1)] * denominator
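
To make the recurrence concrete, here's a rough C sketch of the radix-2 ("slow") case, restoring division, which produces one quotient bit per iteration (illustrative only; d must be nonzero):

#include <stdint.h>

/* Radix-2 restoring division: one quotient bit per iteration. */
void restoring_div(uint32_t n, uint32_t d, uint32_t *quot, uint32_t *rem) {
    uint64_t r = 0;                          /* partial remainder */
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);       /* shift in next dividend bit */
        if (r >= d) {                        /* trial subtraction succeeds */
            r -= d;
            q |= 1u << i;                    /* quotient digit is 1 */
        }                                    /* else restore: digit is 0 */
    }
    *quot = q;
    *rem = (uint32_t)r;
}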

Fast Division
Starts with an estimate and converges on the quotient; the accuracy depends on the number of iterations.
Newton-Raphson division (very briefly):
1. calculate estimate of reciprocal
2. compute more accurate estimates of reciprocal
3. compute quotient by multiplying the dividend by reciprocal

John Mulder
+13  A: 

FPU division often uses Newton-Raphson (or a similar algorithm) to compute a reciprocal and then multiplies by that reciprocal. That's why the reciprocal operation is slightly faster than the general division operation.

This HP paper (which is actually more understandable than most papers I come across talking about Newton-Raphson) has this to say about floating point division:

Floating point division and square root take considerably longer to compute than addition and multiplication. The latter two are computed directly while the former are usually computed with an iterative algorithm. The most common approach is to use a division-free Newton-Raphson iteration to get an approximation to the reciprocal of the denominator (division) or the reciprocal square root, and then multiply by the numerator (division) or input argument (square root).
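
For illustration, here's a rough C sketch of such a division-free Newton-Raphson iteration (the initial guess and iteration count are placeholders; real hardware seeds the estimate from a small lookup table and needs only a few steps, since each iteration roughly doubles the number of correct bits):

#include <stdio.h>

/* Approximate b / c via the reciprocal iteration
 * x <- x * (2 - c * x), which converges to 1/c with no division.
 * The fixed guess 0.1 only converges for c in (0, 20); hardware
 * instead derives a good seed from the operand's leading bits. */
double nr_divide(double b, double c) {
    double x = 0.1;                      /* crude reciprocal estimate */
    for (int i = 0; i < 7; i++)
        x = x * (2.0 - c * x);           /* error is squared each step */
    return b * x;                        /* quotient = b * (1/c)       */
}

int main(void) {
    printf("%.15f\n", nr_divide(1.0, 3.0));   /* ~0.333333333333333 */
    return 0;
}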

Michael Burr
I'm not saying the HP paper is wrong, I don't know enough to even consider that. However, the paper was last revised in 1994, so I'd guess process design has changed at least _some_ in 14 years :-) So take the paper with a grain of salt.
WaldenL
A: 

Basically I was asking because someone asked me to distribute a value among contenders based on the assignment of weights. I did all this in integers and was later asked to convert to float, which caused a slowdown in performance. I was just interested in knowing how C or C++ would perform these operations and what would cause the slowness.

I suggest adding this to your question. Prefix the text with "[EDIT]" so it stands out as an edit made after the question was asked.
Aaron Digulla
+6  A: 

From a hardware point of view, division is an iterative algorithm, and the time it takes is proportional to the number of bits. The fastest division currently around uses the radix-4 algorithm, which generates 4 bits of result per iteration. For a 32-bit divide you need at least 8 steps.

Multiplication can be done in parallel to a certain degree. Without going into detail, you can break up a large multiplication into several smaller, independent ones. These can again be broken down until you're at the bit level, or you can stop earlier and use a small lookup table in hardware. This makes the multiplication hardware heavy from a silicon real-estate point of view, but very fast as well. It's the classic size/speed tradeoff.

You need log2(n) steps to combine the partial results computed in parallel, so a 32-bit multiply needs 5 logical steps (if you go down to the minimum). Fortunately those 5 steps are a good deal simpler than the division steps (they're just additions). That means that in practice multiplies are even faster.
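
A rough C analogy for the "break it into smaller, independent pieces" idea (splitting a 32x32-bit multiply into four 16x16 partial products; in hardware those four products can be formed at the same time and then combined):

#include <stdint.h>

/* 32x32 -> 64-bit multiply from four independent 16x16 products.
 * In hardware all four products can be computed in parallel,
 * leaving only a few levels of additions to combine them. */
uint64_t mul32_via_16(uint32_t a, uint32_t b) {
    uint32_t al = a & 0xFFFF, ah = a >> 16;
    uint32_t bl = b & 0xFFFF, bh = b >> 16;

    uint64_t ll = (uint64_t)al * bl;     /* four independent products */
    uint64_t lh = (uint64_t)al * bh;
    uint64_t hl = (uint64_t)ah * bl;
    uint64_t hh = (uint64_t)ah * bh;

    return ll + ((lh + hl) << 16) + (hh << 32);
}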

Nils Pipenbrinck
A: 

Float division is not much slower than integer division, but the compiler may be unable to do the same optimizations.

For example, the compiler can replace an integer division by 3 with a multiplication and a binary shift. Likewise, it can replace a float division by 2.0 with a multiplication by 0.5, but it cannot replace a division by 3.0 with a multiplication by 1/3.0, because 1/3.0 cannot be represented exactly in binary, so the rounding error may change the result of the division.
Since the compiler doesn't know how sensitive your application is to rounding errors (say you were doing a weather simulation; see Butterfly effect), it cannot do that optimization.
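
A small C illustration of the difference (whether the two printed values for 3.0 actually differ depends on the operand, but the compiler must assume they can):

#include <stdio.h>

int main(void) {
    double x = 10.0;

    /* 0.5 is exact in binary, so this always prints 1 and the
     * compiler is free to turn x / 2.0 into x * 0.5. */
    printf("%d\n", x / 2.0 == x * 0.5);

    /* 1/3.0 is rounded, so the two results below may differ in
     * the last bit; the compiler won't substitute the reciprocal
     * without a flag such as -ffast-math. */
    printf("%.20f\n", x / 3.0);
    printf("%.20f\n", x * (1.0 / 3.0));
    return 0;
}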

ggf31416