views:

799

answers:

4

I am currently writing a fast 32.32 fixed-point math library. I succeeded at making adding, subtraction and multiplication work correctly, but I am quite stuck at division.

A little reminder for those who can't remember: a 32.32 fixed-point number is a number having 32 bits of integer part and 32 bits of fractional part.

The best algorithm I came up with needs 96-bit integer division, which is something compilers usually don't have built-ins for.

Anyway, here it goes:

G = 2^32

notation: x is the 64-bit fixed-point number, x1 is its low nibble and x2 is its high

G*(a/b) = ((a1 + a2*G) / (b1 + b2*G))*G      // Decompose this

G*(a/b) = (a1*G) / (b1*G + b2) + (a2*G*G) / (b1*G + b2)

As you can see, the (a2*G*G) is guaranteed to be larger than the regular 64-bit integer. If uint128_t's were actually supported by my compiler, I would simply do the following:

((uint128_t)x << 32) / y)

Well they aren't and I need a solution. Thank you for your help.

+1  A: 

Better self-adjusting answer:
Forgive the C#-ism of the answer, but the following should work in all cases. There is likely a solution possible that finds the right shifts to use quicker, but I'd have to think much deeper than I can right now. This should be reasonably efficient though:

int upshift = 32;
ulong mask = 0xFFFFFFFF00000000;
ulong mod = x % y;
while ((mod & mask) != 0)
{
     // Current upshift of the remainder would overflow... so adjust
     y >>= 1;
     mask <<= 1;
     upshift--;

     mod = x % y;
}
ulong div = ((x / y) << upshift) + (mod << upshift) / y;

Simple but unsafe answer:
This calculation can cause an overflow in the upshift of the x % y remainder if this remainder has any bits set in the high 32 bits, causing an incorrect answer.

((x / y) << 32) + ((x % y) << 32) / y

The first part uses integer division and gives you the high bits of the answer (shift them back up).

The second part calculates the low bits from the remainder of the high-bit division (the bit that could not be divided any further), shifted up and then divided.

jerryjvl
+6  A: 

You can decompose a larger division into multiple chunks that do division with less bits. As another poster already mentioned the algorithm can be found in TAOCP from Knuth.

However, no need to buy the book!

There is a code on the hackers delight website that implements the algorithm in C. It's written to do 64 bit unsigned divisions using 32 bit arithmetic only, so you can't directly cut'n'paste the code. To get from 64 to 128 bit you have to widen all types, masks and constans by two e.g. a short becomes a int, a 0xffff becomes 0xffffffffll ect.

After this easy easy change you should be able to do 128bit divisions.

The code is here: http://www.hackersdelight.org/HDcode/divlu.c (may wrap badly in a web-browser due to line-endings. If so just save the code and open it with notepad or so).

Since your largest values only need 96 bit, One of the 64 bit divisions will always return zero, so you can even simplify the code a bit.

Oh - and before I forget this: The code only works with unsigned values. To convert from signed to unsigned divide you can do something like this (pseudo-code style):

fixpoint Divide (fixpoint a, fixpoint b)
{
  // check if the integers are of different sign:
  fixpoint sign_difference = a ^ b; 

  // do unsigned division:
  fixpoint x = unsigned_divide (abs(a), abs(b));

  // if the signs have been different: negate the result.
  if (sign_difference < 0)
  {
     x = -x;
  }

  return x;
}

The website itself is worth checking out as well: http://www.hackersdelight.org/

Hope it helps.

Btw - nice task that you're working on.. Do you mind telling us for what you need the fixedpoint library?


Btw - the ordinary shift and subtract algorithm for division would work as well.

If you target x86 you can implement it using MMX or SSE intrinsics. The algoritm relies only on primitive operations, so it could perform quite fast as well.

Nils Pipenbrinck
A: 

I like Nils' answer, which is probably the best. It's just long division, like we all learned in grade school, except the digits are base 2^32 instead of base 10.

However, you might also consider using Newton's approximation method for division:

  x := x (N + N - N * D * x)

where N is the numerator and D is the demoninator.

This just uses multiplies and adds, which you already have, and it converges very quickly to about 1 ULP of precision. On the other hand, you won't be able to acheive the exact 0.5-ULP answer in all cases.

In any case, the tricky bit is detecting and handling the overflows.

Die in Sente
One problem with using Newton's method here is that you need a higher precision for your intermediate values than you do for the final answer. So this would require 128 bit integer math for the multiply and adds. Quite possible but not trivial.
SPWorley
A: 

Quick -n- dirty.

Do the A/B divide with double precision floating point. This gives you C~=A/B. It's only approximate because of floating point precision and 53 bits of mantissa.

Round off C to a representable number in your fixed point system.

Now compute (again with your fixed point) D=A-C*B. This should have significantly lower magnitude than A.

Repeat , now computing D/B with floating point. Again, round the answer to an integer. Add each division result together as you go. You can stop when your remainder is so small that your floating point divide returns 0 after rounding.

You're still not done. Now you're very close to the answer, but the divisions weren't exact. To finalize, you'll have to do a binary search. Using the (very good) starting estimate, see if increasing it improves the error.. you basically want to bracket the proper answer and keep dividing the range in half with new tests.

Yes, you could do Newton iteration here, but binary search will likely be easier since you need only simple multiplies and adds using your existing 32.32 precision toolkit.

This is not the most efficient method, but it's by far the easiest to code.

SPWorley