views:

70

answers:

1

Hi, I'm trying to multiply A*B in 16-bit fixed point, while keeping as much accuracy as possible. A is 16-bit in unsigned integer range, B is divided by 1000 and always between 0.001 and 9.999. It's been a while since I dealt with problems like that, so:

  • I know I can just do A*B/1000 after moving to 32-bit variables, then strip back to 16-bit
  • I'd like to make it faster than that
  • I'd like to do all the operations without moving to 32-bit (since I've got 16-bit multiplication only)

Is there any easy way to do that?

Edit: A will be between 0 and 4000, so all possible results are in the 16-bit range too.

Edit: B comes from user, set digit-by-digit in the X.XXX mask, that's why the operation is /1000.

+3  A: 

No, you have to go to 32 bit. In general the product of two 16 bit numbers will always give you a 32 bit wide result.

You should check the CPU instruction set of the CPU you're working on because most multiply instructions on 16 bit machines have an option to return the result as a 32 bit integer directly.

This would help you a lot because:

short testfunction (short a, short b)
{
  int A32 = a;
  int B32 = b;

  return A32*B32/1000
}

Would force the compiler to do a 32bit * 32bit multiply. On your machine this could be very slow or even done in multiple steps using 16bit multiplies only.

A little bit of inline assembly or even better a compiler intrinsic could speed things up a lot.

Here is an example for the Texas Instruments C64x+ DSP which has such intrinsics:

short test (short a, short b) 
{
  int product = _mpy (a,b); // calculates product, returns 32 bit integer
  return product / 1000;
}

Another thought: You're dividing by 1000. Was that constant your choice? It would be much faster to use a power of two as the base for your fixed-point numbers. 1024 is close. Why don't you:

  return (a*b)/1024 

instead? The compiler could optimize this by using a shift right by 10 bits. That ought to be much faster than doing reciprocal multiplication tricks.

Nils Pipenbrinck
Your listed example does *not* force the compiler to do a 32 bit multiply. Since the compiler can trivially prove that both `A32` and `B32` are within the range of a `short`, it can quite happily produce a 16x16 -> 32 multiply from that code.
caf
Caf, you're right, but do you know any compiler that does this?
Nils Pipenbrinck
gcc always generates the correct smaller multiplies. A standard idiom for getting the upper 32 bits of a 32x32 multiply `x*y` is `(int64_t)x*y>>32`.
R..