ansaurus

Question

Fast Multiplication

Answer 1

+1 A:

I'm assuming, based on the micro-controller tag, you don't have a fast integer divide. My answer is also for unsigned values - it will work for signed values, you just have to limit the numbers used in the tricky bit below.

A good start is divide by 2, 4 and 8. These can be done with right shifts of 1, 2 and 3 bits respectively, assuming your CPU has a logical right-shift instruction.

Secondly, dividing by 1 is just keeping the number as-is. That just leaves, 3, 5, 6, 7 and 9.

Tricky bit starts here:

For the other numbers, you can use the fact that a divide can be replaced by a multiply-and-shift.

Let's say you have a 16-bit processor. To divide by N, you multiply by 256/N and shift right 8 bits:

N = 3, multiply by 85
N = 5, multiply by 51
N = 6, multiply by 43
N = 7, multiply by 37
N = 9, multiply by 28

Take the random example of 72 / 5. Multiply 72 by 51 to get 3672 then shift right 8 bits to get 14.

In order for this to work, your numbers that you're using must not overflow the 16 bits. Since your worst case is multiply-by-85, you can handle numbers up to 771.

The reason this works is because a shift-right of 8 bits is the same as dividing by 256, and:

  m * (256 /  n) / 256
= m / (n /  256) / 256
= m /  n *  256  / 256
= m /  n * (256  / 256)
= m /  n

If you have a 32-bit processor, the values and ranges change somewhat, since it's 65536/N:

N = 3, multiply by 21,846, right shift 16 bits, max value roughly 196,600.
N = 5, multiply by 13,108.
N = 6, multiply by 10,923.
N = 7, multiply by  9,363.
N = 9, multiply by  7,282.

Again, let's choose the random 20,000 / 7: 20,000 multiplied by 9,363 is 187,260,000 and, when you right shift that 16 bits, you get 2,857 - the real result is 2,857.

The following test program in C shows the accuracy figures for the values given. It uses signed values so is only good up to about 98,000 but you can see that the largest error is 1 and that it occurs at the low point of 13,110 (only 0.008% error).

#include <stdio.h>
int res[5] = {0};
int low[5] = {-1,-1,-1,-1,-1};
int da[] = {3,5,6,7,9};
int ma[] = {21846,13108,10923,9363,7282};
int main (void) {
    int n, i;
    for (n = 0; n < 98000; n++) {
        for (i = 0; i < sizeof(da)/sizeof(da[0]); i++) {
            int r1 = n / da[i];
            int r2 = (n * ma[i])>>16;
            int dif = abs (r1-r2);
            if (dif >= 5) {
                printf ("%d / %d gives %d and %d\n", n, da[i], r1, r2);
                return 1;
            }
            res[dif]++;
            if (low[dif] == -1) {
                low[dif] = n;
            }
        }
    }
    for (i = 0; i < sizeof(res)/sizeof(res[0]); i++) {
        printf ("Difference of %d: %6d, lowest value was %6d\n", i, res[i], low[i]);
    }
    return 0;
}

This outputs:

Difference of 0: 335874, lowest value was      0
Difference of 1: 154126, lowest value was  13110
Difference of 2:      0, lowest value was     -1
Difference of 3:      0, lowest value was     -1
Difference of 4:      0, lowest value was     -1

paxdiablo 2009-10-21 12:44:52

Dividing by 3, 5, 6, 7, 9 is a real problem coz microprocessor have super scalar architecture and every if is a real problem for it.

Georg 2009-10-21 13:09:45

See the update, @georgethegreat. You don't need selection at all if the numbers are small enough - you can use multiply-and-shift.

paxdiablo 2009-10-21 13:13:05

I do something like that (I have 32 bit processor). But i got some strange artefacts on a video, I was processing. Is there any exceptions when this method gives a wrong results? I made more detailed comment to previous answer - please, read it.

Georg 2009-10-21 13:37:21

No, provided you don't exceed the maximum allowed values (hence overflow), the errors should be extremely small. If you wanted to check the output, you could actually write a small program that used this method *and* real division for every value from 0 to 196,611 to ensure it worked okay.

paxdiablo 2009-10-21 13:56:20

@Georg(e?), I've posted some C code that tests the values (and bumped them up by one each since I'd rounded down when I should have rounded up). It shows how accurate the method is (*very*).

paxdiablo 2009-10-21 14:25:02

Answer 2

+4 A:

If the division instruction on your microcontroller is fast enough, use that. If you need the fractional part of the result, you may be able to use the remainder; on most architectures, the division instruction puts the quotient in one register and the remainder in another.

If your division instruction is not fast enough but the multiplication instruction is, you can use the following technique (and it sounds as if this is the technique you're after). On most architectures, multiplying a 32-bit number by another 32-bit number results in a 64-bit result; the more significant half is stored in one register and the less significant half is stored in the other. You can exploit this by realizing that division by a number n is the same as multiplying by (2^32)/n, then taking the more significant 32 bits of the result. In other words, if you want to divide by 3, you can instead multiply by 0x100000000/3 = 0x55555555, then take the more significant 32 bits of the result.

What you're doing here is really a form of fixed-point arithmetic. Take a look at the Wikipedia article for more information.

Martin B 2009-10-21 12:47:11

Thanks a lot.This microprocessor is Philips PNX1500 (very old - I got it for educational purposes).Processor has very slow division (it has no integer division - only float). For example: I swapped divide operation with a multiplication and get an acceleration at about 2.25 times.Your answer helped me a lot.

Georg 2009-10-21 13:07:54

I just have tried your way. But on the output video (I am processing video) I got strange artefacts (it is black dots on a frame). I used the following array:static UInt32 lookup_for_multiply[10] = {0, 1, 0x80000000, 0x55555555, 0x40000000, 0x33333333, 0x2AAAAAAA, 0x24924924, 0x20000000, 0x1C71C71C};Where I am wrong?

Georg 2009-10-21 13:26:24

OK. I got it. First member should me 0xFFFFFFFF buy not 0x1.

Georg 2009-10-21 13:41:47

Answer 3

+1 A:

A division of an integer by an integer constant can be replaced with a combination of a shift and a multiplication. See this optimization guide for details. Of cource this is useful if it is actully faster on the chip of interest.

sharptooth 2009-10-21 12:48:08

I do not know constant at the compilation stage.

Georg 2009-10-21 13:08:30

But a set of constants is fixed - you can set up an array of pairs and fetch a necessary pair depending on the divisor value in runtime. Or do the same with a switch.

sharptooth 2009-10-21 13:16:31

So. I think that on this processor one integer multiplication works much faster than one multiplication.

Georg 2009-10-21 13:40:28

ansaurus

tags:

views:

answers:

Fast Multiplication

related questions