views:

243

answers:

2

Hi,

I'm writing a small bignum library for a homework project. I am to implement Karatsuba multiplication, but before that I would like to write a naive multiplication routine.

I'm following a guide written by Paul Zimmerman titled "Modern Computer Arithmetic" which is freely available online.

On page 4, there is a description of an algorithm titled BasecaseMultiply which performs gradeschool multiplication.

I understand step 2, 3, where B^j is a digit shift of 1, j times. But I don't understand step 1 and 3, where we have A*b_j. How is this multiplication meant to be carried out if the bignum multiplication hasn't been defined yet?

Would the operation "*" in this algorithm just be the repeated addition method?

Here is the parts I have written thus far. I have unit tested them so they appear to be correct for the most part:

The structure I use for my bignum is as follows:

#define BIGNUM_DIGITS 2048
typedef uint32_t u_hw; // halfword
typedef uint64_t u_w; // word

typedef struct {
    unsigned int sign; // 0 or 1
    unsigned int n_digits;
    u_hw digits[BIGNUM_DIGITS];
} bn;

Currently available routines:

bn *bn_add(bn *a, bn *b); // returns a+b as a newly allocated bn
void bn_lshift(bn *b, int d); // shifts d digits to the left, retains sign
int bn_cmp(bn *a, bn *b); // returns 1 if a>b, 0 if a=b, -1 if a<b
+1  A: 

I wrote a multiplication algorithm a while ago, and I have this comment at the top. If you have two numbers x and y of the same size (same n_digits) then you would multiply like this to get n, which would have twice the digits. Part of the complexity of the algorithm comes from working out which bits not to multiply if n_digits is not the same for both inputs.

Starting from the right, n0 is x0*y0 and you save off the overflow. Now n1 is the sum of x1*y0 and y1*x0 and the previous overflow shifted by your digit size. If you are using 32 bit digits in 64 bit math, that means n0 = low32(x0*y0) and you carry high32(x0*y0) as the overflow. You can see that if you actually used 32 bit digits you could not add the center columns up without exceeding 64 bits, so you probably use 30 or 31 bit digits.

If you have 30 bits per digit, that means you can multiple two 8 digit numbers together. First write this algorithm to accept two small buffers with n_digits up to 8 and use native math for the arithmetic. Then implement it again, taking arbitrary sized n_digits and using the first version, along with your shift and add method, to multiply 8x8 chunks of digits at a time.

/*
    X*Y = N

                          x0     y3
                            \   /  
                             \ /   
                              X    
                      x1     /|\     y2
                        \   / | \   /  
                         \ /  |  \ /   
                          X   |   X    
                  x2     /|\  |  /|\     y1
                    \   / | \ | / | \   /  
                     \ /  |  \|/  |  \ /   
                      X   |   X   |   X    
              x3     /|\  |  /|\  |  /|\     y0
                \   / | \ | / | \ | / | \   /
                 \ /  |  \|/  |  \|/  |  \ /
                  V   |   X   |   X   |   V
                  |\  |  /|\  |  /|\  |  /|
                  | \ | / | \ | / | \ | / |
                  |  \|/  |  \|/  |  \|/  |
                  |   V   |   X   |   V   |
                  |   |\  |  /|\  |  /|   |
                  |   | \ | / | \ | / |   |
                  |   |  \|/  |  \|/  |   |
                  |   |   V   |   V   |   |
                  |   |   |\  |  /|   |   |
                  |   |   | \ | / |   |   |
                  |   |   |  \|/  |   |   |
                  |   |   |   V   |   |   |
                  |   |   |   |   |   |   |
              n7  n6  n5  n4  n3  n2  n1  n0
*/
drawnonward
I am not sure what you mean by exceeding 64bits with 32-bit digits? As you can see I have 32-bit digits, and was planning to use the 64-bit word size to manage the full range.
nn
Btw - cool diagram, it's really handy with the multiplication.
nn
The basecase multiply avoids adding columns by going one diagonal at a time and building the result as it goes, so you can ignore the issue. If you add the products of two 32 bit multiplies you get a 65 bit value. You either have to handle that overflow (easy in asm, hassle in C) or you have to use fewer bits in each digit.
drawnonward
Incidentally, it is cheaper to do c+=a*b then to just do c=a*b because in the latter you must initialize c to zero.
drawnonward
don't you mean just the inverse ? ie "in the former" you must initialize c to zero, since in the later ('=') you just override c altogether.
Matthieu M.
In this case c is an 8k buffer. The multiplication algorithm adds interim results to arbitrary digits as it runs, so all the digits of c must be initialized. In practice c=a*b is calculated as c=0; c+=a*b; so if c already has a value the add is free.
drawnonward
A: 

To do A*b_j, you need to do the grade school multiplication of a bignum with a single digit. You end up having to add a bunch of two-digit products together:

bn *R = ZERO;
for(int i = 0; i < n; i++) {
  bn S = {0, 2};
  S.digits[0] = a[i] * b_j;
  S.digits[1] = (((u_w)a[i]) * b_j) >> 32;  // order depends on endianness
  bn_lshift(S, i);
  R = bn_add(R, S);
}

Of course, this is very inefficient.

Keith Randall