views:

438

answers:

6

I have this C-code to do multiplications over GF(8):

int32_t GaloisMultiply (int32_t a, int32_t b) 
{
  int32_t i;
  int32_t mask = 0x100;
  int32_t y = 0;

  for(i=0;i<8;i++) 
  {
    if(b & mask) 
    {
   y ^= a;
 }
    mask >>= 1;
    y <<= 1;
  }

  if(b & 0x1) 
  {
    y ^= a;
  }

  return(y);
}

That's more or less the text-book implementation.

I wonder if I there is a clever optimization for above algorithm if I can assert that a is always b, e.g. I do squaring instead of multiplication. I'm not after a cryptographic use btw. I just want to make use of the fact that x*x in GF(8) interleaves the bits of x with zero bits one by one.

There are already quite clever methods to do the bit interleaving, but since I've found out that x*x in GF(8) does the bit interleaving thing (by accident) I can't stop trying to use it for bit-interleaving optimizations.

Any ideas?

A: 

You could probably write some assembly to do a slightly better job. However, I'd be pretty surprised if this was the bottleneck in your application; have you done any profiling? This function doesn't seem like it's worth optimizing.

Adam Rosenfield
It indeed is a bottleneck. If I run the code on a different architecture that does the multiplication in hardware I get up to 30% speed-up.
Nils Pipenbrinck
A: 

This is probably not what you are looking for, but here's one minor speedup:

Pass only one argument, if they are guaranteed to be the same.

AShelly
+3  A: 

Table-based? link

And when you are limited to x*x, it's a sparse matrix.

Here's another good paper (and a library)

Cade Roux
But I'm sorry, I don't think they are using the squaring effect (bit interleaving) you are talking about explicitly (or whether taking that into account really improves performance).
Cade Roux
The table based method should be way faster whether or not you are squaring
1800 INFORMATION
The table based approach does the trick. Thanks.
Nils Pipenbrinck
I'm confused. Couldn't you use a table based approach to bit interleaving 8-bit numbers without knowing a Galois Field from a football stadium? What am I missing?
Doug McClean
A: 

It might help the compiler a bit to mark "a" and "b" as const. Or unrolling the loop by hand. It would be sad if it helped, though...

Isn't it a patent minefield, by the way ?

rlerallut
+1  A: 
int32_t GaloisMultiply( int32_t a ) 
{
  int32_t y = 0;
  int32_t b = a & 0x01ff;

  while ( b ) 
  {
    if ( b & 1 ) 
      y ^= a;

    a <<= 1;
    b >>= 1;
  }
  return y;
}

Or if you like:

int32_t GaloisMultiply( int32_t a ) 
{
  int32_t y = 0;
  for ( int32_t b = a & 0x01ff; b; b >>= 1 )
  {
    if ( b & 1 ) 
      y ^= a;

    a <<= 1;
  }
  return y;
}

The reason that this approach is more efficient than the original code above is primarily because the loop is only performed until all the 'interesting' bits in the argument are consumed as opposed to blindly checking all (9) bits.

A table based approach will be faster though.

You should add some detail as to why this would be faster
1800 INFORMATION
thanks, after having a look that seems to be as short as it can get.I'll use tables though. It's faster.
Nils Pipenbrinck
+1  A: 

Lookup table is definitely the fastest for polynomial basis galois squaring. It is also the fastest for multiplication when using GF(8), but the tables get too large for larger fields as used in ECC. For multiplication in larger fields, the best algorithm is the 'left to right combine' method...(see http://www.amazon.com/Elliptic-Cryptography-Springer-Professional-Computing/dp/038795273X algorithm 2.36, page 50).