tags:

views:

141

answers:

1

Is there some instruction in MIPS that will determine the parity of a certain bit representation? I know to determine whether a "number" has an even parity or an odd parity is to XOR the individual bits of the binary representation together, but that seems computationally-intensive for a set of MIPS instructions... and I need to do this as quick as possible.

Also, the number I'm working in is represented in Grey Code... just to throw that in there. So is there some pseudo-instruction in MIPS to determine the parity of a "number" or do I have to do it by hand?

If there is no MIPS instruction, which it seems very unlikely to be, any advice on how to do it by hand?

Thanks, Hristo

follow-up: I found a optimization, but my implementation isn't working.

unsigned int v; // 32-bit word
v ^= v >> 1;
v ^= v >> 2;
v = (v & 0x11111111U) * 0x11111111U;
return (v >> 28) & 1;
+1  A: 

I'm not aware of any MIPS variant with a parity instruction, but there's a bit-twiddling trick for calculating parity faster than the obvious method of running through each of the 32 bits in turn. In C:

result = in ^ (in >> 16);
result ^= (result >> 8);
result ^= (result >> 4);
result ^= (result >> 2);
result ^= (result >> 1);
result &= 1;
  • After the first step, the bottom 16 bits of the result contain the parity of bits N and N+16 of the input - essentially, 16 steps of the parity calculation have been performed in one go. Writing result{N} to mean "bit N of result":

    result{0}  =  in{0} ^ in{16}
    result{1}  =  in{1} ^ in{17}
    result{2}  =  in{2} ^ in{18}
    ...
    result{7}  =  in{7} ^ in{23}
    result{8}  =  in{8} ^ in{24}
    ...
    result{15} = in{15} ^ in{31}
    

    (and the remaining top 16 bits of result can now be ignored; they serve no useful purpose in the remainder of the calculation).

  • After the second step, the bottom 8 bits of result contain the parity of bits N, N+8, N+16, N+24 of the original input:

    result{0} = result{0} ^ result{8}  =  in{0} ^  in{8} ^ in{16} ^ in{24}
    result{1} = result{1} ^ result{9}  =  in{1} ^  in{9} ^ in{17} ^ in{25}
    ...
    result{7} = result{7} ^ result{15} =  in{7} ^ in{15} ^ in{23} ^ in{31}
    

    (and again, the remaining bits can be ignored from here onwards).

  • ...and so on, until the parity of all the bits of the original input ends up in the bottom bit of result:

    result{0} = in{0} ^ in{1} ^ in{2} ^ ... ^ in{30} ^ in{31}
    

This is easy to translate directly to MIPS assembly; it's 11 instructions:

# input in $a0, output in $v0, $t0 corrupted
srl $t0, $a0, 16
xor $v0, $a0, $t0
srl $t0, $v0, 8
xor $v0, $v0, $t0
srl $t0, $v0, 4
xor $v0, $v0, $t0
srl $t0, $v0, 2
xor $v0, $v0, $t0
srl $t0, $v0, 1
xor $v0, $v0, $t0
and $v0, $v0, 1

A possible improvement might be to use a lookup table. For example, after the first two steps, we have:

    result{0} =  in{0} ^  in{8} ^ in{16} ^ in{24}
    result{1} =  in{1} ^  in{9} ^ in{17} ^ in{25}
    ...
    result{7} =  in{7} ^ in{15} ^ in{23} ^ in{31}

so we could use a 256-byte lookup table at this point. In C:

result = in ^ (in >> 16);
result ^= (result >> 8);
result = lookup_table[result & 0xff];

where lookup_table[n] has been precalculated, e.g.:

for (i = 0; i < 256; i++) {
    n = i ^ (i >> 4);
    n ^= (n >> 2);
    n ^= (n >> 1);
    lookup_table[i] = n & 1;
}

This is 7 MIPS instructions, not counting loading the lookup table base address into a register:

# input in $a0, lookup table address in $a1, output in $v0, $t0 corrupted
srl  $t0, $a0, 16
xor  $v0, $a0, $t0
srl  $t0, $v0, 8
xor  $v0, $v0, $t0
andi $v0, $v0, 0xff
addu $t0, $a1, $v0
lbu  $v0, 0($t0)

However, this is 7 instructions which include a memory access, versus 11 instructions which are purely register operations; it may or may not be faster. (This sort of micro-optimisation always needs to be profiled!)

Matthew Slattery
Thanks for the reply.This is what I currently have... but I'm trying to optimize the code to less instructions because this set of instructions will be called many many times and it is slowing down the performance. Any optimization ideas?
Hristo
After the second `xor`, all of the useful information is in the bottom 8 bits; the final answer is then just a function of those bits, so you could use them as an index into a 256-byte lookup table at that point. Whether that would work out faster or not depends on speed of memory access, caching, etc. - you'd have to try it and profile it...
Matthew Slattery
What do you mean by "all the useful information is in the bottom 8 bits"? So you're suggesting shift 16, xor, shift 8, xor and then the right-most 8 bits are the important ones?
Hristo
Yes; I've edited my answer to clarify this.
Matthew Slattery
Ok. Thanks a lot! I've implemented this and its not working, but I'm sure I have something wrong that I can't see.
Hristo
@Hristo: oops, no, that'll be my fault for screwing up the lookup table initialisation. I've corrected it now. Sorry about that.
Matthew Slattery
Thanks for letting me know. I wasn't actually using your look-up table, however the 7 MIPS instructions work well. I have it working now and its pretty quick :) Thanks a bunch!
Hristo