views:

523

answers:

6

I am writing a function to process an incoming 32-bit buffer, representing changing data when it is compared to an corresponding stored 32-bit buffer. The position of the changing bit represents a number (i.e. a value of 8 means bit 3) that needs to be processed, along with whether the change is 0->1, or 1->0. Here is the current implementation, please help me improve it! Note that this is not the actual code, it has been simplified to be context-neutral.

uint32_t temp = oldBuffer ^ newBuffer;
uint32_t number = 0;
while (temp != 0)
{
    if (temp & 0x1)
    {
        uint32_t bitValue = 0;
        if ((newBuffer& (1 << number)) != 0) bitValue = 1;
        processNumber(number, bitValue);
    }
    number++;
    temp = temp >> 1;
}
oldBuffer = newBuffer;

Right now it works but I don't like that it has to check each bit by checking bit 1 and shifting through the entire thing. If there was guarantee to be only 1 bit set that wouldn't be too hard to figure out, but that's not the case.

Edit: To Neil, I guess I am hoping to find a way to get the positions of the bits after the XOR in constant time, instead of shifting all the way through the buffer and checking the bits one by one.

A: 
 uint32_t temp = oldBuffer ^ newBuffer;
 uint32_t number = 0;
 uint32_t bitmask=1;
 while (temp != 0)
 {
     if (temp & 0x1)
     {
         processNumber(number, ((newBuffer & bitmask) != 0));
     }
     number++;
     temp = temp >> 1;
     bitmask <<=1;
 }
 oldBuffer = newBuffer;

2 super small changes...

your code was already quite efficient

Toad
+8  A: 
uint32_t temp=oldBuffer^newBuffer, ntemp=newBuffer;
for(int b=0;temp;++b,temp>>=1,ntemp>>=1)
    if(temp&1) processNumber(b,ntemp&1);
Michael Krelin - hacker
+1, Some spaces would go a long way to improving the readability ;D
sixlettervariables
Nice - one (extremely) small thing: the test for `b<32` is unnecessary. Stopping the loop when `test` goes to 0 is enough (dunno if a compiler would be smart enough to figure that out at optimization time).
Michael Burr
Lou Franco
Michael Krelin - hacker
Lou, the number of shifts is the same, but the amount of bits differs.
Michael Krelin - hacker
Shifting 1 and shifting N should be equivalent -- both are just hardwired into the silicon, right? Is there a fast shiftRightOne instruction? I think if there is improvement, it's in not hoping the optimizer will deal with the bitValue variable. Anyway, the code is simpler -- so, in that way -- definitely an improvement.
Lou Franco
Lou may well have somethign there, but we sure are delving into some micro-optimizations (but sometimes that's just plain good recreation, even if it's not great engineering).
Michael Burr
Lou, it depends on the architecture, but in general you're right. I do not think that the original code needs any optimization for performance reasons. But I do understand why Rich's aesthetics are hurt when looking at the `1<<number` code ;-) So, basically, you original comment was correct — it's not much of a performance boost.
Michael Krelin - hacker
+4  A: 

You could possibly gain some performance by using one or more of the 'bit twiddling' hacks on

Specifically, the algorithms to 'find the integer log base 2' (posisiton of the highest bit set). This would let you determine which bits are set more directly than looping over each bit.

If you must process the bits in in low to high order, you can use a slight modification of Kernighan's methods to counts bits:

/* note: untested code */
while (temp) {

    uint32_t bit = temp & (~(temp & (temp - 1)); /* isolate lowest bit */

    temp &= ~bit;

    uint32_t bit_number = /* use find log base 2 hack */;

    /* etc... */

}

This should make the while loop iterate exactly the number of times equal to the number of set bits. Your original loop will iterate a number of times equal to the bit position of the highest set bit.

However, I'd be astonished if this made any measurable difference unless this was a super-critical bit of code.

Michael Burr
+1  A: 

It kind of depends on what you expect the distribution of (oldBuffer ^ newBuffer) will be. If it's totally random and the full range of 32 bits, then you have an average of 16 loops.

One possible solution is to make a table like this

int lookup[255][8] = {
  { -1, -1, -1, -1, -1, -1, -1, -1 }, // 0 has no bits set
  {  0, -1, -1, -1, -1, -1, -1, -1 }, // 1 has only the 0th bit set
  {  1, -1, -1, -1, -1, -1, -1, -1 }, // 2 has only the 1st bit set
  {  0,  1, -1, -1, -1, -1, -1, -1 }, // 3 has the 0th, 1st bit set
  {  2, -1, -1, -1, -1, -1, -1, -1 }, // 4 has only the 2nd bit set
  ...
  {  0,  1,  2,  3,  4,  5,  6,  7 }, // 255 has all bits set
}

With this you have to loop 4 times (1 for each byte) and then 1 time for each bit that is set (average of 4) -- Hey, that's 16.

But if the number of set bits is low (averages a lot less than half the 32 bits), then the table lookup will go down.

Unfortunately, table lookup adds a multiply and add each time you use it, so it's not necessarily good. You have to test it. In other words, it finds the set bits in constant time, but the constant might be larger than the loop. It depends on how many set bits you expect to have.

Lou Franco
Definitely an interesting approach. Now maybe I'ld throw some code at it to generate the lookup table, because I'm not going to be the one filling in the dots in the middle :)
xtofl
Definitely -- generate the table. It's not only tedious, but error-prone.
Lou Franco
Was about to suggest this...nice one
Justicle
+2  A: 

How about using the standard library? No need to shift, or, and etc... to test for bits to be true. Testing for bits in a bitset is guaranteed to be constant time. And it writes a lot cleaner and more understandeable.

const std::bitset<32> oldbits( oldBuffer );
const std::bitset<32> newbits ( newBuffer );

for( size_t index = 0; index != oldbits.size(); ++index ) {
   if( oldbits[ index ] != newbits[ index ] ) {
       processNumber( index, newbits[ index ] )
   }
}

Note: you don't need the XOR here, either, since you have direct access to the bits. You may use it, though, to save performance.

xtofl
A: 

You can do it recursive B-Tree-like :)

go(oldBuffer ^ newBuffer, 16, 0, newBuffer);
...

void go(temp, half, pos, bitValue)
{
    if (half > 1) {
      uint32_t mask = (1 << half) - 1;
      if (temp & mask)    go(temp & mask, half/2, pos, bitValue & mask);
      temp >>= half;
      if (temp & mask)    go(temp & mask, half/2, pos + half, (bitValue >> half) & mask);
    } else {
      if (temp & 1) processNumber(pos, bitValue&1);
      if (temp & 2) processNumber(pos+1, bitValue/2&1);
    }
}
zxcat