views:

132

answers:

10

Given an 32 bit int which is known to have at least 2 bits set, is there a way to efficiently clear all except the 2 most significant set bits? i.e. I want to ensure the output has exactly 2 bits set.

What if the input is guaranteed to have only 2 or 3 bits set.?

Examples:

0x2040 -> 0x2040
0x0300 -> 0x0300
0x0109 -> 0x0108
0x5040 -> 0x5000

Benchmarking Results:

Code:

QueryPerformanceFrequency(&freq);
/***********/
value = (base =2)|1;
QueryPerformanceCounter(&start);
for (l=0;l<A_LOT; l++)
{
  //!!value calculation goes here
  junk+=value;    //use result to prevent optimizer removing it.

  //advance to the next 2|3 bit word
  if (value&0x80000000)
  {  if (base&0x80000000)
     {  base=6;
     }
     base*=2;
     value=base|1;
  }
  else
  {  value<<=1;
  }
}
QueryPerformanceCounter(&end);
time = (end.QuadPart - start.QuadPart);
time /= freq.QuadPart;
printf("--------- name\n");
printf("%ld loops took %f sec (%f additional)\n",A_LOT, time, time-baseline);
printf("words /sec = %f Million\n",A_LOT/(time-baseline)/1.0e6); 

Results on using VS2005 default release settings on Core2Duo [email protected] GHz:

--------- BASELINE
1000000 loops took 0.001630 sec
--------- sirgedas
1000000 loops took 0.002479 sec (0.000849 additional)
words /sec = 1178.074206 Million
--------- ashelly
1000000 loops took 0.004640 sec (0.003010 additional)
words /sec = 332.230369 Million
--------- mvds
1000000 loops took 0.005250 sec (0.003620 additional)
words /sec = 276.242030 Million
--------- spender
1000000 loops took 0.009594 sec (0.007964 additional)
words /sec = 125.566361 Million
--------- schnaader
1000000 loops took 0.025680 sec (0.024050 additional)
words /sec = 41.580158 Million
+4  A: 

I'd create a mask in a loop. At the beginning, the mask is 0. Then go from the MSB to the LSB and set each corresponding bit in the mask to 1 until you found 2 set bits. Finally AND the value with this mask.

#include <stdio.h>
#include <stdlib.h>

int clear_bits(int value) {

  unsigned int mask = 0;
  unsigned int act_bit = 0x80000000;
  unsigned int bit_set_count = 0;

  do {
    if ((value & act_bit) == act_bit) bit_set_count++;
    mask = mask | act_bit;
    act_bit >>= 1;
  } while ((act_bit != 0) && (bit_set_count < 2));

  return (value & mask);
}

int main() {
  printf("0x2040 => %X\n", clear_bits(0x2040));
  printf("0x0300 => %X\n", clear_bits(0x0300));
  printf("0x0109 => %X\n", clear_bits(0x0109));
  printf("0x5040 => %X\n", clear_bits(0x5040));
  return 0;
}

This is quite complicated, but should be more efficient as using a for loop over the 32 bits every time (and clear all bits except the 2 most significant set ones). Anyway, be sure to benchmark different ways before using one.

Of course, if memory is not a problem, use a lookup table approach like some recommended - this will be much faster.

schnaader
I'm interested in retaining the topmost 2 bits that are already set. Added examples above.
AShelly
Ah, I see. This is a completely different problem.
schnaader
Edited my answer. Now it's appropriate.
schnaader
this calls for a benchmark! ;-)
mvds
`gcc -O3`, Xeon [email protected] GHz: 3.284 seconds for 100m rounds, vs 1.236 seconds for the tables. (btw output for 100m random rounds is identical)
mvds
+1  A: 

how much memory is available at what latency? I would propose a lookup table ;-)

but seriously: if you would perform this on 100s of numbers, an 8 bit lookup table giving 2 msb and another 8 bit lookup table giving 1 msb may be all you need. Depending on the processor this might beat really counting bits.

For speed, I would create a lookup table mapping an input byte to

M(I)=0 if 1 or 0 bits set

M(I)=B' otherwise, where B' is the value of B with the 2 msb bits set.

Your 32 bit int are 4 input bytes I1 I2 I3 I4.
Lookup M(I1), if nonzero, you're done.
Compare M(I1)==0, if zero, repeat previous step for I2.
Else, lookup I2 in a second lookup table with 1 MSB bits, if nonzero, you're done. Else, repeat previous step for I3.

etc etc. Don't actually loop anything over I1-4 but unroll it fully.

Summing up: 2 lookup tables with 256 entries, 247/256 of cases are resolved with one lookup, approx 8/256 with two lookups, etc.

edit: the tables, for clarity (input, bits table 2 MSB, bits table 1 MSB)

  I    table2    table1
  0  00000000  00000000
  1  00000000  00000001
  2  00000000  00000010
  3  00000011  00000010
  4  00000000  00000100
  5  00000101  00000100
  6  00000110  00000100
  7  00000110  00000100
  8  00000000  00001000
  9  00001001  00001000
 10  00001010  00001000
 11  00001010  00001000
 12  00001100  00001000
 13  00001100  00001000
 14  00001100  00001000
 15  00001100  00001000
 16  00000000  00010000
 17  00010001  00010000
 18  00010010  00010000
 19  00010010  00010000
 20  00010100  00010000
 ..
250  11000000  10000000
251  11000000  10000000
252  11000000  10000000
253  11000000  10000000
254  11000000  10000000
255  11000000  10000000
mvds
I'm targeting low memory platforms. 64K entries would be too many. I could handle 256 entries, but I can't see a good way to stitch together results of multiple lookups.
AShelly
edited the answer to account for platforms with less than `2^32*4` bytes of RAM.
mvds
I don't get it: for the third example, the 1st table would give `[0,1,0,9]`. Then what?
AShelly
added rough implementation. You lookup the first byte, if it has 2 bits set you are done (result will be this byte and 0x000000). If it hasn't, the table lists 0 so you have to check: either the first byte was 0 (0 bits) or has 1 bit set, in which case you have to find the first bit in the second input byte. etc etc.
mvds
added lookup tables for clarity.
mvds
please note, that if you lookup I1, and find a 0, and I1 is not 0, than you know without looking that I1 contains 1 bit and that is your MSB.
mvds
see the implementation below.
mvds
This is an extremely powerful way to handle many otherwise hairy bit-munging problems. This turns your per-bit mask and compare loop into a short sequence of load-compare instructions using only 512 bytes of ROM.
Recurse
A: 

Using a variation of this, I came up with the following:

var orig=56;
var x=orig;
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
Console.WriteLine(orig&~(x>>2));

In c# but should translate easily.

EDIT

I'm not so sure I've answered your question. This takes the highest bit and preserves it and the bit next to it, eg. 101 => 100

spender
Not quite the answer I am looking for - it only works for 2 contiguous bits. It fails on `0x0109 -> 0x0108` for example.
AShelly
A: 

Here's some python that should work:

def bit_play(num):
    bits_set = 0
    upper_mask = 0
    bit_index = 31
    while bit_index >= 0:
        upper_mask |= (1 << bit_index)
        if num & (1 << bit_index) != 0:
            bits_set += 1
            if bits_set == 2:
                num &= upper_mask
                break
        bit_index -= 1
    return num

It makes one pass over the number. It builds a mask of the bits that it crosses so it can mask off the bottom bits as soon as it hits the second-most significant one. As soon as it finds the second bit, it proceeds to clear the lower bits. You should be able to create a mask of the upper bits and &= it in instead of the second while loop. Maybe I'll hack that in and edit the post.

D.Shawley
A: 

I'd also use a table based approach, but I believe one table alone should be sufficient. Take the 4 bit case as an example. If you're input is guaranteed to have 2 or 3 bits, then your output can only be one of 6 values

      0011
      0101
      0110
      1001
      1010
      1100

Put these possible values in an array sorted by size. Starting with the largest, find the first value which is equal to or less than your target value. This is your answer. For the 8 bit version you'll have more possible return values, but still easily less than the maximum possible permutations of 8*7.

public static final int [] MASKS = {
        0x03, //0011
        0x05, //0101
        0x06, //0110
        0x09, //1001
        0x0A, //1010
        0x0C, //1100
};


    for (int i = 0; i < 16; ++i) {
        if (countBits(i) < 2) {
            continue;
        }
        for (int j = MASKS.length - 1; j >= 0; --j) {
            if (MASKS[j] <= i) {
                System.out.println(Integer.toBinaryString(i) + " " + Integer.toBinaryString(MASKS[j]));
                break;
            }
        }
    }
Jherico
A: 

Here's another attempt (no loops, no lookup, no conditionals). This time it works:

var orig=0x109;
var x=orig;
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x = orig & ~(x & ~(x >> 1));
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
var solution=orig & ~(x >> 1);
Console.WriteLine(solution.ToString("X")); //0x108

Could probably be shortened by someone cleverer than me.

spender
A: 

Following up on my previous answer, here's the complete implementation. I think it is as fast as it can get. (sorry for unrolling the whole thing ;-)

#include <stdio.h>
unsigned char bittable1[256];
unsigned char bittable2[256];

unsigned int lookup(unsigned int);
void gentable(void);

int main(int argc,char**argv)
{
    unsigned int challenge = 0x42341223, result;
    gentable();

    if ( argc > 1 ) challenge = atoi(argv[1]);

    result = lookup(challenge);

    printf("%08x --> %08x\n",challenge,result);
}

unsigned int lookup(unsigned int i)
{
    unsigned int ret;

    ret = bittable2[i>>24]<<24; if ( ret ) return ret;
    ret = bittable1[i>>24]<<24;
    if ( !ret )
    {
            ret = bittable2[i>>16]<<16; if ( ret ) return ret;
            ret = bittable1[i>>16]<<16;
            if ( !ret )
            {
                    ret = bittable2[i>>8]<<8; if ( ret ) return ret;
                    ret = bittable1[i>>8]<<8;
                    if ( !ret )
                    {
                            return bittable2[i] | bittable1[i];
                    } else {
                            return (ret | bittable1[i&0xff]);
                    }
            } else {
                    if ( bittable1[(i>>8)&0xff] )
                    {
                            return (ret | (bittable1[(i>>8)&0xff]<<8));
                    } else {
                            return (ret | bittable1[i&0xff]);
                    }
            }
    } else {
            if ( bittable1[(i>>16)&0xff] )
            {
                    return (ret | (bittable1[(i>>16)&0xff]<<16));
            } else if ( bittable1[(i>>8)&0xff] ) {
                    return (ret | (bittable1[(i>>8)&0xff]<<8));
            } else {
                    return (ret | (bittable1[i&0xff]));
            }
    }
}

void gentable()
{
    int i;
    for ( i=0; i<256; i++ )
    {
            int bitset = 0;
            int j;
            for ( j=128; j; j>>=1 )
            {
                    if ( i&j )
                    {
                            bitset++;
                            if ( bitset == 1 ) bittable1[i] = i&(~(j-1));
                            else if ( bitset == 2 ) bittable2[i] = i&(~(j-1));
                    }
            }
            //printf("%3d %02x %02x\n",i,bittable1[i],bittable2[i]);
    }
}
mvds
there is one assumption: there must be 2 bits (as stated in the requirements) it will fail on situations with only 1 bit set in the last byte. -- fixed
mvds
An interesting note on optimization: I thought having the lookup tables `unsigned int` and pre-shifted `<< 24` would make things faster (taking one `<<24` out of the loop) but it makes no difference at all...
mvds
A: 

Here's my implementation in C#

uint OnlyMostSignificant(uint value, int count) {
    uint newValue = 0;
    int c = 0;

    for(uint high = 0x80000000; high != 0 && c < count; high >>= 1) {
        if ((value & high) != 0) {
            newValue = newValue | high;
            c++;
        }
    }

    return newValue;
}

Using count, you could make it the most significant (count) bits.

Vincent McNabb
+7  A: 

If the input is guaranteed to have exactly 2 or 3 bits then the answer can be computed very quickly. We exploit the fact that the expression x&(x-1) is equal to x with the LSB cleared. Applying that expression twice to the input will produce 0, if 2 or fewer bits are set. If exactly 2 bits are set, we return the original input. Otherwise, we return the original input with the LSB cleared.

Here is the code in C++:

// assumes a has exactly 2 or 3 bits set
int topTwoBitsOf( int a ) 
{
   int b = a&(a-1);         // b = a with LSB cleared
   return b&(b-1) ? b : a;  // check if clearing the LSB of b produces 0
}

This can be written as a confusing single expression, if you like:

int topTwoBitsOf( int a )
{
   return a&(a-1)&((a&(a-1))-1) ? a&(a-1) : a;
}
Tom Sirgedas
Your bit-fu is strong
Jherico
Potent powers indeed. +1
spender
You're saying "bytes", but you mean "bits".
Nick Johnson
Oops, fixed! Thanks, Nick!
Tom Sirgedas
A: 

My solution:

Use "The best method for counting bits in a 32-bit integer", then clear the lower bit if the answer is 3. Only works when input is limited to 2 or 3 bits set.

unsigned int c; // c is the total bits set in v
unsigned int v = value;
v = v - ((v >> 1) & 0x55555555);                    
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);     // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count

crc+=value&value-(c-2);
AShelly