views:

269

answers:

8

I am looking for a faster algorithm than the below for the following. Given a sequence of 64-bit unsigned integers, return a count of the number of times each of the sixty-four bits is set in the sequence.

Example:

4608 = 0000000000000000000000000000000000000000000000000001001000000000 
4097 = 0000000000000000000000000000000000000000000000000001000000000001
2048 = 0000000000000000000000000000000000000000000000000000100000000000

counts 0000000000000000000000000000000000000000000000000002101000000001

Example:

2560 = 0000000000000000000000000000000000000000000000000000101000000000
530  = 0000000000000000000000000000000000000000000000000000001000010010
512  = 0000000000000000000000000000000000000000000000000000001000000000

counts 0000000000000000000000000000000000000000000000000000103000010010

Currently I am using a rather obvious and naive approach:

static int bits = sizeof(ulong) * 8;

public static int[] CommonBits(params ulong[] values) {
    int[] counts = new int[bits];
    int length = values.Length;

    for (int i = 0; i < length; i++) {
        ulong value = values[i];
        for (int j = 0; j < bits && value != 0; j++, value = value >> 1) {
            counts[j] += (int)(value & 1UL);
        }
    }

    return counts;
}
A: 

I'll direct you to the classical: Bit Twiddling Hacks, but your goal seems slightly different than just typical counting (i.e. your 'counts' variable is in a really weird format), but maybe it'll be useful.

Noon Silk
@silky: I've looked through Bit Twiddling Hacks already and didn't see anything of use. Thanks though. As far as the weirdness of the definition, I suppose one way to look at this is the classical bit counting algorithms count horizontally across a bit representation. Here I want to count vertically across several bit representations.
Jason
But it doesn't work; what if you have more than 9 bits set? Then you need to place '10' in that 'position' in your number there? I'm not sure what format you consider that number in (the "count" variable). I understand the desire to count horizontally though.
Noon Silk
(I mean vertically).
Noon Silk
@silky: Huh? I just passed `8191`, `8190`, `8189`, `8187` to my code (so that at least twelve bits are set in each number) and I received back the array `{3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}`.
Jason
I don't know your code, but for display purposes you are showing the 'count' (in the count variable) as a single digit. I am saying to you, if you have 10 numbers that each have the same bit set, you will get 10 (or a two-digit number) and your count variable will now look weird (10 indicating '1' bit and '0' bits set). It's just a display thing, possibly, because as I said I don't have your code.
Noon Silk
@silky: You do know my code. I provided it in the question. The displays in the examples are for convenience. I am not displaying to a user any of this information; it is merely used in internal calculations.
Jason
Jason: Your code above does not show the formatting you've used above; you can see my point I think, and I'll leave you to it.
Noon Silk
@silky: Ignore the formatting. The formatting is irrelevant to the problem that I am trying to solve.
Jason
A: 

http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive

One of them

unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
  v &= v - 1; // clear the least significant bit set
}

Keep in mind, that complexity of this method is aprox O(log2(n)) where n is the number to count bits in, so for 10 binary it need only 2 loops

You should probably take the metod for counting 32 bits whit 64 bit arithmetics and applying it on each half of word, what would take by 2*15 + 4 instructions

// option 3, for at most 32-bit values in v:
c =  ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 
   0x1f;
c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;

If you have sse4,3 capable processor you can use POPCNT instruction. http://en.wikipedia.org/wiki/SSE4

ralu
This counts the bits set in one number. So, say, if I have the `ulong` `123` then this code will return 6. In my case, if the only input was the `ulong` `123` I would want to see the `int[64]` array `0000000000000000000000000000000000000000000000000000000001111011`. This tells me that the rightmost bit is set once in the input, the 2nd bit is set once, etc.
Jason
A: 

Ok let me try again :D

change each byte in 64 bit integer into 64 bit integer by shifting each bit by n*8 in lef

for instance

10110101 -> 0000000100000000000000010000000100000000000000010000000000000001 (use the lookup table for that translation)

Then just sum everything togeter in right way and you got array of unsigned chars whit integers.

You have to make 8*(number of 64bit integers) sumations

Code in c

//LOOKTABLE IS EXTERNAL and has is int64[256] ;
unsigned char* bitcounts(int64* int64array,int len)
{  
    int64* array64;
    int64 tmp;
    unsigned char* inputchararray;
    array64=(int64*)malloc(64);
    inputchararray=(unsigned char*)input64array;
    for(int i=0;i<8;i++) array64[i]=0; //set to 0

    for(int j=0;j<len;j++)
    {             
         tmp=int64array[j];
         for(int i=7;tmp;i--)
         {
             array64[i]+=LOOKUPTABLE[tmp&0xFF];
             tmp=tmp>>8;
         }
    }
    return (unsigned char*)array64;
}

This redcuce speed compared to naive implemetaton by factor 8, becuase it couts 8 bit at each time.

EDIT:

I fixed code to do faster break on smaller integers, but I am still unsure about endianess And this works only on up to 256 inputs, becuase it uses unsigned char to store data in. If you have longer input string, you can change this code to hold up to 2^16 bitcounts and decrease spped by 2

ralu
+1  A: 

A small speed improvement might be achieved by first OR'ing the integers together, then using the result to determine which bits you need to check. You would still have to iterate over each bit, but only once over bits where there are no 1s, rather than values.Length times.

Joel
@Joel: Good thought. I'll explore this idea, profile and report back. Thank you.
Jason
@Joel: While this didn't gain me much, it did lead to an idea that did. Basically, instead of shifting by one bit in each iteration of the loop, I shift by the number of bits needed to shift `value` until the LSB of `value` is one. I calculate this number quickly with some tricks from Bit Twiddling Hacks.
Jason
+2  A: 

This seems to be exactly what you want:

vertical counter

Robert L
How much of efford do you need to convert from vertical stored count to ordinary array?
ralu
This ended up not being faster than some minor modifications I made to the naive implementation I already provided.
Jason
A: 

The best I can do here is just get silly with it and unroll the inner-loop... seems to have cut the performance in half (roughly 4 seconds as opposed to the 8 in yours to process 100 ulongs 100,000 times)... I used a qick command-line app to generate the following code:

for (int i = 0; i < length; i++)
{
 ulong value = values[i];
 if (0ul != (value & 1ul)) counts[0]++;
 if (0ul != (value & 2ul)) counts[1]++;
 if (0ul != (value & 4ul)) counts[2]++;
 //etc...
 if (0ul != (value & 4611686018427387904ul)) counts[62]++;
 if (0ul != (value & 9223372036854775808ul)) counts[63]++;
}

that was the best I can do... As per my comment, you'll waste some amount (I know not how much) running this in a 32-bit environment. If your that concerned over performance it may benefit you to first convert the data to uint.

Tough problem... may even benefit you to marshal it into C++ but that entirely depends on your application. Sorry I couldn't be more help, maybe someone else will see something I missed.

Update, a few more profiler sessions showing a steady 36% improvement. shrug I tried.

csharptest.net
A: 
const unsigned int BYTESPERVALUE = 64 / 8;
unsigned int bcount[BYTESPERVALUE][256];
memset(bcount, 0, sizeof bcount);
for (int i = values.length; --i >= 0; )
  for (int j = BYTESPERVALUE ; --j >= 0; ) {
    const unsigned int jth_byte = (values[i] >> (j * 8)) & 0xff;
    bcount[j][jth_byte]++; // count byte value (0..255) instances
  }

unsigned int count[64];
memset(count, 0, sizeof count);
for (int i = BYTESPERVALUE; --i >= 0; )
  for (int j = 256; --j >= 0; ) // check each byte value instance
    for (int k = 8; --k >= 0; ) // for each bit in a given byte
      if (j & (1 << k)) // if bit was set, then add its count
        count[i * 8 + k] += bcount[i][j];
Jay
A: 

Another approach that might be profitable, would be to build an array of 256 elements, which encodes the actions that you need to take in incrementing the count array.

Here is a sample for a 4 element table, which does 2 bits instead of 8 bits.

int bitToSubscript[4][3] =
{
    {0},       // No Bits set
    {1,0},     // Bit 0 set
    {1,1},     // Bit 1 set
    {2,0,1}    // Bit 0 and bit 1 set.
}

The algorithm then degenerates to:

  • pick the 2 right hand bits off of the number.
  • Use that as a small integer to index into the bitToSubscriptArray.
  • In that array, pull off the first integer. That is the number of elements in the count array, that you need to increment.
  • Based on that count, Iterate through the remainder of the row, incrementing count, based on the subscript you pull out of the bitToSubscript array.
  • Once that loop is done, shift your original number two bits to the right.... Rinse Repeat as needed.

Now there is one issue I ignored, in that description. The actual subscripts are relative. You need to keep track of where you are in the count array. Every time you loop, you add two to an offset. To That offset, you add the relative subscript from the bitToSubscript array.

It should be possible to scale up to the size you want, based on this small example. I would think that another program could be used, to generate the source code for the bitToSubscript array, so that it can be simply hard coded in your program.

There are other variation on this scheme, but I would expect it to run faster on average than anything that does it one bit at a time.

Good Hunting.

Evil.

EvilTeach