ansaurus

Question

What is the fastest way(s) to loop through a large data chunk on a per-bit basis.

Answer 1

+12 A:

See the following link for a dozen of bit related stuff: Bit Twiddling Hacks

arul 2009-01-06 21:36:41

Good recommendation! I think Kernighan's method is the easiest to remember.

Anthony Cuozzo 2009-01-06 22:05:56

Answer 2

+16 A:

The fastest way is probably to build a lookup table of byte values versus the number of bits set in that byte. At least that was the answer when I interviewed at Google.

Paul Tomblin 2009-01-06 21:38:34

If you're looking at the data byte-by-byte, which you are, then with only 256 possible values, a look-up table is almost definitely going to be your best bet.

dancavallaro 2009-01-06 21:48:15

Thanks! Please see my edits for an answer to this idea.

2009-01-06 21:55:28

Even if you're not looking byte by byte, you can do a mask and compare with all the bytes in your data type, and add up all the results.

Paul Tomblin 2009-01-06 22:07:26

You'll be incinerated for providing an answer to a Google interview question! :P

Vinko Vrsalovic 2009-01-06 22:34:57

Answer 3

A:

Not tested. Just an idea.

For each data byte;

int f[2] = {0};
for(int i = 0 ; i < 8 ; i++){
    f[ (*byte >> i) & 0x1 ]++;
}

m3rLinEz 2009-01-06 21:41:02

Answer 4

+2 A:

You could use a precomputed lookup table, i.e:

static int bitcount_lookup[256] = { ..... } ; /* or make it a global and compute the values in code */

...

for( ... ) 
   byte = ... 
   Stats.FreqOf1 += bitcount_lookup[byte];

frankodwyer 2009-01-06 21:41:03

Answer 5

+5 A:

Use a table that maps each byte value (256) to the number of 1's in it. (The # of 0's is just (8 - # of 1's)). Then iterate over the bytes and perform a single lookup for each byte, instead of multiple lookups and comparisons. For example:

int onesCount = 0;
for (i = 0; i < data->Count; i++)
{   
    byte = &data->Data[i];
    onesCount += NumOnes[byte];
}
Stats.FreqOf1 += onesCount;
Stats.FreqOf0 += (data->Count * 8) - onesCount;

Dave L. 2009-01-06 21:42:13

Thanks! Please see my edits for an answer to this idea.

2009-01-06 21:55:58

Answer 6

A:

To join the link wagon: counting bits

Martijn Laarman 2009-01-06 21:43:17

Answer 7

A:

If this is not a case of premature optimization and you truly need to squeeze out every last femtosecond, then you're probably better off with a 256-element static array that you populate once with the bit-count of each byte value, then

Stats.FreqOf1 += bitCountTable[byte]

and when the loop is done:

Stats.FreqOf0 = ((data->Count * 8) - Stats.FreqOf1)

Brian Deacon 2009-01-06 21:47:09

I am somewhat sure that it could count as premature optimization (see my edits) as the task is not even the same. I just want the bit accessing part down right.

2009-01-06 21:58:51

Answer 8

+1 A:

Here is a method how to count the 1 bits of a 32bit integer (based on Java's Integer.bitCount(i) method):

unsigned bitCount(unsigned i) {
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (i + (i >> 4)) & 0x0f0f0f0f;
    i = i + (i >> 8);
    i = i + (i >> 16);
    return i & 0x3f;
}

So you can cast your data to int and move forward in 4 byte steps.

martinus 2009-01-06 21:49:58

Answer 9

A:

There's a whole chapter on the different techniques for this in the book Beautiful Code. You can read (most of) it on Google books starting here.

Dan Dyer 2009-01-06 21:54:41

Answer 10

+1 A:

Here is a simple one I whipped up on just a single 32 bit value, but you can see it wouldn't be hard to adapt it to any number of bits....

int ones = 0;
int x = 0xdeadbeef;
for(int y = 0;y < 32;y++)
{
 if((x & 0x1) == 0x1) ones++;
 x = (x >> 1);
}

printf("%x contains %d ones and %d zeros.\n", x, ones, 32-ones);

Notice however, that it modifies the value in the process. If you are doing this on data you need to keep, then you need to make a copy of it first.

Doing this in __asm would probably be a better, maybe faster way, but it's hard to say with how well the compiler can optimize...

With each solution you consider, each one will have drawbacks. A lookup table or a bit shifter (like mine), both have drawbacks.

Larry

LarryF 2009-01-06 22:04:14

Answer 11

+1 A:

ttobiass - Keep in mind your inline functions are important in applications like you are talking about, but there are things you need to keep in mind. You CAN get the performance out of the inline code, just remember a couple things.

inline in debug mode does not exist. (Unless you force it)
the compiler will inline functions as it sees fit. Often, if you tell it to inline a function, it may not do it at all. Even if you use __forceinline. Check MSDN for more info on inlining.
Only certain functions can even be inlined. For example, you cannot inline a recursive function.

You'll get your best performance out of your project settings for the C/C++ language, and how you construct your code. At this point, it's important to understand Heap vs. Stack operations, calling conventions, memory alignment, etc.

I know this does not answer your question exactly, but you mention performance, and how to get the best performance, and these things are key.

LarryF 2009-01-06 22:27:11

Thank you for your helpful comments.I noticed the debug behaviour as well so I switched to release mode.I realize that inline functions (even if inlined by the compiler) do not solve all problems or rather come at a some costs.

2009-01-06 22:35:19

Answer 12

+1 A:

I did not really understand what you're trying to do. But if you just want to get access to the bits of a bitmap, you can use these (untested!!!) functions:

#include <stddef.h>

_Bool isbitset(unsigned char * bitmap, size_t idx)
{
    return bitmap[idx / 8] & (1 << (idx % 8)) ? 1 : 0;
}

void setbit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] |= (1 << (idx % 8));
}

void unsetbit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] &= ~(1 << (idx % 8));
}

void togglebit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] ^= (1 << (idx % 8));
}

Edit: Ok, I think I understand what you want to do: Fast iteration over a sequence of bits. Therefore, we don't want to use the random access functions from above, but read a whole word of data at once.

You might use any unsigned integer type you like, but you should choose one which is likely to correspond to the word size of your architecture. I'll go with uint_fast32_t from stdint.h:

uint_fast32_t * data = __data_source__;
for(; __condition__; ++data)
{
    uint_fast32_t mask = 1;
    uint_fast32_t current = *data;
    for(; mask; mask <<= 1)
    {
        if(current & mask)
        {
            // bit is set
        }
        else
        {
            // bit is not set
        }
    }
}

From the inner loop, you can set the bit with

*data |= mask;

unset the bit with

*data &= ~mask;

and toggle the bit with

*data ^= mask;

Warning: The code might behave unexpectedly on big-endian architectures!

Christoph 2009-01-06 23:00:08

Answer 13

A:

A faster way to extract bits is to use:

bitmask= data->Data[i];

while (bitmask)
{
    bit_set_as_power_of_two= bitmask & -bitmask;
    bitmask&= bitmask - 1;
}

If you just want to count bits set, a LUT in cache per would be fast, but you can also do it in constant time with the interleaved bit counting method in the link in this answer.

MSN 2009-02-27 21:32:46

ansaurus

tags:

views:

answers:

What is the fastest way(s) to loop through a large data chunk on a per-bit basis.

related questions