views:

709

answers:

13

I am running through a memory block of binary data byte-wise.

Currently I am doing something like this:

for (i = 0; i < data->Count; i++)
{   
    byte = &data->Data[i];
    ((*byte & Masks[0]) == Masks[0]) ? Stats.FreqOf1++; // syntax incorrect but you get the point.
    ((*byte & Masks[1]) == Masks[1]) ? Stats.FreqOf1++;
    ((*byte & Masks[2]) == Masks[2]) ? Stats.FreqOf1++;
    ((*byte & Masks[3]) == Masks[3]) ? Stats.FreqOf1++;
    ((*byte & Masks[4]) == Masks[4]) ? Stats.FreqOf1++;
    ((*byte & Masks[5]) == Masks[5]) ? Stats.FreqOf1++;
    ((*byte & Masks[6]) == Masks[6]) ? Stats.FreqOf1++;
    ((*byte & Masks[7]) == Masks[7]) ? Stats.FreqOf1++;
}

Where Masks is:

for (i = 0; i < 8; i++)
{
 Masks[i] = 1 << i;
}

(I somehow did not manage to do it as fast in a loop or in an inlined function, so I wrote it out.)

Does anyone have any suggestions on how to to improve this first loop? I am rather inexperienced with getting down to bits.

This may seem like a stupid thing to do. But I am in the process of implementing a compression algorithm. I just want to have the bit accessing part down right.

Thanks!

PS: This is in on the Visual Studio 2008 compiler. So it would be nice if the suggestions applied to that compiler.

PPS: I just realized, that I don't need to increment two counts. One would be enough. Then compute the difference to the total bits at the end. But that would be specific to just counting. What I really want done fast is the bit extraction.

EDIT: The lookup table idea that was brought forward is nice. I realize though that I posed the question wrong in the title. Because in the end what I want to do is not count the bits, but access each bit as fast as possible.

ANOTHER EDIT: Is it possible to advance a pointer by just one bit in the data?

ANOTHER EDIT: Thank you for all your answers so far.

What I want to implement in the next steps is a nonsophisticated binary arithmetic coder that does not analyze the context. So I am only interested in single bits for now. Eventually it will become a Context-adaptive BAC but I will leave that for later.

Processing 4 bytes instead of 1 byte could be an option. But a loop over 32 bits is costly as well, isn't it?

+12  A: 

See the following link for a dozen of bit related stuff: Bit Twiddling Hacks

arul
Good recommendation! I think Kernighan's method is the easiest to remember.
Anthony Cuozzo
+16  A: 

The fastest way is probably to build a lookup table of byte values versus the number of bits set in that byte. At least that was the answer when I interviewed at Google.

Paul Tomblin
If you're looking at the data byte-by-byte, which you are, then with only 256 possible values, a look-up table is almost definitely going to be your best bet.
dancavallaro
Thanks! Please see my edits for an answer to this idea.
Even if you're not looking byte by byte, you can do a mask and compare with all the bytes in your data type, and add up all the results.
Paul Tomblin
You'll be incinerated for providing an answer to a Google interview question! :P
Vinko Vrsalovic
A: 

Not tested. Just an idea.

For each data byte;

int f[2] = {0};
for(int i = 0 ; i < 8 ; i++){
    f[ (*byte >> i) & 0x1 ]++;
}
m3rLinEz
+2  A: 

You could use a precomputed lookup table, i.e:

static int bitcount_lookup[256] = { ..... } ; /* or make it a global and compute the values in code */

...

for( ... ) 
   byte = ... 
   Stats.FreqOf1 += bitcount_lookup[byte];
frankodwyer
+5  A: 

Use a table that maps each byte value (256) to the number of 1's in it. (The # of 0's is just (8 - # of 1's)). Then iterate over the bytes and perform a single lookup for each byte, instead of multiple lookups and comparisons. For example:

int onesCount = 0;
for (i = 0; i < data->Count; i++)
{   
    byte = &data->Data[i];
    onesCount += NumOnes[byte];
}
Stats.FreqOf1 += onesCount;
Stats.FreqOf0 += (data->Count * 8) - onesCount;
Dave L.
Thanks! Please see my edits for an answer to this idea.
A: 

To join the link wagon: counting bits

Martijn Laarman
A: 

If this is not a case of premature optimization and you truly need to squeeze out every last femtosecond, then you're probably better off with a 256-element static array that you populate once with the bit-count of each byte value, then

Stats.FreqOf1 += bitCountTable[byte]

and when the loop is done:

Stats.FreqOf0 = ((data->Count * 8) - Stats.FreqOf1)

Brian Deacon
I am somewhat sure that it could count as premature optimization (see my edits) as the task is not even the same. I just want the bit accessing part down right.
+1  A: 

Here is a method how to count the 1 bits of a 32bit integer (based on Java's Integer.bitCount(i) method):

unsigned bitCount(unsigned i) {
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (i + (i >> 4)) & 0x0f0f0f0f;
    i = i + (i >> 8);
    i = i + (i >> 16);
    return i & 0x3f;
}

So you can cast your data to int and move forward in 4 byte steps.

martinus
A: 

There's a whole chapter on the different techniques for this in the book Beautiful Code. You can read (most of) it on Google books starting here.

Dan Dyer
+1  A: 

Here is a simple one I whipped up on just a single 32 bit value, but you can see it wouldn't be hard to adapt it to any number of bits....

int ones = 0;
int x = 0xdeadbeef;
for(int y = 0;y < 32;y++)
{
 if((x & 0x1) == 0x1) ones++;
 x = (x >> 1);
}

printf("%x contains %d ones and %d zeros.\n", x, ones, 32-ones);

Notice however, that it modifies the value in the process. If you are doing this on data you need to keep, then you need to make a copy of it first.

Doing this in __asm would probably be a better, maybe faster way, but it's hard to say with how well the compiler can optimize...

With each solution you consider, each one will have drawbacks. A lookup table or a bit shifter (like mine), both have drawbacks.

Larry

LarryF
+1  A: 

ttobiass - Keep in mind your inline functions are important in applications like you are talking about, but there are things you need to keep in mind. You CAN get the performance out of the inline code, just remember a couple things.

  • inline in debug mode does not exist. (Unless you force it)
  • the compiler will inline functions as it sees fit. Often, if you tell it to inline a function, it may not do it at all. Even if you use __forceinline. Check MSDN for more info on inlining.
  • Only certain functions can even be inlined. For example, you cannot inline a recursive function.

You'll get your best performance out of your project settings for the C/C++ language, and how you construct your code. At this point, it's important to understand Heap vs. Stack operations, calling conventions, memory alignment, etc.

I know this does not answer your question exactly, but you mention performance, and how to get the best performance, and these things are key.

LarryF
Thank you for your helpful comments.I noticed the debug behaviour as well so I switched to release mode.I realize that inline functions (even if inlined by the compiler) do not solve all problems or rather come at a some costs.
+1  A: 

I did not really understand what you're trying to do. But if you just want to get access to the bits of a bitmap, you can use these (untested!!!) functions:

#include <stddef.h>

_Bool isbitset(unsigned char * bitmap, size_t idx)
{
    return bitmap[idx / 8] & (1 << (idx % 8)) ? 1 : 0;
}

void setbit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] |= (1 << (idx % 8));
}

void unsetbit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] &= ~(1 << (idx % 8));
}

void togglebit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] ^= (1 << (idx % 8));
}

Edit: Ok, I think I understand what you want to do: Fast iteration over a sequence of bits. Therefore, we don't want to use the random access functions from above, but read a whole word of data at once.

You might use any unsigned integer type you like, but you should choose one which is likely to correspond to the word size of your architecture. I'll go with uint_fast32_t from stdint.h:

uint_fast32_t * data = __data_source__;
for(; __condition__; ++data)
{
    uint_fast32_t mask = 1;
    uint_fast32_t current = *data;
    for(; mask; mask <<= 1)
    {
        if(current & mask)
        {
            // bit is set
        }
        else
        {
            // bit is not set
        }
    }
}

From the inner loop, you can set the bit with

*data |= mask;

unset the bit with

*data &= ~mask;

and toggle the bit with

*data ^= mask;

Warning: The code might behave unexpectedly on big-endian architectures!

Christoph
A: 

A faster way to extract bits is to use:

bitmask= data->Data[i];

while (bitmask)
{
    bit_set_as_power_of_two= bitmask & -bitmask;
    bitmask&= bitmask - 1;
}

If you just want to count bits set, a LUT in cache per would be fast, but you can also do it in constant time with the interleaved bit counting method in the link in this answer.

MSN