I am running through a memory block of binary data byte-wise.
Currently I am doing something like this:
for (i = 0; i < data->Count; i++)
{
    byte = &data->Data[i];
    /* Test each of the 8 bits of the current byte against its mask. */
    if ((*byte & Masks[0]) == Masks[0]) Stats.FreqOf1++;
    if ((*byte & Masks[1]) == Masks[1]) Stats.FreqOf1++;
    if ((*byte & Masks[2]) == Masks[2]) Stats.FreqOf1++;
    if ((*byte & Masks[3]) == Masks[3]) Stats.FreqOf1++;
    if ((*byte & Masks[4]) == Masks[4]) Stats.FreqOf1++;
    if ((*byte & Masks[5]) == Masks[5]) Stats.FreqOf1++;
    if ((*byte & Masks[6]) == Masks[6]) Stats.FreqOf1++;
    if ((*byte & Masks[7]) == Masks[7]) Stats.FreqOf1++;
}
Where Masks is:
for (i = 0; i < 8; i++)
{
    Masks[i] = 1 << i;
}
(I somehow did not manage to do it as fast in a loop or in an inlined function, so I wrote it out.)
Does anyone have any suggestions on how to improve this first loop? I am rather inexperienced with getting down to bits.
This may seem like a stupid thing to do. But I am in the process of implementing a compression algorithm. I just want to have the bit accessing part down right.
Thanks!
PS: This is with the Visual Studio 2008 compiler, so it would be nice if the suggestions applied to that compiler.
PPS: I just realized that I don't need to increment two counts. One is enough; the other follows from the difference to the total number of bits at the end. But that is specific to counting. What I really want to be fast is the bit extraction.
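To make the PPS concrete, here is a rough sketch of what I have in mind, not something I have measured. It extracts each bit by shifting and masking with 1 instead of using the Masks table, counts only the ones, and derives the zeros from the total. The names count_ones and count_zeros are made up for illustration, and I kept it to plain C89 so it should build with Visual Studio 2008.

```c
#include <stddef.h>

/* Hypothetical helper: counts set bits by shifting each byte right and
   testing the lowest bit, so no mask table is needed. */
static unsigned long count_ones(const unsigned char *data, size_t count)
{
    unsigned long ones = 0;
    size_t i;
    int bit;

    for (i = 0; i < count; i++) {
        unsigned char b = data[i];
        for (bit = 0; bit < 8; bit++) {
            ones += (b >> bit) & 1u;   /* extracted bit is 0 or 1 */
        }
    }
    return ones;
}

/* Hypothetical helper: the zeros follow from the total number of bits. */
static unsigned long count_zeros(size_t count, unsigned long ones)
{
    return (unsigned long)count * 8ul - ones;
}
```

Adding the extracted bit directly avoids a branch per bit, but whether that is actually faster than the written-out version is exactly what I am unsure about.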
EDIT: The lookup table idea that was brought forward is nice. I realize, though, that I phrased the question wrongly in the title: in the end, what I want is not to count the bits but to access each bit as fast as possible.
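For reference, this is how I understand the lookup table idea, as a minimal sketch only. The table name OnesInByte and both function names are my own placeholders. It only helps with counting, not with visiting each bit individually.

```c
#include <stddef.h>

/* Hypothetical table: OnesInByte[v] holds the number of set bits in v. */
static unsigned char OnesInByte[256];

/* Fill the table once: the count for v is its lowest bit plus the
   count for v >> 1, which has already been computed. */
static void init_ones_table(void)
{
    int v;

    OnesInByte[0] = 0;
    for (v = 1; v < 256; v++) {
        OnesInByte[v] = (unsigned char)((v & 1) + OnesInByte[v >> 1]);
    }
}

/* Count set bits with one table lookup per byte. */
static unsigned long count_ones_lut(const unsigned char *data, size_t count)
{
    unsigned long ones = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        ones += OnesInByte[data[i]];
    }
    return ones;
}
```

init_ones_table() has to be called once before the first call to count_ones_lut().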
ANOTHER EDIT: Is it possible to advance a pointer by just one bit in the data?
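As far as I understand, a C pointer cannot address anything smaller than a byte, so what I am really asking for is something that behaves like the sketch below: a bit position that is split into a byte index and a bit offset on each access. BitCursor and next_bit are names I made up, and reading the least significant bit of each byte first is just an assumption matching my Masks table.

```c
#include <stddef.h>

/* Hypothetical "bit pointer": a buffer plus an absolute bit position. */
typedef struct {
    const unsigned char *data;  /* start of the buffer */
    size_t pos;                 /* bit position, counted from bit 0 of data[0] */
} BitCursor;

/* Returns the bit at the current position (0 or 1) and advances by one bit.
   Bits are read least significant first within each byte.
   The caller must keep pos below 8 * (number of bytes in the buffer). */
static unsigned int next_bit(BitCursor *c)
{
    unsigned int bit = (c->data[c->pos >> 3] >> (c->pos & 7)) & 1u;
    c->pos++;
    return bit;
}
```

Here pos >> 3 selects the byte and pos & 7 selects the bit inside it, so "advancing by one bit" is just an increment of pos.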
ANOTHER EDIT: Thank you for all your answers so far.
What I want to implement in the next steps is an unsophisticated binary arithmetic coder that does not analyze the context, so I am only interested in single bits for now. Eventually it will become a context-adaptive BAC, but I will leave that for later.
Processing 4 bytes instead of 1 byte could be an option. But a loop over 32 bits is costly as well, isn't it?
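To show what I mean by processing 4 bytes at a time, here is an untested-for-speed sketch. It assembles a 32-bit word from four bytes (so it does not depend on alignment or byte order), loops over its 32 bits, and handles any leftover bytes one at a time. The function name count_ones_by_words is hypothetical, and I have no measurements showing this beats the byte-wise loop.

```c
#include <stddef.h>

/* Hypothetical helper: visits the bits 32 at a time where possible. */
static unsigned long count_ones_by_words(const unsigned char *data, size_t count)
{
    unsigned long ones = 0;
    size_t i = 0;
    int bit;

    /* Full groups of four bytes, combined into one word. */
    for (; i + 4 <= count; i += 4) {
        unsigned long w = (unsigned long)data[i]
                        | ((unsigned long)data[i + 1] << 8)
                        | ((unsigned long)data[i + 2] << 16)
                        | ((unsigned long)data[i + 3] << 24);
        for (bit = 0; bit < 32; bit++) {
            ones += w & 1ul;   /* current lowest bit */
            w >>= 1;           /* move the next bit into place */
        }
    }

    /* Remaining 0 to 3 bytes, handled byte-wise. */
    for (; i < count; i++) {
        unsigned char b = data[i];
        for (bit = 0; bit < 8; bit++) {
            ones += (b >> bit) & 1u;
        }
    }
    return ones;
}
```

The inner loop still runs once per bit, so the only saving I can see is fewer memory reads and fewer outer-loop iterations.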