I have areas of memory that could be considered "array of bits". They are equivalent to

unsigned char arr[256];

But it would be better thought of as

bit arr[2048];

I'm accessing separate bits from it with

#define GETBIT(x,in)   ((in)[ ((x)/8) ] & 1<<(7-((x)%8)))

but I do this a lot in many places in the code, often in performance-critical sections, and I wonder whether there are any smarter, more optimal methods to do it.

extra info: Architecture: ARM9 (32 bit); gcc/Linux. The physical data representation can't be changed - it is externally provided or mapped for external use.

+1  A: 

If you reverse the bit order in 'arr', you can eliminate the subtraction from the macro. That is the best I can suggest without knowing the problem context (how the bits are used).
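
For illustration, a hedged sketch of what the macro could look like with the bit order reversed (LSB-first within each byte); the GETBIT_LSBFIRST name is hypothetical, and this is only an option if you control both the reader and the writer of the data:

/* Hypothetical LSB-first variant: bit 0 is the least significant bit of
   each byte, so the (7 - x%8) subtraction disappears. */
#define GETBIT_LSBFIRST(x,in)   ((in)[ (x)/8 ] & (1 << ((x)%8)))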

sambowry
A: 

Why not create your own wrapper class?

You could then add bits to the "array" using an operator such as + and get back the individual bits using the [] operator.

Your macro could be improved by using & 7 instead of % 8, but it's likely the compiler will make that optimisation for you anyway.

I recently did exactly what you are doing and my stream could consist of any number of bits.

So I have something like the following:

BitStream< 1 > oneBitBitStream;
BitStream< 2 > twoBitBitStream;

oneBitBitStream += Bit_One;
oneBitBitStream += Bit_Zero;

twoBitBitStream += Bit_Three;
twoBitBitStream += Bit_One;

and so on. It makes for nice, readable code, and you can provide an STL-like interface to it to aid familiarity :)
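
A minimal sketch of such a wrapper over the existing buffer (the BitArrayView name and its exact interface are assumptions for illustration, not the BitStream class described above):

#include <cstddef>

// Hypothetical read/write view over an externally provided, MSB-first buffer.
class BitArrayView {
    unsigned char *data;
public:
    explicit BitArrayView(unsigned char *buf) : data(buf) {}
    bool operator[](std::size_t i) const {
        return (data[i / 8] >> (7 - (i % 8))) & 1;
    }
    void set(std::size_t i, bool v) {
        const unsigned char mask = 1 << (7 - (i % 8));
        if (v) data[i / 8] |= mask; else data[i / 8] &= ~mask;
    }
};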

Goz
+4  A: 

I don't think so. In fact, many CPU architectures won't access bits individually.

In C++ you have std::bitset<N>, but it may not have the highest performance, depending on your compiler's implementation and optimization.

BTW, it may be better to group your bit array as uint32_t[64] (or uint64_t[32]) for aligned dereferencing (which bitset already does for you).
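
For reference, a brief sketch of std::bitset usage (a sketch only; std::bitset owns its storage, so it cannot overlay the externally provided buffer, and its bit numbering is LSB-first):

#include <bitset>

std::bitset<2048> bits;      // 2048 bits, packed into words internally
bits.set(42);                // set bit 42
bool b = bits.test(42);      // read bit 42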

KennyTM
Why do you think that `std::bitset<N>` is particularly slow? As you correctly note, on many CPUs accessing individual bits is just not fast, but I see no reason why std::bitset<> adds overhead. In fact, it can be faster than a `char[]` because it can eliminate aliasing problems.
MSalters
+1 to MSalters' comment - std::bitset, with optimizations enabled, should not be any different from the macro, with fewer chances for mistakes. STL-style code can be notably slower in debug builds, though, if you do notice microsecond variations.
peterchen
+1, this is the correct answer, though I don't know why the poster believes `bitset` would be slower than rolling your own; `bitset` is highly optimized on most systems
BlueRaja - Danny Pflughoeft
+3  A: 

For randomly accessing individual bits, the macro you've suggested is as good as you're going to get (as long as you turn on optimisations in your compiler).

If there is any pattern at all to the bits you're accessing, then you may be able to do better. For example, if you often access pairs of bits, then you may see some improvement by providing a method to get two bits instead of one, even if you don't always end up using both bits.

As with any optimisation problem, you will need to be very familiar with the behaviour of your code, in particular its access patterns in your bit array, to make a meaningful improvement in performance.

Update: Since you access ranges of bits, you can probably squeeze some more performance out of your macros. For example, if you need to access four bits you might have macros like this:

#define GETBITS_0_4(x,in) (((in)[(x)/8] & 0x0f))
#define GETBITS_1_4(x,in) (((in)[(x)/8] & 0x1e) >> 1)
#define GETBITS_2_4(x,in) (((in)[(x)/8] & 0x3c) >> 2)
#define GETBITS_3_4(x,in) (((in)[(x)/8] & 0x78) >> 3)
#define GETBITS_4_4(x,in) (((in)[(x)/8] & 0xf0) >> 4)
#define GETBITS_5_4(x,in) ((((in)[(x)/8] & 0xe0) >> 5) | (((in)[(x)/8+1] & 0x01)) << 3)
#define GETBITS_6_4(x,in) ((((in)[(x)/8] & 0xc0) >> 6) | (((in)[(x)/8+1] & 0x03)) << 2)
#define GETBITS_7_4(x,in) ((((in)[(x)/8] & 0x80) >> 7) | (((in)[(x)/8+1] & 0x07)) << 1)
// ...etc

These macros would clip out four bits from each bit position 0, 1, 2, etc. (To cut down on the proliferation of pointless parentheses, you might want to use inline functions for the above.) Then perhaps define an inline function like:

inline int GETBITS_4(int x, unsigned char *in) {
    switch (x % 8) {
        case 0: return GETBITS_0_4(x,in);
        case 1: return GETBITS_1_4(x,in);
        case 2: return GETBITS_2_4(x,in);
        // ...etc
    }
}

Since this is a lot of tedious boilerplate code, especially if you've got multiple different widths, you may want to write a program to generate all the GETBITS_* accessor functions.

(I notice that the bits in your bytes are stored in the reverse order from what I've written above. Apply an appropriate transformation to match your structure if you need to.)
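
A hypothetical usage sketch of the inline dispatcher above (the buffer and offset values are illustrative):

unsigned char buf[256] = {0};       // the externally provided bit array
int pos = 10;                       // any bit offset, not necessarily aligned
int nibble = GETBITS_4(pos, buf);   // four bits starting at bit 10 (case 2)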

Greg Hewgill
I often access -ranges- of bits, starting with random non-word-aligned *m* and ending with another random *n*.
SF.
Excellent, that's a great opportunity. I'll add more to my answer.
Greg Hewgill
At this point you'd want templates to be honest. See my answer.
MSalters
@MSalters: good idea.
Greg Hewgill
While KennyTM and MSalters provided some pretty good answers that would be the "right thing to do" if I were rolling the whole system myself, I have to work with data arriving from external sources or prepared in a specific "raw" format to be sent away. `std::bitset<N>` would not only require custom allocators to be placed over these areas; nothing except efficiency is guaranteed about the internal layout of its bits. In my case the `bit array[n]` memory layout of the data can't be avoided.
SF.
A: 

Since the question is tagged with C++, is there any reason you can't simply use the standard bitset?

Max Shawabkeh
only if it's possible to allocate it over a predefined memory area. A significant one is a library-generated bitmap for a screen. Also, -is- it better performance-wise? (embedded device, limited resources, short on performance already)
SF.
@SF: You might be able to use std::bitset with a custom allocator that uses your memory mapped bit arrays.
Drew Hall
+1  A: 
#define GETBIT(x,in)   ((in)[ ((x)/8) ] & 1<<(7-((x)%8)))

can be optimized.

1) Use a standard int, which is normally the fastest accessible integer datatype. If you don't need to be portable, you can find out the size of an int with sizeof and adapt the following code.

2)

#define GETBIT(x,in)   ((in)[ (x) >> 3 ] & (1 << ((x) & 7)))

The modulo operator % is slower than ANDing. And you don't need the subtraction; simply adjust your SETBIT routine.
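
A hedged sketch of matching writer macros under that adjusted, LSB-first convention (macro names are illustrative; this only applies if you are free to change the layout, which the question notes is not the case here):

/* Hypothetical companions to the adjusted GETBIT, assuming LSB-first
   bit numbering within each byte (opposite of the question's layout). */
#define SETBIT(x,in)     ((in)[ (x) >> 3 ] |=  (1 << ((x) & 7)))
#define CLEARBIT(x,in)   ((in)[ (x) >> 3 ] &= ~(1 << ((x) & 7)))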

Thorsten S.
iain
Never rely on optimisations the compiler may or may not make. If you can optimise it, do it; otherwise leave it to the compiler, which might or might not be able to optimise it.
George B.
+2  A: 

Taking Greg's solution as a basis:

#include <climits>    // CHAR_BIT
#include <cstddef>    // size_t
#include <boost/static_assert.hpp>

// Extract bits [n, m) from a bit array stored in unsigned longs
// (LSB-first numbering within each word).
template<unsigned int n, unsigned int m>
inline unsigned long getbits(const unsigned long bits[]) {
  const unsigned bitsPerLong = sizeof(unsigned long) * CHAR_BIT;
  const unsigned int bitsToGet = m - n;
  BOOST_STATIC_ASSERT(bitsToGet < bitsPerLong);
  const unsigned long mask = (1UL << bitsToGet) - 1;
  const size_t index0 = n / bitsPerLong;
  const size_t index1 = (m - 1) / bitsPerLong;
  // Do the bits to extract straddle a word boundary?
  if (index0 == index1) {
    return (bits[index0] >> (n % bitsPerLong)) & mask;
  } else {
    return ((bits[index0] >> (n % bitsPerLong)) |
            (bits[index1] << (bitsPerLong - (n % bitsPerLong)))) & mask;
  }
}

This can extract up to a word's worth of bits (less one, per the static assert), even if they are not aligned. Note that it's intentionally inline, as you don't want to end up with tons of these functions.
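
A hypothetical usage sketch (names and values are illustrative; bit numbering here is LSB-first within each unsigned long, which differs from the question's MSB-first byte layout):

unsigned long words[64] = { 0 };        // 2048 bits of storage
// Extract the 8 bits at positions 28..35, which straddle a word
// boundary on a 32-bit target such as ARM9.
unsigned long value = getbits<28, 36>(words);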

MSalters
A: 

Instead of the unsigned char array and custom macros, you can use std::vector<bool>. The vector class template has a special specialization for the bool type, provided to optimize for space: each element occupies only a single bit (one-eighth the size of the smallest C++ type, char).
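
A brief usage sketch (note that vector<bool> allocates and owns its storage, so it cannot be placed over the externally mapped memory from the question):

#include <vector>

std::vector<bool> bits(2048);   // 2048 bits, packed internally
bits[5] = true;                 // set a bit
bool b = bits[5];               // read it back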

Vijay Mathew
`vector<bool>` seems to be "deprecated" almost anywhere you look, though (precisely because it only pretends to be a vector), and seems to be superseded by `dynamic_bitset` (if the size is determined at compile time, just `std::bitset` might be more appropriate).
visitor