views: 704
answers: 6

When I want an array of flags it has typically pained me to use an entire byte (or word) to store each one, as would be the result if I made an array of bools or some other numeric type that could be set to 0 or 1. But now I wonder whether using a structure that is more space-efficient is worth it given the (albeit hopefully very slight) additional overhead of shifting and bit testing.

In my company we use Rogue Wave tools (though hopefully not for much longer) and it's their RWBitVec that I've used for this purpose up until now.
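
For concreteness, the kind of hand-rolled shifting and bit testing in question might look roughly like this, packing flags into an array of 32-bit words (the names here are only illustrative):

    #include <cstddef>
    #include <cstdint>

    // Hypothetical packed flag array: flag i lives in word i/32 at bit position i%32.
    inline void set_flag(std::uint32_t* words, std::size_t i) {
        words[i / 32] |= std::uint32_t{1} << (i % 32);
    }

    inline bool test_flag(const std::uint32_t* words, std::size_t i) {
        return (words[i / 32] >> (i % 32)) & 1u;
    }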

+2  A: 

It's mostly about saving memory. If your array of bools is large enough that an 8x improvement in storage space is meaningful, then by all means, use a bitarray.

Note that memory access is pretty expensive compared to the shift/and, so the bitarray approach is slightly faster than the array-of-chars. Basically it comes down to memory versus programmer time. Remember that premature optimization is a waste of time. I'd use whichever approach is easiest to develop, and then refactor only if it shows up as a primary performance bottleneck.
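
For a rough feel of that 8x figure, here is a minimal sketch (the flag count is arbitrary and the exact sizes are implementation details):

    #include <bitset>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        constexpr std::size_t N = 1'000'000;   // one million flags
        std::vector<char> as_bytes(N);         // about 1,000,000 bytes of storage
        static std::bitset<N> as_bits;         // about 125,000 bytes of storage
        std::cout << "bytes: " << as_bytes.size() * sizeof(char) << '\n'
                  << "bits:  " << sizeof(as_bits) << '\n';
    }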

bmdhacks
Because memory access is so slow, you are likely to see a significant speed-up with a bit array, since the chance of a cache miss decreases 8-fold.
Rob Walker
From personal experience I've gained an order of magnitude by using a BitArray in C++.
Robert Gould
Some quick tests confirm that. Editing to reflect that.
bmdhacks
The memory versus programmer time trade-off is a VERY GOOD point. A bit array adds a non-trivial layer on top of what could otherwise be a trivial task, and that means more coding and more maintenance.
Robert Gould
A: 

Is it worth it? Only if you know that you have a problem with memory usage.

But unless you're either:

  1. Working on an embedded processor with very limited resources, or
  2. Storing an astronomical number of bools

then the answer is no. You'll have to work somewhat harder to achieve the same level of readability in your source by using a bitmap than you will using bools, and unless you're operating under either of the previous two conditions you'll likely find that it doesn't make any noticeable difference to your memory footprint.

Andrew Edgecombe
+3  A: 

Don't use vector<bool>, it's not really a Container:

http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=98

Use std::bitset (for fixed size bitsets) and boost::dynamic_bitset (for resizeable ones) where appropriate. They aren't Containers either, but they don't look as if they ought to be, so are less likely to cause confusion.
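
A quick usage sketch of both (the sizes chosen here are arbitrary examples):

    #include <bitset>
    #include <boost/dynamic_bitset.hpp>

    int main() {
        std::bitset<64> fixed;              // size fixed at compile time
        fixed.set(3);                       // set bit 3
        bool a = fixed.test(3);             // true

        boost::dynamic_bitset<> dyn(100);   // size chosen at run time
        dyn.resize(200);                    // and it can grow later
        dyn.set(150);
        bool b = dyn.test(150);             // true
        return (a && b) ? 0 : 1;
    }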

Whether the trade-off is worth it depends, obviously, on how big the arrays are in your program. I think you're right that the overhead of bit access is usually negligible, but if the memory overhead is negligible too, then there's nothing to be gained on that side either.

bitsets have the advantage that they do exactly what they say on the tin - none of this "declare an array of chars/ints, but the only legal values are 0 and 1" nonsense. Your code will read about the same as if you'd used an array.

Steve Jessop
+1  A: 

I wrote some code once to unpack a bitmap image line into separate bytes per pixel, then pack it back again after processing. For the code I was benchmarking, it was actually faster to do it that way than to work at the bit level.
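
A sketch of that unpack/repack idea, assuming 1 bit per pixel with the most significant bit first (the details here are illustrative, not the original code):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Expand each bit of a packed scanline into its own byte (0 or 1).
    std::vector<std::uint8_t> unpack_row(const std::uint8_t* packed, std::size_t npixels) {
        std::vector<std::uint8_t> pixels(npixels);
        for (std::size_t i = 0; i < npixels; ++i)
            pixels[i] = (packed[i / 8] >> (7 - i % 8)) & 1u;
        return pixels;
    }

    // Pack the per-pixel bytes back into bits after processing.
    void pack_row(const std::vector<std::uint8_t>& pixels, std::uint8_t* packed) {
        for (std::size_t i = 0; i < pixels.size(); ++i) {
            if (i % 8 == 0) packed[i / 8] = 0;
            packed[i / 8] |= static_cast<std::uint8_t>((pixels[i] & 1u) << (7 - i % 8));
        }
    }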

Mark Ransom
+1  A: 

Modern processors have barrel shifters, so a shift by any number of bits up to 31 takes only a few cycles (fewer than many other instructions). Compilers take advantage of this, so bit operations are not only space-efficient but in most cases time-efficient as well.

But it really depends on how you're using and testing the bits - there are some inefficient methods that would make using a whole integer faster.
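
As one illustration of that usage difference, consider testing packed flags one bit at a time versus letting a whole-word comparison cover 32 of them at once (a hypothetical example, not taken from the answer):

    #include <cstddef>
    #include <cstdint>

    // Inefficient pattern: extract and test every flag individually.
    bool any_set_bit_by_bit(const std::uint32_t* words, std::size_t nwords) {
        for (std::size_t w = 0; w < nwords; ++w)
            for (unsigned b = 0; b < 32; ++b)
                if ((words[w] >> b) & 1u)
                    return true;
        return false;
    }

    // Cheaper pattern: one integer comparison checks 32 flags at a time.
    bool any_set_word_at_a_time(const std::uint32_t* words, std::size_t nwords) {
        for (std::size_t w = 0; w < nwords; ++w)
            if (words[w] != 0)
                return true;
        return false;
    }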

Adam Davis
+2  A: 

I've used a bit array for indexing a HUGE tree. The algorithm was:

    check the bit array for whether the entry exists
    if the entry doesn't exist
      return null
    else do a binary search in the tree
      return the value

The advantage is that the tree was huge enough that searching for a non-existent entry would cause several cache misses before completing. Thus how long the algorithm took depended on whether the value existed.

However, adding that initial bit-array check meant I'd reduce cache misses and avoid searching the tree at all if the answer wasn't there. By adding this extra step the algorithm became much more robust (actual running time on a real computer became nearly linear, although Big-O analysis would say differently), and overall performance increased by an order of magnitude.
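
A hypothetical reconstruction of that pattern, assuming integer keys small enough to index a fixed-size bit array directly (the names and containers here are illustrative, not the original code):

    #include <bitset>
    #include <cstddef>
    #include <map>
    #include <optional>

    constexpr std::size_t kMaxKey = 1 << 20;

    struct Index {
        std::bitset<kMaxKey> present;     // one bit per possible key: a cheap, cache-friendly pre-check
        std::map<std::size_t, int> tree;  // stands in for the huge search tree

        void insert(std::size_t key, int value) {
            present.set(key);
            tree[key] = value;
        }

        void erase(std::size_t key) {     // entries can also be removed, unlike with a plain Bloom filter
            present.reset(key);
            tree.erase(key);
        }

        std::optional<int> find(std::size_t key) const {
            if (!present.test(key))       // definitely absent: skip the cache-missing tree walk
                return std::nullopt;
            auto it = tree.find(key);     // the bit is set, so do the real search
            if (it == tree.end())
                return std::nullopt;
            return it->second;
        }
    };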

Like they say, sometimes taking the hardware into consideration is more important than the "ideal" mathematical algorithm.

Robert Gould
Holy shoulda-used-a-Bloom-filter, Batman!
Rich
I guess it was kind of like a Bloom filter; however, I needed to remove entries and add new ones all the time (a very dynamic system), meaning I would have had to build a more complicated Bloom filter and keep rebuilding it. But in the end the algorithm is similar to a Bloom filter.
Robert Gould