views:

74

answers:

5

I want to store bits in an array (like structure). So I can follow either of the following two approaches

Approach number 1 (AN 1)

struct BIT
{
   int data : 1
};

int main()
{
   BIT a[100];
   return 0;
}

Approach number 2 (AN 2)

int main()
{
    std::bitset<100> BITS;
    return 0;
}

Why would someone prefer AN 2 over AN 1?

A: 

Approach number 1 will most likely be compiled as an array of 4-byte integers, and one bit of each will be used to store your data. Theoretically a smart compiler could optimize this, but I wouldn't count on it.

Is there a reason you don't want to use std::bitset?

Jonathan
A: 

bitset has more operations

Wernight
It takes less space.
larsmans
I don't think its wrong answer. +1 to neutralize.
bjskishore123
+3  A: 

Because approach nr. 2 actually uses 100 bits of storage, plus some very minor (constant) overhead, while nr. 1 typically uses four bytes of storage per Bit structure. In general, a struct is at least one byte large per the C++ standard.

#include <bitset>
#include <iostream>

struct Bit { int data : 1; };

int main()
{
    Bit a[100];
    std::bitset<100> b;
    std::cout << sizeof(a) << "\n";
    std::cout << sizeof(b) << "\n";
}

prints

400
16

Apart from this, bitset wraps your bit array in a nice object representation with many useful operations.

larsmans
How come the output of `sizeof(b)` is 16? 100 bits means 100/8 bytes. What am I missing?
CLOWN
It has to be "rounded" to a multiple of eight, since a byte is the smallest unit of storage that is directly addressable. Try compiling with 128 bits in the set, then it's still 16. `bitset` uses some tricks to pretend it is actually addressing bits..
larsmans
My previous comment was half right. It's rounded up to a multiple of 4 bytes, because a four-byte unit is the fastest to process for a 32-bit computer. But in any case, a byte is the smallest addressable unit, so it's at least 13 (100/8 = 12.5, and half a byte is not addressable). Sorry, friday afternoon.
larsmans
+1  A: 

To quote cplusplus.com's page on bitset, "The class is very similar to a regular array, but optimizing for space allocation". If your ints are 4 bytes, a bitset uses 32 times less space.

Even doing bool bits[100], as sbi suggested, is still worse than bitset, because most implementations have >= 1-byte bools.

If, for reasons of intellectual curiosity only, you wanted to implement your own bitset, you could do so using bit masks:

typedef struct {
  unsigned char bytes[100];
} MyBitset;

bool getBit(MyBitset *bitset, int index)
{
  int whichByte = index / 8; 
  return bitset->bytes[whichByte] && (1 << (index = % 8));
}

bool setBit(MyBitset *bitset, int index, bool newVal)
{
  int whichByte = index / 8;

  if (newVal)
  {
    bitset->bytes[whichByte] |= (1 << (index = % 8));
  }
  else
  {
    bitset->bytes[whichByte] &= ~(1 << (index = % 8));
  }
}

(Sorry for using a struct instead of a class by the way. I'm thinking in straight C because I'm in the middle of a low-level assignment for school. Obviously two huge benefits of using a class are operator overloading and the ability to have a variable-sized array.)

Jon Rodriguez
+2  A: 

A good choice depends on how you're going to use the bits.

std::bitset<N> is of fixed size. Visual C++ 10.0 is non-conforming wrt. to constructors; in general you have to provide a workaround. This was, ironically, due to what Microsoft thought was a bug-fix -- they introduced a constructor taking int argument, as I recall.

std::vector<bool> is optimized in much the same way as std::bitset. Cost: indexing doesn't directly provide a reference (there are no references to individual bits in C++), but instead returns a proxy object -- which isn't something you notice until you try to use it as a reference. Advantage: minimal storage, and the vector can be resized as required.

Simply using e.g. unsigned is also an option, if you're going to deal with a small number of bits (in practice, 32 or less, although the formal guarantee is just 16 bits).

Finally, ALL UPPERCASE identifiers are by convention (except Microsoft) reserved for macros, in order to reduce the probability of name collisions. It's therefore a good idea to not use ALL UPPERCASE identifiers for anything else than macros. And to always use ALL UPPERCASE identifiers for macros (this also makes it easier to recognize them).

Cheers & hth.,

Alf P. Steinbach