Do something like this:
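#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t
#include <vector>

std::vector<bool> a;
a.push_back(true);
a.push_back(false);
//...
const std::size_t width = CHAR_BIT * sizeof(unsigned); // bits per chunk
for (auto it = a.begin(); it != a.end();) // auto needs C++11 (formerly C++0x)
{
    unsigned b = 0;
    // NOTE: no bounds check; this assumes a.size() is a multiple of width
    for (std::size_t i = 0; i < width; ++i)
    {
        // place the current bit, most significant position first
        b |= (*it ? 1u : 0u) << (width - 1 - i);
        ++it;
    }
    // flush 'b'
}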
So what you end up doing is grouping chunks of bits together. Here I've chosen to group the bits into native integers, which is usually an efficient width for the target platform. I don't check the indexes here, but that's something you'll have to do: the inner loop keeps advancing the iterator even if it has already reached a.end(). What I would do is work out how many full chunks I can extract first, process those, and then handle any remainder separately.
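The "full chunks first, then the remainder" idea could be sketched roughly as follows. The function name pack_bits and its return type are made up for illustration, and zero-padding the last partial chunk is an assumption rather than a requirement:

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t
#include <vector>

// Hypothetical helper: packs bits MSB-first into unsigned chunks.
std::vector<unsigned> pack_bits(const std::vector<bool>& bits)
{
    const std::size_t width = CHAR_BIT * sizeof(unsigned);
    const std::size_t full = bits.size() / width; // number of complete chunks
    const std::size_t rest = bits.size() % width; // leftover bits, if any

    std::vector<unsigned> out;
    out.reserve(full + (rest != 0 ? 1 : 0));

    std::size_t pos = 0;
    // Full chunks: no per-bit bounds check is needed here.
    for (std::size_t c = 0; c < full; ++c)
    {
        unsigned b = 0;
        for (std::size_t i = 0; i < width; ++i, ++pos)
            b |= (bits[pos] ? 1u : 0u) << (width - 1 - i);
        out.push_back(b);
    }
    // Remainder: fewer than 'width' bits, the unused low bits stay zero
    // (an assumption; pick whatever padding your format needs).
    if (rest != 0)
    {
        unsigned b = 0;
        for (std::size_t i = 0; i < rest; ++i, ++pos)
            b |= (bits[pos] ? 1u : 0u) << (width - 1 - i);
        out.push_back(b);
    }
    return out;
}
```

Calling pack_bits(a) on the vector from the snippet would give a single chunk with only its top bit set. The caller still needs to remember the original bit count if the padding must be told apart from real zero bits.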
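Also, note that I'm filling in the bits from left to right, which means the msb of each integer is filled first. That ordering comes from the shift and is the same on every architecture; endianness only matters once you write the integer out as individual bytes.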
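If the packed integers are flushed to a file or socket, one way to keep the byte stream independent of the host's byte order is to emit the bytes with shifts instead of copying the integer's memory. This is only a sketch: the helper name append_big_endian is hypothetical, and it assumes a fixed 32-bit chunk rather than plain unsigned:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: appends a 32-bit chunk as 4 bytes,
// most significant byte first, regardless of host endianness.
void append_big_endian(std::vector<unsigned char>& out, std::uint32_t word)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((word >> shift) & 0xFFu));
}
```

With msb-first packing plus most-significant-byte-first output, the first bit pushed ends up as the top bit of the first byte written.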
If you're doing bit manipulation and the like, figure out a packing scheme for your bits and let that be your data structure. Whether you start from std::bit_vector, std::vector or std::deque doesn't really matter. Pack your bits cleverly into the target platform's native integer type; that will generally give the best performance.
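As a rough illustration of letting the packing scheme be the data structure, here is a minimal sketch of a container that stores bits msb-first in unsigned words. The class name PackedBits and its interface are invented for this example; a real design would depend on which operations you need to be fast:

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t
#include <vector>

// Hypothetical container: bits are stored MSB-first in native unsigned words.
class PackedBits
{
public:
    PackedBits() : size_(0) {}

    void push_back(bool bit)
    {
        if (size_ % width() == 0)
            words_.push_back(0u);      // start a new, zeroed word
        if (bit)
            words_.back() |= mask(size_);
        ++size_;
    }

    // No range check: i must be less than size().
    bool get(std::size_t i) const
    {
        return (words_[i / width()] & mask(i)) != 0;
    }

    std::size_t size() const { return size_; }
    const std::vector<unsigned>& words() const { return words_; }

private:
    static std::size_t width() { return CHAR_BIT * sizeof(unsigned); }

    // Mask selecting bit i inside its word, counting from the msb.
    static unsigned mask(std::size_t i)
    {
        return 1u << (width() - 1 - i % width());
    }

    std::vector<unsigned> words_;
    std::size_t size_; // number of bits stored, not number of words
};
```

Because the words are already in their packed form, flushing is just a walk over words() with no per-bit work.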