views:

410

answers:

8

Is it even possible to create an array of bits with more than 100000000 elements? If it is, how would I go about doing this? I know that for a char array I can do this:

char* array;

array = (char*)malloc(100000000 * sizeof(char));

If I was to declare the array by char array[100000000] then I would get a segmentation fault, since the maximum number of elements has been exceeded, which is why I use malloc.

Is there something similar I can do for an array of bits?

+5  A: 

In C and C++, char is the smallest type. You can't directly declare an array of bits. However, since an array of any basic type is fundamentally made of bits, you can emulate them, something like this (code untested):

unsigned *array;
array = (unsigned *) malloc(100000000 / sizeof(unsigned) + 1);

/* Retrieves the value in bit i */
#define GET_BIT(array, i) (array[i / sizeof(unsigned)] & (1 << (i % sizeof(unsigned))))

/* Sets bit i to true*/
#define SET_BIT(array, i) (array[i / sizeof(unsigned)] |= (1 << (i % sizeof(unsigned))))

/* Sets bit i to false */
#define CLEAR_BIT(array, i) (array[i / sizeof(unsigned)] &= ~(1 << (i % sizeof(unsigned))))
Daniel Stutzbach
+9  A: 

If you are using C++, std::vector<bool> is specialized to pack elements into a bit map. Of course, if you are using C++, you need to stop using malloc.

Dennis Zickefoose
When using `vector<bool>` be aware that it does not staisfy the container concept of the STL, which may cause problems when using STL-algorithms. See http://www.sgi.com/tech/stl/bit_vector.html
Space_C0wb0y
I had forgotten std::vector<bool> was specialized. Good point.
cpalmer
What do you mean by specialized?
Eddy
@Eddy: I'd have to explain what a template is first. Have you used templates before?
Ben Voigt
@Eddy: Specialized (in this case) just means that for the datatype bool, vector doesn't use the generic vector implementation. This that would waste anywhere from 7 to 63 bits of data per entry on common architectures with common compilers. vector<bool> is implemented the same way I told you to do this the other day when you asked how to do it in C.
nategoose
A: 

Yes but it's going to be a little bit more complicated !

The better way to store bits is to use the bits into the char itself !

So you can store 8 bits in a char !

Which will "only" require 12'500'000 octets !

Here is some documentation about binaries : http://www.somacon.com/p125.php

You should look on google :)

Niklaos
To you, everything is exciting! :)
John Dibling
+8  A: 

You could try looking at boost::dynamic_bitset. Then you could do something like the following (taken from Boost's example page):

boost::dynamic_bitset<> x(100000000); // all 0's by default
x[0] = 1;
x[1] = 1;
x[4] = 1;

The bitset will use a single bit for each element so you can store 32 items in the space of 4 bytes, decreasing the amount of memory required considerably.

cpalmer
Will this work? Aren't most C++ compilers going to try to allocate x on a stack, which won't have room for such a large object? The stack is typically only 1MB or so. I think you'll probably have to use new to allocate it from the heap.
Charles E. Grant
@Charles: `x` will be on the stack, but the memory to store the bits will not be. Like `vector`, `boost::dynamic_bitset` takes an allocator template parameter, and by default allocates with `new`. There is no way in standard C++ to put anything on the stack whose size isn't known at compile time.
Steve Jessop
x is on the stack, but that doesn't mean x's members are.
Dennis Zickefoose
@Dennis: x's members are also on the stack... but this doesn't mean some of the members couldn't be pointers to dynamically allocated memory.
UncleBens
+3  A: 

The segmentation fault you noticed is due to running out of stack space. Of course you can't declare a local variable that is 12.5 MB in size (100 million bits), let alone 100MB in size (100 million bytes) in a thread with a stack of ~ 4 MB. Should work as a global variable, although then you may end up with a 12 or 100 MB executable file -- still not a good idea. Dynamic allocation is definitely the right thing to do for large buffers like that.

Ben Voigt
A: 
lsalamon
A: 

I'm fond of the bitarray that's in the open source fxt library at http://www.jjj.de/fxt/. It's simple, efficient and contained in a few headers, so it's easy to add to your project. Plus there's many complementary functions to use with the bitarray (see http://www.jjj.de/bitwizardry/bitwizardrypage.html).

dimatura
+1  A: 

If it is allowed to use STL, then I would use std::bitset.

(For 100,000,000 bits, it would use 100000000 / 32 unsigned int underneath, each storing 32 bits.)

std::vector<bool>, already mentioned, is another good solution.

ArunSaha