tags:

views:

1414

answers:

17

Hello,

I am trying to store a large amount of boolean information that is determined at run-time. I was wondering what the best method might be.

I have currently been trying to allocate the memory using:

pStatus = malloc((<number of data points>/8) + 1);

thinking that this will give me enough bits to work with. I could then reference each boolean value using the pointer in array notation:

pStatus[element]

Unfortunately this does not seem to be working very well. First, I am having difficulty initializing the memory to the integer value 0. Can this be done using memset()? Still, I don't think that is impacting why I crash when trying to access pStatus[element].

I am also not entirely convinced that this approach is the best one to be using. What I really want is essentially a giant bitmask that reflects the status of the boolean values. Have I missed something?

A: 

pStatus[element] does not address the bit. The exact byte it gets is dependent on the type of pStatus -- I assume char* or equivalent -- so pStatus[element] gets you the element'th byte.

You could memset to set to 0, yes.

Lou Franco
+7  A: 

...thinking that this will give me enough bits to work with. I could then reference each boolean value using the pointer in array notation:

pStatus[element]

element is addressing bytes, not bits. You want something like:

pStatus[element/8] & (1 << (element % 8))
Paul Betts
Blair Conrad
Tim
A: 
 pStatus = malloc((<number of data points>/8) + 1);

That part's fine.

 pStatus[element]

here's where you have trouble. You are address bytes, when you want to address bits.

 pStatus[element / 8 ]

will get you the right byte in the array.

James Curran
+3  A: 

Well, the simplest answer would be to use calloc instead of malloc.

It is defined to initialize the memory it allocates to zero, and can often do it by using page mapping tricks.

That will take care of your memory initialization problem. The other dozen posts here seem to adequately address the indexing problem and the fact that you occasionally allocate an extra byte (oh the horror!), so I won't repeat their content here.

Edward Kmett
+1  A: 

pStatus[element] will give you an entire byte at that address.

To set a particular element you would do something like:

pStatus[element >> 3] |= 1 << (element & 7);

To reset an element:

pStatus[element >> 3] &= ~1 << (element & 7);

and to test an element:

if (pStatus[element >> 3] & (1 << (element & 7)) != 0)

the initial allocation should be

pstatus = malloc((<number of data points> + 7) / 8)

what you had will work but wastes a byte occasionally

Ferruccio
+18  A: 
pStatus = malloc((<number of data points>/8) + 1);

This does allocate enough bytes for your bits. However,

pStatus[element]

This accesses the element'th byte, not bit. So when element is more than one-eighth of the total number of bits, you're accessing off the end of the array allocated.

I would define a few helper functions

int get_bit(element)
{
    uint byte_index = element/8;
    uint bit_index = element % 8;
    uint bit_mask = ( 1 << bit_index)

    return ((pStatus[byte_index] & bit_mask) != 0);
}

void set_bit (element)
{
    uint byte_index = element/8;
    uint bit_index = element % 8;
    uint bit_mask = ( 1 << bit_index)

    pStatus[byte_index] |= bit_mask)
}

void clear_bit (element)
{
    uint byte_index = element/8;
    uint bit_index = element % 8;
    uint bit_mask = ( 1 << bit_index)

    pStatus[byte_index] &= ~bit_mask;
}

(error checking on range of element left out for clarity. You could make this macros, too)

Paul
with regards to get_bit - while it's wonderfully parallel to the other implementations, you might fetch the byte first, then check the bit on non-zero:uchar b = pStatus(byte_index); if (b) { uint bit_windex = element % 8; ... } else { return 0; }
plinth
Why? (not disagreeing, just wondering why). The compiler should make a decent job of either version.
Paul
You should not assume that a byte is 8 bits. Use CHAR_BIT instead of 8, include <limits.h> if necessary. This applies to most comments in this thread, it seems.
unwind
Not assuming that a byte is 8 bits maybe made sense in the past, but it seems a bit radical today.
Adam Byrtek
Depending on your computer/compiler, it may be more efficient to do this in terms of 32/64 bit [integer] words. My computer hasn't had an 8-bit memory bus since the '80s.
Mr.Ree
A: 

You need to allocate c = malloc((N+7)/8) bytes, and you can set the nth with

 c[n/8]=((c[n/8] & ~(0x80 >> (n%8))) | (0x80>>(n%8)));

clear with

 c[n/8] &= ~(0x80 >> (n%8));

and test with

 if(c[n/8] & (0x80 >> (n%8))) blah();
Rhythmic Fistman
Adam Liss
Hey, yeah. Thanks!
Rhythmic Fistman
A: 

You allocation code is correct, see the set_bit() and get_bit() functions given in this answer to access the boolean.

philippe
+1  A: 

Make yourself happier and define a type and functions to operate on that type. That way if you discover that bit accesses are too slow, you can change the unit of memory per boolean to a byte/word/long or adopt sparse/dynamic data structures if memory is really an issue (ie, if your sets are mostly zeros, you could just keep a list with the coordinates of the 1's.

You can write your code to be completely immune to changes to the implementation of your bit vector.

plinth
A: 

See the comments under the initial question. Thanks for your help S. Lott.

Tim
A: 

The boolean is "never" a separate value in C. So a struct might be in order to get you going.

It is true that you do not initialize the mem area so you need to do that individually.

Here is a simple example of how you could do it with unions structs and enums

typedef unsigned char           BYTE;
typedef unsigned short          WORD;
typedef unsigned long int       DWORD;
typedef unsigned long long int  DDWORD;
enum STATUS
{
    status0 = 0x01,
    status1 = 0x02,
    status2 = 0x04,
    status3 = 0x08,
    status4 = 0x10,
    status5 = 0x20,
    status6 = 0x40,
    status7 = 0x80,
status_group = status0 + status1 +status4
};
#define GET_STATUS( S ) ( ((status.DDBuf&(DDWORD)S)==(DDWORD)S) ? 1 : 0  )
#define SET_STATUS( S ) (  (status.DDBuf|=  (DDWORD)S) )
#define CLR_STATUS( S ) (  (status.DDBuf&= ~(DDWORD)S) )
static union {
 BYTE   BBuf[8];
 WORD   WWBuf[4];
 DWORD  DWBuf[2];
 DDWORD DDBuf;
}status;

int main(void)
{
    // Reset status bits
    status.BBuf[0] = 0;
    printf( "%d \n", GET_STATUS( status0 ) );

    SET_STATUS( status0 );
    printf( "%d \n", GET_STATUS( status0 ) );

    CLR_STATUS(status0);
    printf( "%d \n", GET_STATUS( status0 ) );
    SET_STATUS( status_group );
    printf( "%d \n", GET_STATUS( status0 ) );
    system( "pause" );
    return 0;
}

Hope this helps. This example can handle up until 64 status booleans and could be easy extended.

This exapmle is based on Char = 8 bits int = 16 bits long int = 32 bits and long long int = 64 bits

I have now also added support for status groups.

eaanon01
Joe Pineda
Also: you forgot to define macro WORD. And better to define both DWORD and WORD using macros __int32 and __int16 (I believe they're part of standard now), that way those elements are always the same size on all compilers and platforms.
Joe Pineda
@Joe Pineda: __int32 and __int16 is a MS-specific construct. The nearest equivalent in the standard is int32_t and int16_t (defined in <stdint.h> and (effectively) in <inttypes.h>. In general, it is better to use the unsigned equivalents for bit operations.
Jonathan Leffler
@eaanon01: How does this code deal with more than 32 bits?
Jonathan Leffler
Bacause the status var is 64 bits ( unsigned long int )
eaanon01
Wrong: WORD is 16 bits long, DWORD is 32. You'd need a "long long int" or an __int64 to get 64 bits, and then not all compilers support these options.
Joe Pineda
On a further thought, could be indeed that your "long int" is 64 bits long - which compiler/OS/version are you using? The ANSI standard only enforces size_of(char)<=size_of(short int)<=size_of(int)<=size_of(long int) Use __int64 or equivalent to make your intention obvious!
Joe Pineda
Joe Pineda: Yes you where correct. My mistake. I have corrected it now. Please comment for more improments. I think this is a more readable slolution that using functions. But depends a bit about the requirements as well.
eaanon01
For instance, have a look at this thread: http://linux.derkeiler.com/Newsgroups/comp.os.linux.development.apps/2006-08/msg00152.htmlSame compiler, 64bits processor, different OS (32/64 bits Linux), on one you have size_of(long int)=4 and the other it's 8 :(
Joe Pineda
OK, go macros if you wish (I'll say "told you!" when you get hurt). Though you still have a superfluous comparison on GET_STATUS!!! Think of it, after you apply the bitmask, value is either 0 or not, depending on the bit you want, can't be something else...
Joe Pineda
eaanon01
A: 

If you don't mind having to write wrappers, you could also use either bit_set or bit_vector from C++'s STL, seems like they (especially the latter) have exactly what you need, already coded, tested and packaged (and plenty of bells and whistles).

It's a real shame we lack a straight forward way to use C++ code in C applications (no, creating a wrapper isn't straight-forward to me, nor fun, and means more work in the long term).

Joe Pineda
A: 

If you are limited to just a few bits you can instead of eaanon01 solution also use the c builtin facility of bitfield (there are very few occasion where you could use them, but this would be one)

For this bit banging stuff I can recommendate: Herny Warrens "Hacker Delight"

flolo
+2  A: 

I can't help but notice that all replies in C here seem to assume that a byte is 8 bits. This is not necessarily true in C (although it will of course be true on most mainstream hardware), so making this assumption in code is rather bad form.

The proper way to write architecture-neutral code is to

#include <limits.h>

and then use the CHAR_BIT macro wherever you need "the number of bits in a char".

unwind
+4  A: 

Small point: to get enough memory to store N bits, (N/8) + 1 bytes is imprecise (can be one too many).

(N+7)/8 is always the minimum number, though.

orip
A: 

What would be wrong with std::vector<bool>?

pauldoo
Mr.Ree
Correction: Apparently newer compilers now specialize vector<bool> to provide one entry per bit. I stand corrected.
Mr.Ree
+1  A: 

It amazes me that only one answer here mentions CHAR_BIT. A byte is often 8 bits, but not always.

tpdi
On all surviving non-embedded platfomrs CHAR_BIT is 8.
Joshua
Much of the C I do is for embedded devices. (Admittedly, on most of those, CHAR_BITS still 8). When the ANSI Standard says CHAR_BIT is always 8, we can dispense with CHAR_BIT.
tpdi