views:

1105

answers:

9

I'm trying to implement a data compression idea I've had, and since I'm imagining running it against a large corpus of test data, I had thought to code it in C (I mostly have experience in scripting languages like Ruby and Tcl.)

Looking through the O'Reilly 'cow' books on C, I realize that I can't simply index the bits of a simple 'char' or 'int' type variable as I'd like to to do bitwise comparisons and operators.

Am I correct in this perception? Is it reasonable for me to use an enumerated type for representing a bit (and make an array of these, and writing functions to convert to and from char)? If so, is such a type and functions defined in a standard library already somewhere? Are there other (better?) approaches? Is there some example code somewhere that someone could point me to?

Thanks -

+6  A: 

It is possible.

To set the nth bit, use OR:

x |= (1 << 5); // sets the 5th-from right

To clear a bit, use AND:

x &= ~(1 << 5); // clears 5th-from-right

To flip a bit, use XOR:

x ^= (1 << 5); // flips 5th-from-right

To get the value of a bit use shift and AND:

(x & (1 << 5)) >> 5 // gets the value (0 or 1) of the 5th-from-right

note: the shift right 5 is to ensure the value is either 0 or 1. If you're just interested in 0/not 0, you can get by without the shift.

Kyle Cronin
A: 

IF you want to index a bit you could:

bit = (char & 0xF0) >> 7;

gets the msb of a char. You could even leave out the right shift and do a test on 0.

bit = char & 0xF0;

if the bit is set the result will be > 0;

obviousuly, you need to change the mask to get different bits (NB: the 0xF is the bit mask if it is unclear). It is possible to define numerous masks e.g.

#define BIT_0 0x1 // or 1 << 0
#define BIT_1 0x2 // or 1 << 1
#define BIT_2 0x4 // or 1 << 2
#define BIT_3 0x8 // or 1 << 3

etc...

This gives you:

bit = char & BIT_1;

You can use these definitions in the above code to sucessfully index a bit within either a macro or a function.

To set a bit:

char |= BIT_2;

To clear a bit:

char &= ~BIT_3

To toggle a bit

char ^= BIT_4

This help?

TK
+3  A: 

Have a look at the answers to this question.

ChrisN
+1  A: 

To query state of bit with specific index:

int index_state = variable & ( 1 << bit_index );

To set bit:

varabile |= 1 << bit_index;

To restart bit:

variable &= ~( 1 << bit_index );
Mladen Jankovic
A: 

There is a standard library container for bits: std::vector. It is specialised in the library to be space efficient. There is also a boost dynamic_bitset class.

These will let you perform operations on a set of boolean values, using one bit per value of underlying storage.

Boost dynamic bitset documentation

For the STL documentation, see your compiler documentation.

Of course, you can also address the individual bits in other integral types by hand. If you do that, you should use unsigned types so that you don't get undefined behaviour if decide to do a right shift on a value with the high bit set. However, it sounds like you want the containers.

To the commenter who claimed this takes 32x more space than necessary: boost::dynamic_bitset and vector are specialised to use one bit per entry, and so there is not a space penalty, assuming that you actually want more than the number of bits in a primitive type. These classes allow you to address individual bits in a large container with efficient underlying storage. If you just want (say) 32 bits, by all means, use an int. If you want some large number of bits, you can use a library container.

janm
That uses 32x more data than is necessary. The original poster said he was interested in data compression. This sounds like the opposite!
Mark Ingram
No, the point about vector<bool> and boost::dynamic_bitset is that they use one bit per bool. To store 1024 bools, they will use 128 bytes plus the class overhead. How did you calculate 32x more storage than necessary?
janm
A: 

Try using bitfields. Be careful the implementation can vary by compiler.

http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html

+1  A: 

Theory

There is no C syntax for accessing or setting the n-th bit of a built-in datatype (e.g. a 'char'). However, you can access bits using a logical AND operation, and set bits using a logical OR operation.

As an example, say that you have a variable that holds 1101 and you want to check the 2nd bit from the left. Simply perform a logical AND with 0100:

1101
0100
---- AND
0100

If the result is non-zero, then the 2nd bit must have been set; otherwise is was not set.

If you want to set the 3rd bit from the left, then perform a logical OR with 0010:

1101
0010
---- OR
1111

You can use the C operators && (for AND) and || (for OR) to perform these tasks. You will need to construct the bit access patterns (the 0100 and 0010 in the above examples) yourself. The trick is to remember that the least significant bit (LSB) counts 1s, the next LSB counts 2s, then 4s etc. So, the bit access pattern for the n-th LSB (starting at 0) is simply the value of 2^n. The easiest way to compute this in C is to shift the binary value 0001 (in this four bit example) to the left by the required number of places. As this value is always equal to 1 in unsigned integer-like quantities, this is just '1 << n'

Example

unsigned char myVal = 0x65; /* in hex; this is 01100101 in binary. */

/* Q: is the 3-rd least significant bit set (again, the LSB is the 0th bit)? */
unsigned char pattern = 1;
pattern <<= 3; /* Shift pattern left by three places.*/

if(myVal && (char)(1<<3)) {printf("Yes!\n");} /* Perform the test. */

/* Set the most significant bit. */
myVal |= (char)(1<<7);

This example hasn't been tested, but should serve to illustrate the general idea.

Microserf
+5  A: 

Following on from what Kyle has said, you can use a macro to do the hard work for you.

It is possible.

To set the nth bit, use OR:

x |= (1 << 5); // sets the 5th-from right

To clear a bit, use AND:

x &= ~(1 << 5); // clears 5th-from-right

To flip a bit, use XOR:

x ^= (1 << 5); // flips 5th-from-right

Or...

#define GetBit(var, bit) ((var & (1 << bit)) != 0) // Returns true / false if bit is set
#define SetBit(var, bit) (var |= (1 << bit))
#define FlipBit(var, bit) (var ^= (1 << bit))

Then you can use it in code like:

int myVar = 0;
SetBit(myVar, 5);
if (GetBit(myVar, 5))
{
  // Do something
}
Mark Ingram
This is very helpful, thanks very much.
+1  A: 

Individual bits can be indexed as follows.

Define a struct like this one:

struct
{
  unsigned bit0  : 1;
  unsigned bit1  : 1;
  unsigned bit2  : 1;
  unsigned bit3  : 1;
  unsigned reserved : 28;
} bitPattern;

Now if I want to know the individual bit values of a var named "value", do the following:

CopyMemory( &input, &value, sizeof(value) );

To see if bit 2 is high or low:

int state = bitPattern.bit2;

Hope this helps.

Mark D