views:

1571

answers:

16

The classic problem of testing and setting individual bits in an integer in C is perhaps one the most common intermediate-level programming skills. You set and test with simple bitmasks such as

unsigned int mask = 1<<11;

if (value & mask) {....} // Test for the bit
value |= mask;    // set the bit
value &= ~mask;   // clear the bit

An interesting blog post argues that this is error prone, difficult to maintain, and poor practice. The C language itself provides bit level access which is typesafe and portable:

typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE !FALSE
typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                int :28;                /* unused */
                boolean_t compat:1;     /* bit 31 */
        };
        int raw;
} flags_t;

int
create_object(flags_t flags)
{
        boolean_t is_compat = flags.compat;

        if (is_compat)
                flags.force = FALSE;

        if (flags.force) {
                [...]
        }
        [...]
}

But this makes me cringe.

The interesting argument my coworker and I had about this is still unresolved. Both styles work, and I maintain the classic bitmask method is easy, safe, and clear. My coworker agrees it's common and easy, but the bitfield union method is worth the extra few lines to make it portable and safer.

Is there any more arguments for either side? In particular is there some possible failure, perhaps with endianness, that the bitmask method may miss but where the structure method is safe?

+4  A: 

Either way, bitfields have been used in GNU software for decades and it hasn't done them any harm. I like them as parameters to functions.

I would argue that bitfields are conventional as opposed to structs. Everyone knows how to AND the values to set various options off and the compiler boils this down to very efficient bitwise operations on the CPU.

Providing you use the masks and tests in the correct way, the abstractions the compiler provide should make it robust, simple, readable and clean.

When I need a set of on/off switches, Im going to continue using them in C.

Aiden Bell
+13  A: 

Bitfields are not quite as portable as you think, as "C gives no guarantee of the ordering of fields within machine words" (The C book)

Ignoring that, used correctly, either method is safe. Both methods also allow symbolic access to integral variables. You can argue that the bitfield method is easier to write, but it also means more code to review.

Matthew Flaschen
I've had issues with porting code to a compiler where the stupid bit field order was backwards. Very annoying. I'll stick with masks, thanks. :)
darron
@Matthew Flaschen: How does `flags.compat` and `flags.force` work? Shouldn't they be `flags.{structure_name}.compat` and `flags.{structure_name}.force`?
Lazer
@eSKay, you are correct. The C standard does not support unnamed struct fields (which are being used in the question), but gcc (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Unnamed-Fields.html) and Visual C++ (http://msdn.microsoft.com/en-us/library/z2cx9y4f.aspx) do as extensions
Matthew Flaschen
@Matthew Flaschen: thanks for the links!
Lazer
+9  A: 

You have to think about this from the perspective of a writer -- know your audience. So there are a couple of "audiences" to consider.

First there's the classic C programmer, who have bitmasked their whole lives and could do it in their sleep.

Second there's the newb, who has no idea what all this |, & stuff is. They were programming php at their last job and now they work for you. (I say this as a newb who does php)

If you write to satisfy the first audience (that is bitmask-all-day-long), you'll make them very happy, and they'll be able to maintain the code blindfolded. However, the newb will likely need to overcome a large learning curve before they are able to maintain your code. They will need to learn about binary operators, how you use these operations to set/clear bits, etc. You're almost certainly going to have bugs introduced by the newb as he/she all the tricks required to get this to work.

On the other hand, if you write to satisfy the second audience, the newbs will have an easier time maintaining the code. They'll have an easier time groking

 flags.force = 0;

than

 flags &= 0xFFFFFFFE;

and the first audience will just get grumpy, but its hard to imagine they wouldn't be able to grok and maintain the new syntax. It's just much harder to screw up. There won't be new bugs, because the newb will more easily maintain the code. You'll just get lectures about how "back in my day you needed a steady hand and a magnetized needle to set bits... we didn't even HAVE bitmasks!" (thanks XKCD).

So I would strongly recommend using the fields over the bitmasks to newb-safe your code.

Doug T.
"It's just much harder to screw up." Bitfields, in a nutshell.
Roddy
got url for xkcd strip?
Joakim Elofsson
http://xkcd.com/378/ There you go ;)
Rekreativc
That |= sets a whole lot of flag bits. flags ** maybe? you're actually helping the bit field position. And, based on his definition, it would be 0xFFFFFFFB on systems where the top bitfield is bit 0.
darron
yes, your are correct dblack. But I guess that proves the overall point :)
Doug T.
to clear bitmask N, I'd use flags This has the advantage of not being tied to the size of flags. Run your code with 64-bit ints and your are slightly scr*wed...
Roddy
I really wouldn't consider bit masking that hard. 30 minutes (or less) worth of reading on Wikipedia and you've got enough to grok what's going on.
Bob Somers
+5  A: 

The blog post you are referring to mentions raw union field as alternative access method for bitfields.

The purposes blog post author used raw for are ok, however if you plan to use it for anything else (e.g. serialisation of bit fields, setting/checking individual bits), disaster is just waiting for you around the corner. The ordering of bits in memory is architecture dependent and memory padding rules vary from compiler to compiler (see wikipedia), so exact position of each bitfield may differs, in other words you never can be sure which bit of raw each bitfield corresponds to.

However if you don't plan to mix it you better take raw out and you will be safe.

qrdl
Endianness should not affect it providing you use concrete representation for masks. This way, both are affected equally by endianness meaning that high/low bits correlate.
Aiden Bell
RBerteig
@RBerteig Exactly it was my point! In initial edition of my post I also mentioned endianness but because it is not guaranteed even on architectures with same endianness I took it out.
qrdl
+5  A: 

What it is about the bitfield approach that makes you cringe?

Both techniques have their place, and the only decision I have is which one to use:

For simple "one-off" bit fiddling, I use the bitwise operators directly.

For anything more complex - eg hardware register maps, the bitfield approach wins hands down.

  • Bitfields are more succinct to use (at the expense of /slightly/ more verbosity to write.
  • Bitfields are more robust (what size is "int", anyway)
  • Bitfields are usually just as fast as bitwise operators.
  • Bitfields are very powerful when you have a mix of single and multiple bit fields, and extracting the multiple-bit field involves loads of manual shifts.
  • Bitfields are effectively self-documenting. By defining the structure and therefore naming the elements, I know what it's meant to do.
  • Bitfields also seamlessly handle structures bigger than a single int.
  • With bitwise operators, typical (bad) practice is a slew of #defines for the bit masks.

  • The only caveat with bitfields is to make sure the compiler has really packed the object into the size you wanted. I can't remember if this is define by the standard, so a assert(sizeof(myStruct) == N) is a useful check.

Roddy
I have found as I move from one embedded platform to another that it often requires writing sample code and checking the generated assembly to confirm the mapping from the bitfield in a struct to the actual machine word. Too many compilers organize their documentation well enough to make this easy to look up. I've also had problems where the hardware requires that memory accesses be guaranteed to be a particular size, and I had to find a barely documented switch to turn off the optimization that broke that.
RBerteig
A: 

Generally, the one that is easier to read and understand is the one that is also easier to maintain. If you have co-workers that are new to C, the "safer" approach will probably be the easier one for them to understand.

Mike
I look at the two pieces of code, and to me the first one looks easier to understand. The second one may well be safer.
Nosredna
Mike
+1  A: 

Well you can't go wrong with structure mapping since both fields are accessable they can be used interchangably.

One benefit for bit fields is that you can easily aggregate options:

mask = USER|FORCE|ZERO|COMPAT;

vs

flags.user = true;
flags.force = true;
flags.zero = true;
flags.compat = true;

In some environments such as dealing with protocol options it can get quite old having to individually set options or use multiple parameters to ferry intermediate states to effect a final outcome.

But sometimes setting flag.blah and having the list popup in your IDE is great especially if your like me and can't remember the name of the flag you want to set without constantly referencing the list.

I personally will sometimes shy away from declaring boolean types because at some point I'll end up with the mistaken impression that the field I just toggled was not dependent (Think multi-thread concurrency) on the r/w status of other "seemingly" unrelated fields which happen to share the same 32-bit word.

My vote is that it depends on the context of the situation and in some cases both approaches may work out great.

Einstein
+3  A: 

If the issue is that setting and clearing bits is error prone, then the right thing to do is to write functions or macros to make sure you do it right.

// off the top of my head
#define SET_BIT(val, bitIndex) val |= (1 << bitIndex)
#define CLEAR_BIT(val, bitIndex) val &= ~(1 << bitIndex)
#define TOGGLE_BIT(val, bitIndex) val ^= (1 << bitIndex)
#define BIT_IS_SET(val, bitIndex) (val & (1 << bitIndex))

Which makes your code readable if you don't mind that val has to be an lvalue except for BIT_IS_SET. If that doesn't make you happy, then you take out assignment, parenthesize it and use it as val = SET_BIT(val, someIndex); which will be equivalent.

Really, the answer is to consider decoupling the what you want from how you want to do it.

plinth
+3  A: 

Your first method is preferable, IMHO. Why obfuscate the issue? Bit fiddling is a really basic thing. C did it right. Endianess doesn't matter. The only thing the union solution does is name things. 11 might be mysterious, but #defined to a meaningful name or enum'ed should suffice.

Programmers who can't handle fundamentals like "|&^~" are probably in the wrong line of work.

xcramps
Aiden Bell
@Aiden - I disagree. No matter what kind of programmer you are, even if you spend your entire day writing web apps with functional scripting meta-languages, if you don't understand bits and bytes you're missing the fundamental underpinnings of what is happening. @xcramps, endianness absolultely matters (especially for anything using a network) if you're dealing with anything larger than a byte, like the op's example which dealt with 32-bit ints.
Bob Somers
+2  A: 

When I google for "c operators"

The first three pages are:

http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V40F_HTML/AQTLTBTE/DOCU_059.HTM http://www.cs.mun.ca/~michael/c/op.html

..so I think that argument about people new to the language is a little silly.

San Jacinto
I agree wholeheartedly. I believe twenty minutes is enough to get a pretty good grasp on bitwise operators.
Adrian Panasiuk
+1  A: 

In C++, just use std::bitset<N>.

Steve Jessop
+2  A: 

It is error-prone, yes. I've seen lots of errors in this kind of code. Mainly because some people feel that they should mess with it and the business logic in a totally disorganized way, creating maintenance nightmares. They think "real" programmers can write value |= mask; , value &= ~mask; or even worse things at any place, and that's just ok. Even better if there's some increment operator around, a couple of memcpy's, pointer casts and whatever obscure and error-prone syntax happens to come to their mind at that time. Of course there's no need to be consistent and you can flip bits in two or three different ways, distributed randomly.

My advice would be:

  1. Encapsulate this ---- in a class, with methods such as SetBit(...) and ClearBit(...). (If you don't have classes in C, in a module.) While you're at it, you can document all their behaviour.
  2. Unit test that class or module.
Daniel Daranas
+1 for encaptulation idea.
Richard Corden
Class, methods, module? I can't understand a word you're saying.
Nosredna
@Nosredna: You could always ask.
Daniel Daranas
increment operators, memcpy's, and pointer casts - do you seriously call that obscure? referenes, copy constructors, templates, and operator overloading, now that's obscure!
James Morris
I mentioned "increment operators" around as a way of saying that there's a mixture of queries and commands in the same instruction. And memcpy's are too low level, hence obscure. It's the result of the typical low level salad that messes around with byte operation and several value modifications at the same time that I call, as a result, obscure. You can write really hard to maintain code which, read line by line, is clear what it "does", but nobody knows really why.
Daniel Daranas
+1  A: 

I nearly always use the logical operations with a bit mask, either directly or as a macro. e.g.

 #define  ASSERT_GPS_RESET()                    { P1OUT &= ~GPS_RESET ; }

incidentally your union definition in the original question would not work on my processor/compiler combination. The int type is only 16 bits wide and the bitfield definitions are 32. To make it slightly more portable then you would have to define a new 32 bit type that you could then map to the required base type on each target architecture as part of the porting exercise. In my case

typedef   unsigned long int     uint32_t

and in the original example

typedef unsigned int uint32_t

typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                int :28;                /* unused */
                boolean_t compat:1;     /* bit 31 */
        };
        uint32_t raw;
} flags_t;

The overlaid int should also be made unsigned.

Ian
+2  A: 

Well, I suppose that's one way of doing it, but I would always prefer to keep it simple.

Once you're used to it, using masks is straightforward, unambiguous and portable.

Bitfields are straightforward, but they are not portable without having to do additional work.

If you ever have to write MISRA-compliant code, the MISRA guidelines frown on bitfields, unions, and many, many other aspects of C, in order to avoid undefined or implementation-dependent behaviour.

Steve Melnikoff
+4  A: 

The union usage has undefined behavior according to the ANSI C standard, and thus, should not be used (or at least not be considered portable).

From the ISO/IEC 9899:1999 (C99) standard:

Annex J - Portability Issues:

1 The following are unspecified:

— The value of padding bytes when storing values in structures or unions (6.2.6.1).

— The value of a union member other than the last one stored into (6.2.6.1).

6.2.6.1 - Language Concepts - Representation of Types - General:

6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.[42]) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

7 When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

So, if you want to keep the bitfield ↔ integer correspondence, and to keep portability, I strongly suggest you to use the bitmasking method, that contrary to the linked blog post, it is not poor practice.

Juliano
+2  A: 
Norman Ramsey