I have just discovered the joy of bitflags. I have several questions related to "best-practices" regarding the use of bitflags in C. I learned everything from various examples I found on the web but still have questions.

In order to save space, I am using a single 32-bit integer field in a struct (A->flag) to represent several different sets of boolean properties. In all, 20 different bits are #defined. Some of these are truly presence/absence flags (STORAGE-INTERNAL vs. STORAGE-EXTERNAL). Others have more than two values (e.g. a mutually exclusive set of formats: FORMAT-A, FORMAT-B, FORMAT-C). I have defined macros for setting specific bits (and simultaneously turning off mutually exclusive bits). I have also defined macros for testing whether specific combinations of bits are set in the flag.
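
Roughly, my scheme looks like this (simplified here, with made-up names rather than my real ones):

#define STORAGE_INTERNAL  (1u << 0)   /* presence/absence flag */
#define FORMAT_A          (1u << 1)   /* mutually exclusive formats */
#define FORMAT_B          (1u << 2)
#define FORMAT_C          (1u << 3)
#define FORMAT_BITS       (FORMAT_A | FORMAT_B | FORMAT_C)

/* setting one format also turns off the other, mutually exclusive formats */
#define SET_FORMAT_A(f)   ((f) = ((f) & ~FORMAT_BITS) | FORMAT_A)
/* testing checks the whole group, not just one bit */
#define HAS_FORMAT_A(f)   (((f) & FORMAT_BITS) == FORMAT_A)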

However, what is lost in the above approach is the specific grouping of flags that is best captured by enums. For writing functions, I would like to use enums (e.g., STORAGE-TYPE and FORMAT-TYPE), so that function definitions look nice. I expect to use enums only for passing parameters and #defined macros for setting and testing flags.

(a) How do I define flag (A->flag) as a 32-bit integer in a portable fashion (across 32-bit / 64-bit platforms)?

(b) Should I worry about potential size differences in how A->flag vs. #defined constants vs. enums are stored?

(c) Am I making things unnecessarily complicated, meaning should I just stick to using #defined constants for passing parameters as ordinary ints? What else should I worry about in all this?

I apologize for the poorly articulated question. It reflects my ignorance about potential issues.

+2  A: 

For question (a), if you are using C99 (you probably are), you can use the uint32_t type defined in the <stdint.h> header file.
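
A minimal sketch, assuming <stdint.h> is available:

#include <stdint.h>

struct A {
    uint32_t flag;  /* exactly 32 bits wherever uint32_t is provided */
};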

Pierre Bourdon
`uint32_t` is not necessarily defined on all implementations. It is only defined on implementations where it makes sense to have such a type (granted, that's 99.9% of current desktop implementations). See point 7.18.1.1 in the Standard ("#3 These types are optional ...")
pmg
I don't even see what could be the last 0.01% :)
Pierre Bourdon
Also, `unsigned long` is guaranteed by the standard to at least go up to 2 ** 32 - 1. You _could_ (in the event that `uint32_t` isn't portable enough) use that to guarantee at least 32 bits. But pmg's answer is even better IMHO.
Chris Lutz
@delroth: there were Cray machines that had 32-bit 'short' and 64-bit 'int' values. They could define a uint32_t; they could not define a uint16_t. There were machines with other odd-ball word lengths; Burroughs mainframes had 36-bit integers and would not define uint32_t (but could do uint_least32_t). For Cray, see: http://docs.cray.com/books/004-2179-001/html-004-2179-001/rvc5mrwh.html; for Burroughs, see: http://www.fourmilab.ch/documents/univac/case1107.html
Jonathan Leffler
@Lutz, Can you provide a reference to that? From what I know, `long` can possibly be 16 bits.
strager
+6  A: 

There is a C99 header that was intended to solve that exact problem (a), but for some reason Microsoft doesn't implement it. Fortunately, you can get <stdint.h> for Microsoft Windows here. Every other platform will already have it. The 32-bit int types are uint32_t and int32_t. These also come in 8, 16, and 64-bit flavors.

So, that takes care of (a).

(b) and (c) are kind of the same question. We do make assumptions whenever we develop something. You assume that C will be available. You assume that <stdint.h> can be found somewhere. You could always assume that int was at least 16 bits, and nowadays a >= 32-bit assumption is fairly reasonable.

In general, you should try to write conforming programs that don't depend on layout, though they will still make assumptions about word length. You should worry about performance at the algorithm level; that is, am I writing something that is quadratic, polynomial, or exponential?

You should not worry about performance at the operation level until (a) you notice a performance lag, and (b) you have profiled your program. You need to get your job done without getting bogged down worrying about individual operations. :-)

Oh, I should add that you particularly don't need to worry about low-level operation performance when you are writing the program in C in the first place. C is the close-to-the-metal, go-as-fast-as-possible language. We routinely write stuff in PHP, Python, Ruby, or Lisp because we want a powerful language, and CPUs are so fast these days that we can get away with an entire interpreter, never mind a not-perfect choice of bit-twiddle-word-length ops. :-)

DigitalRoss
Another note on (a), which I mentioned in the comments on @delroth's post, is that `unsigned long` _has_ to be at least 32 bits.
Chris Lutz
Hi DigitalRoss, thanks for the response. That helps. I am using GCC, which has uint32_t. Are there any examples of #ifdef... macros for specifying sizes etc.? Performance concerns - point well taken. I was mostly worried about doing something non-standard and therefore inherently problematic. ~Russ
You can detect whether uint32_t is defined by testing the corresponding limit macro: UINT32_MAX. Note that the exact-width types (which most people use most of the time) are not guaranteed to be present; the uint_least32_t and uint_fast32_t types are required (and their limit macros are UINT_LEAST32_MAX and UINT_FAST32_MAX). The least and fast types are required for sizes 8, 16, 32, and 64: an implementation must provide integer types of those sizes to be compliant with the C99 standard.
Jonathan Leffler
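For example, the detection Jonathan describes might be sketched like this (flag_t is just an illustrative name):

#include <stdint.h>

#ifdef UINT32_MAX
typedef uint32_t flag_t;        /* the exact-width type exists */
#else
typedef uint_least32_t flag_t;  /* required by C99: at least 32 bits */
#endif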
Thanks Jonathan. ~Russ
+6  A: 

You can use bit-fields and let the compiler do the bit twiddling. For example:

struct PropertySet {
  unsigned internal_storage : 1; /* boolean flag: one bit */
  unsigned format : 4;           /* small multi-valued field: four bits */
};

int main(void) {
  struct PropertySet x = {0};       /* zero-initialize before testing fields */
  struct PropertySet y[10] = {{0}}; /* array of structures containing bit-fields */
  if (x.internal_storage) x.format |= 2;
  if (y[2].internal_storage) y[2].format |= 2;
  return 0;
}

Edited to add array of structures

pmg
A nice extension of this is to use `union PropSet { unsigned all_bits; struct { unsigned internal_storage: 1; unsigned format: 4; } bits; };` This lets you clear and set all of the bits in one shot without a `memset()` or something goofy.
D.Shawley
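A self-contained sketch of that union trick (note the next comment's caveat: the bit layout is implementation-defined):

union PropSet {
  unsigned all_bits;               /* whole-word view */
  struct {
    unsigned internal_storage : 1;
    unsigned format : 4;
  } bits;                          /* per-field view */
};

void clear_all(union PropSet *p) {
  p->all_bits = 0;  /* clears every field in one shot, no memset() needed */
}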
Don't do this if you are storing the structure in a file or sending the information over a network. Different compilers may pack the structures differently, particularly if they have different endian-ness.
Graeme Perrow
Thanks for the response. I like the bit-fields solution (and the extension) as it is more explicit. Separately, we need to have an array of the field, which, as I understand it, is not allowed for bit-fields.
You cannot have an array of bit-fields, but you can have an array of structures containing bit-fields.
pmg
Thanks pmg!! I noticed it just as soon as I posted it. ~Russ
+3  A: 

As others have said, your problem (a) is resolvable by using <stdint.h> and either uint32_t or uint_least32_t (if you want to worry about Burroughs mainframes which have 36-bit words). Note that MSVC does not support C99, but @DigitalRoss shows where you can obtain a suitable header to use with MSVC.

Your problem (b) is not an issue; C will type convert safely for you if it is necessary, but it probably isn't even necessary.

The area of most concern is (c) and in particular the format sub-field. There, 3 values are valid. You can handle this by allocating 3 bits and requiring that the 3-bit field is one of the values 1, 2, or 4 (any other value is invalid because of too many or too few bits set). Or you could allocate a 2-bit number, and specify that either 0 or 3 (or, if you really want to, one of 1 or 2) is invalid. The first approach uses one more bit (not currently a problem since you're only using 20 of 32 bits) but is a pure bitflag approach.
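
For illustration, the 2-bit numeric variant might look like this (the bit positions are invented for the example; the pure-bitflag layout appears in the enum further down):

#define FMT_SHIFT 4
#define FMT_MASK  (3u << FMT_SHIFT)  /* two bits reserved for the format */
#define FMT_A     (1u << FMT_SHIFT)  /* values 1, 2, 3 are valid */
#define FMT_B     (2u << FMT_SHIFT)
#define FMT_C     (3u << FMT_SHIFT)  /* 0 is left invalid */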

When writing function calls, there is no particular problem writing:

some_function(FORMAT_A | STORAGE_INTERNAL, ...);

This will work whether FORMAT_A is a #define or an enum (as long as you specify the enum value correctly). The called code should check whether the caller had a lapse in concentration and wrote:

some_function(FORMAT_A | FORMAT_B, ...);

But that is an internal check for the module to worry about, not a check for users of the module to worry about.

If people are going to be switching bits in the flags member around a lot, the macros for setting and unsetting the format field might be beneficial. Some might argue that the pure boolean fields barely need them, though (and I'd sympathize). It might be best to treat the flags member as opaque and provide 'functions' (or macros) to get or set all the fields. The less people can get wrong, the less will go wrong.

Consider whether using bit-fields works for you. My experience is that they lead to big code and not necessarily very efficient code; YMMV.

Hmmm...nothing very definitive here, so far.

  • I would use enums for everything because those are guaranteed to be visible in a debugger where #define values are not.
  • I would probably not provide macros to get or set bits, but I'm a cruel person at times.
  • I would provide guidance on how to set the format part of the flags field, and might provide a macro to do that.

Like this, perhaps:

enum { ..., FORMAT_A = 0x0010, FORMAT_B = 0x0020, FORMAT_C = 0x0040, ... };
enum { FORMAT_MASK = FORMAT_A | FORMAT_B | FORMAT_C };

#define SET_FORMAT(flag, newval)    (((flag) & ~FORMAT_MASK) | (newval))
#define GET_FORMAT(flag)            ((flag) & FORMAT_MASK)

SET_FORMAT is safe if used accurately but horrid if abused. One advantage of the macros is that you could replace them with a function that validated things thoroughly if necessary; this works well if people use the macros consistently.
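
Usage might look like this, say (A->flag being the flags member from the question; note that SET_FORMAT yields a value rather than modifying its argument, and the assert is the module-internal check mentioned above):

#include <assert.h>

A->flag = SET_FORMAT(A->flag, FORMAT_B);    /* switch the format to B */

/* internal check: exactly one format bit should be set */
unsigned fmt = GET_FORMAT(A->flag);
assert(fmt != 0 && (fmt & (fmt - 1)) == 0); /* power-of-two test */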

Jonathan Leffler
+1  A: 

Regarding (c): if your enumerations are defined correctly you should be able to pass them as arguments without a problem. A few things to consider:

  1. enumeration storage is often compiler-specific, so depending on what kind of development you are doing (you don't mention if it's Windows vs. Linux vs. embedded vs. embedded Linux :) ) you may want to check the compiler options for enum storage to make sure there are no issues there. I generally agree with the consensus above that the compiler should convert your enumerations appropriately - but it's something to be aware of.
  2. in the case that you are doing embedded work, many static quality checking programs such as PC Lint will "bark" if you start getting too clever with enums, #defines, and bitfields. If you are doing development that will need to pass through any quality gates, this might be something to keep in mind. In fact, some automotive standards (such as MISRA-C) get downright irritable if you try to get tricky with bitfields.
  3. "I have just discovered the joy of bitflags." I agree with you - I find them very useful.
dls
A: 

Hi,

I added comments to each answer above. I think I have some clarity. It seems enums are cleaner, as they show up in the debugger and keep fields separate. Macros can be used for setting and getting values.

I have also read that enums are stored as small integers - which, as I understand it, is not a problem with the boolean tests, as these would be performed starting at the rightmost bits. But can enums be used to store large integers (1 << 21)?

thanks again to you all. I have already learned more than I did two days ago!!

~Russ

C99 standard, section 6.7.2.2: "The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an `int`". A 21-bit enumeration constant is not, therefore, portable to machines with 16-bit `int` types - which is still standard conforming AFAIK, but rather unusual. Unless you think 16-bit machines are in your possible target list, you don't need to worry (but trying a 64-bit constant (or a 33-bit one) would be foolhardy).
Jonathan Leffler
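
A minimal illustration of that point:

enum { BIG_FLAG = 1 << 21 };  /* fine where int is at least 32 bits; overflows a 16-bit int, so not fully portable */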