I'm programming network headers, and a lot of protocols use 4-bit fields. Is there a convenient type I can use to represent this information?

The smallest type I've found is a BYTE. I must then use a lot of bitwise operations to reference only a few bits inside that variable.

A: 

Hi Eric,

No, there are no convenient types for nibbles. But it's easy to make them with macros or with template functions. This works especially well if/when you need to deal with endianness.
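As a rough illustration of the template-function idea, here is a minimal C++ sketch. All names (`highNibble`, `lowNibble`, `withHighNibble`, `withLowNibble`, `nibbleAt`) are made up for this example, and "high" is assumed to mean bits 7..4 of the byte.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <type_traits>

// Hypothetical helpers, not part of any library.
// "High" nibble = bits 7..4 of the byte, "low" nibble = bits 3..0.
inline std::uint8_t highNibble(std::uint8_t byte) {
    return static_cast<std::uint8_t>(byte >> 4);
}

inline std::uint8_t lowNibble(std::uint8_t byte) {
    return static_cast<std::uint8_t>(byte & 0x0F);
}

// Return a copy of `byte` with its high nibble replaced by `value` (0..15).
inline std::uint8_t withHighNibble(std::uint8_t byte, std::uint8_t value) {
    assert(value <= 0x0F);
    return static_cast<std::uint8_t>((byte & 0x0F) | (value << 4));
}

// Return a copy of `byte` with its low nibble replaced by `value` (0..15).
inline std::uint8_t withLowNibble(std::uint8_t byte, std::uint8_t value) {
    assert(value <= 0x0F);
    return static_cast<std::uint8_t>((byte & 0xF0) | value);
}

// Extract nibble number `Index` from any unsigned integer, counting from the
// least significant nibble (Index 0).
template <unsigned Index, typename T>
constexpr unsigned nibbleAt(T value) {
    static_assert(std::is_unsigned<T>::value, "use an unsigned integer type");
    static_assert(Index * 4 < sizeof(T) * CHAR_BIT, "nibble index out of range");
    return static_cast<unsigned>((value >> (Index * 4)) & 0x0F);
}
```

For example, the first byte of an IPv4 header carries the version in its high nibble and the header length in its low nibble, so for a first byte of 0x45, `highNibble` gives 4 and `lowNibble` gives 5. Because these helpers work on values rather than on memory layout, they do not depend on how the compiler orders bit-fields.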

Foredecker
+12  A: 

Since the memory is byte-addressed, you can't address any unit smaller than a single byte. However, you can build the struct you want to send over the network and use bit fields like this:

struct A {
   unsigned int nibble1 : 4;
   unsigned int nibble2 : 4;
};
Mehrdad Afshari
Note that it's important to declare the field as "unsigned"; otherwise the compiler may treat it as signed and you'll see negative numbers.
Sean
Note that using bit-fields is not necessarily all that efficient. This was classically a problem; it may be that modern compilers have improved. There's at least a decent chance that masking and shifting will be faster - though not necessarily clearer.
Jonathan Leffler
This structure has the size of an 'unsigned int' because of the padding, at least here on Linux. So wouldn't it be better to use 'unsigned char'?
quinmars
@quinmars: I suggest telling the compiler the struct is packed instead of declaring it as `unsigned char`. In gcc, you'd do it by adding __attribute__((__packed__)) after the closing brace (before the semicolon). In MSVC, you enclose the struct declaration in #pragma pack (push,1) and #pragma pack (pop) directives (this is also supported in gcc for compatibility).
Mehrdad Afshari
Why use compiler hacks when you can simply use 'unsigned char'? Is there any disadvantage that I have missed?
quinmars
It's not a compiler *hack*. It's just a directive that tells the compiler not to add unnecessary padding. The bad thing about using `unsigned char` is that you still cannot be sure the specific compiler doesn't add padding to align it to a word or dword boundary (while `unsigned char` works for this example, it might not for more complex cases).
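The two suggestions in this exchange can also be combined and checked at compile time. The sketch below is only an illustration: `NibblePair` and `makePair` are hypothetical names, and it assumes a compiler that accepts `#pragma pack` (MSVC and gcc, as mentioned above). Bit-fields of type `unsigned char` are standard C++, but only implementation-defined in C.

```cpp
#include <cassert>

// Hypothetical struct for illustration. The pragma asks the compiler not to
// insert alignment padding; the static_assert below fails the build if the
// struct still ends up larger than one byte.
#pragma pack(push, 1)
struct NibblePair {
    unsigned char nibble1 : 4;
    unsigned char nibble2 : 4;
};
#pragma pack(pop)

static_assert(sizeof(NibblePair) == 1,
              "NibblePair must occupy exactly one byte");

// Build a NibblePair from two values in the range 0..15.
inline NibblePair makePair(unsigned first, unsigned second) {
    assert(first <= 0x0F && second <= 0x0F);
    NibblePair p;
    p.nibble1 = static_cast<unsigned char>(first);
    p.nibble2 = static_cast<unsigned char>(second);
    return p;
}
```

The `static_assert` only pins down the size. Which of the two fields lands in the high-order bits of the byte is still up to the compiler, so this alone does not make the layout portable across compilers.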
Mehrdad Afshari
+1  A: 

Use fields in a struct:

struct Header
{
    unsigned int lowestNibble : 4;
    unsigned int anotherNibble : 4;
    unsigned int : 18;                 // Unnamed padding.
    bool aBool : 1;
    bool anotherBool : 1;
    unsigned int highestNibble : 4;
};

The : 4 indicates that the entry should occupy 4 bits. You can choose the width of each field, and in C++ a field can have any integral type (C only guarantees int, unsigned int, and _Bool; other types are implementation-defined).

Typically you end up casting a pointer to your data to a Header * then doing something like:

pHeader->lowestNibble = 5;
Jon-Eric
I'm not sure that this is really safe. I just compiled this under VS8 and sizeof(Header) is 12 bytes. If you memset the entire thing to zero and then individually set the fields to "all bits on", you'll see that your two bool fields occupy the low bits of the fifth byte: they start at bit 32 instead of bit 26, which one might be led to expect. Basically, the "unsigned int" part is going to take the full sizeof(unsigned int)*CHAR_BIT bits.
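The experiment described in this comment (zero the struct, set one field, look at which bits changed) can be sketched as follows. `bitsTouchedBy` is a hypothetical helper written for this illustration, and the bit indices it reports count from the least significant bit of the first byte, which matches the numbering above only on little-endian machines.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>
#include <vector>

struct Header {
    unsigned int lowestNibble : 4;
    unsigned int anotherNibble : 4;
    unsigned int : 18;                 // Unnamed padding.
    bool aBool : 1;
    bool anotherBool : 1;
    unsigned int highestNibble : 4;
};

// Hypothetical helper: zero a Header, let `set` modify it, then return the
// indices of all bits that are set in the raw object representation.
template <typename Setter>
std::vector<std::size_t> bitsTouchedBy(Setter set) {
    Header h;
    std::memset(&h, 0, sizeof h);
    set(h);

    // Copy the object into a byte array so it can be inspected safely.
    unsigned char raw[sizeof(Header)];
    std::memcpy(raw, &h, sizeof raw);

    std::vector<std::size_t> bits;
    for (std::size_t byte = 0; byte < sizeof raw; ++byte) {
        for (unsigned bit = 0; bit < CHAR_BIT; ++bit) {
            if (raw[byte] & (1u << bit)) {
                bits.push_back(byte * CHAR_BIT + bit);
            }
        }
    }
    return bits;
}
```

Calling `bitsTouchedBy([](Header& h) { h.aBool = true; })` returns the position the compiler actually chose for `aBool`. The result, like `sizeof(Header)`, can differ between compilers, which is the point of the comment.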
D.Shawley
+8  A: 

Expanding on Mehrdad's answer, also use a union with a byte in order to avoid some evil-looking casts:

union Nibbler {
     struct { 
        unsigned int first:4;
        unsigned int second:4;
     } nibbles;
     unsigned char byte_value;
};
Christoffer
+4  A: 

Everyone seems to like using bit-fields in structs for this. Personally, I wrap all of my packet code in objects so that you don't see the guts. The problem that I have found with using bit-fields for protocol code is that it encourages using structures as overlays on memory. You can do this safely, but you have to be excruciatingly careful to ensure that you are properly dealing with endianness and packing issues. Unless you really have a good reason (e.g., you're writing the code that receives the Ethernet packet from the memory-mapped IO region), using bit-fields overlaid on memory produces extremely fragile code IMHO.

I find it much easier to write a Packet class that implements extraction, insertion, and overwriting routines in various bit widths. Then you implement your packet processing code in terms of extracting values of certain widths from offsets into native integers and whatnot. Hide all of the endianness and packing issues behind an abstraction until profiling proves that the overhead is too great to bear.
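A minimal sketch of what such a class might look like is below. This is not the actual implementation described above; the interface (`extract`, `insert`, `bytes`) and the bit numbering (bit 0 is the most significant bit of the first byte, as in most protocol diagrams) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical Packet class: a byte buffer with bit-level accessors.
class Packet {
public:
    explicit Packet(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}

    // Read `width` bits (1..32) starting at `bitOffset` into a native integer.
    // Bit 0 is the most significant bit of the first byte.
    std::uint32_t extract(std::size_t bitOffset, unsigned width) const {
        assert(width >= 1 && width <= 32);
        assert(bitOffset + width <= bytes_.size() * 8);
        std::uint32_t value = 0;
        for (unsigned i = 0; i < width; ++i) {
            const std::size_t pos = bitOffset + i;
            const unsigned bit = (bytes_[pos / 8] >> (7 - pos % 8)) & 1u;
            value = (value << 1) | bit;
        }
        return value;
    }

    // Overwrite `width` bits starting at `bitOffset` with the low `width`
    // bits of `value`, leaving all other bits untouched.
    void insert(std::size_t bitOffset, unsigned width, std::uint32_t value) {
        assert(width >= 1 && width <= 32);
        assert(bitOffset + width <= bytes_.size() * 8);
        for (unsigned i = 0; i < width; ++i) {
            const std::size_t pos = bitOffset + i;
            const std::uint8_t mask =
                static_cast<std::uint8_t>(1u << (7 - pos % 8));
            const bool bit = ((value >> (width - 1 - i)) & 1u) != 0;
            if (bit) {
                bytes_[pos / 8] = static_cast<std::uint8_t>(bytes_[pos / 8] | mask);
            } else {
                bytes_[pos / 8] = static_cast<std::uint8_t>(bytes_[pos / 8] & ~mask);
            }
        }
    }

    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
};
```

With a buffer whose first byte is 0x45, `extract(0, 4)` yields 4 and `extract(4, 4)` yields 5 on any compiler and any host byte order, because the code never overlays a struct on the buffer. The bit-at-a-time loop is deliberately simple; it is the kind of code you would only optimize once profiling shows it matters.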

This is one of those lessons that I wish I had learned years ago... you might think that portability of code isn't a problem and neither is endianness. Trust me, the number of headaches that this causes you when your compiler changes its padding algorithm or you switch to a different compiler will convince you that overlays are a very bad idea for network packet processing code.

D.Shawley
The real issue with your solution is that C (the language most device drivers are written in) does not have classes. You'll need to resort to global functions that handle conversion issues. Your point about padding is correct, but there's a solution to it: always explicitly specify padding fields and just tell the compiler to pack the struct.
Mehrdad Afshari
@Mehrdad, D. Shawley said to implement a class, but I think what he **meant** was to create an added higher level of abstraction to get rid of endianness/bit-twiddling/low-level-byte-ordering/etc. issues. The trick is making sure your added layer of abstraction doesn't incur horrible performance hits (which, for this subject, would be relatively easy in my opinion).
Trevor Boyd Smith
@Mehrdad: the question is tagged as C/C++, so that is where the class idea came from. If we are writing in C, then Trevor is correct: implement the abstraction using one of the OOP-in-C idioms. I would recommend looking at the Ethereal/Wireshark code as a very nice implementation of a byte buffer. They implement a Testy Virtual Buffer (TVB), which is pretty similar to what I would do in C++.
D.Shawley