To clarify my question, let's start off with an example program:

#include <stdio.h>

#pragma pack(push,1)
struct cc {
    unsigned int a   :  3;  
    unsigned int b   : 16;
    unsigned int c   :  1;
    unsigned int d   :  1;
    unsigned int e   :  1;
    unsigned int f   :  1;
    unsigned int g   :  1;
    unsigned int h   :  1;
    unsigned int i   :  6;  
    unsigned int j   :  6;  
    unsigned int k   :  4;  
    unsigned int l   : 15;
};
#pragma pack(pop)

struct cc c;

int main(int argc, char **argv)
{
    /* sizeof yields a size_t, which %d does not match; cast it for printing */
    printf("%u\n", (unsigned)sizeof(c));
    return 0;
}

The output is "8", meaning that the 56 bits (7 bytes) I want to pack are being packed into 8 bytes, seemingly wasting a whole byte. Curious about how the compiler was laying these bits out in memory, I tried writing specific values to &c, e.g.:

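int main(int argc, char **argv)
{
    /* explicit cast: &c is a struct cc *, not an unsigned long long * */
    unsigned long long int *pint = (unsigned long long int *)&c;
    *pint = 0xFFFFFFFF;
    printf("c.a = %d", c.a);
    ...
    printf("c.l = %d", c.l);
    return 0;
}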

Predictably, on x86_64 using Visual Studio 2010, the following happens:

*pint = 0x00000000 000000FF :

c[0].a = 7
c[0].b = 1
c[0].c = 1
c[0].d = 1
c[0].e = 1
c[0].f = 1
c[0].g = 0
c[0].h = 0
c[0].i = 0
c[0].j = 0
c[0].k = 0
c[0].l = 0

*pint = 0x00000000 0000FF00 :

c[0].a = 0
c[0].b = 0
c[0].c = 0
c[0].d = 0
c[0].e = 0
c[0].f = 0
c[0].g = 1
c[0].h = 127
c[0].i = 0
c[0].j = 0
c[0].k = 0
c[0].l = 0


*pint = 0x00000000 00FF0000 :

c[0].a = 0
c[0].b = 0
c[0].c = 0
c[0].d = 0
c[0].e = 0
c[0].f = 0
c[0].g = 0
c[0].h = 32640
c[0].i = 0
c[0].j = 0
c[0].k = 0
c[0].l = 0

etc.

Forget portability for a moment and assume you care about one CPU, one compiler, and one runtime environment. Why can't VC++ pack this structure into 7 bytes? Is it a word-length thing? The MSDN docs on #pragma pack say "the alignment of a member will be on a boundary that is either a multiple of n [1 in my case] or a multiple of the size of the member, whichever is smaller." Can anyone give me some idea of why I get a sizeof of 8 and not 7?

+1  A: 

Bit-fields are stored in units of the type you declare. Since you are using unsigned int and the fields do not fit in a single unsigned int, the compiler must use a second unsigned int and store the remaining fields there.
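
As a rough illustration of that rule, the sketch below counts how many storage units a run of bit-fields needs when a field that does not fit in the remaining bits of the current unit starts a new one. The helper name units_needed and the cc_widths table are hypothetical, and the no-straddling rule is an assumption about how MSVC behaves here, not something the standard mandates.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts storage units of unit_bits bits each, assuming
// a bit-field never straddles two units (an assumed model of MSVC's layout).
std::size_t units_needed(const unsigned *widths, std::size_t count, unsigned unit_bits) {
    std::size_t units = 0;
    unsigned used = unit_bits;  // "full", so the first field opens a new unit
    for (std::size_t n = 0; n < count; ++n) {
        assert(widths[n] > 0 && widths[n] <= unit_bits);
        if (used + widths[n] > unit_bits) {  // field does not fit: open a new unit
            ++units;
            used = 0;
        }
        used += widths[n];
    }
    return units;
}

// Widths of fields a..l of struct cc from the question.
static const unsigned cc_widths[] = {3, 16, 1, 1, 1, 1, 1, 1, 6, 6, 4, 15};
```

For the widths in the question and a 32-bit unit, fields a through i use 31 bits of the first unit, j (6 bits) no longer fits, and j, k and l land in a second unit. That gives 2 units, i.e. 8 bytes, which agrees with the sizeof reported above.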

David Rodríguez - dribeas
+1  A: 

Well, you are using unsigned int, which happens to be 32 bits in this case. The next boundary (to fit the bit-fields) for unsigned int is 64 bits => 8 bytes.

Vinzenz
A: 

pst is right. The members are aligned on 1-byte boundaries (or smaller, since they are bit-fields). The overall structure has size 8, and is aligned on an 8-byte boundary. This complies with both the standard and the pack option. The docs never say there will be no padding at the end.

Matthew Flaschen
The issue is not with padding at the end, but with the fact that bitfields are packed inside units of the bitfield type. The structure has no padding at all, only two `unsigned int` members.
David Rodríguez - dribeas
@David, the standard says (§6.7.2.1), "A bit-field is interpreted as a signed or unsigned integer type consisting of the specified number of bits. [...] An implementation may allocate any addressable storage unit large enough to hold a bit-field." So I don't think `unsigned int` means it must actually use `unsigned int` as the storage unit, just some unsigned type with enough bits. Also, bit-fields are allowed to cross storage unit boundaries. So I do think there's padding at the end.
Matthew Flaschen
BTW, what standard are you looking at? Neither the current C++ standard nor the C++0x FCD has a section 6.7.2.1.
David Rodríguez - dribeas
That part of the standard is at least confusing to me. My rationale for that argument is that §9.6/1 contains: *The constant-expression [number of bits] may be larger than the number of bits in the object representation (3.9) of the bit-field’s type; in such cases the extra bits are used as padding bits and do not participate in the value representation (3.9) of the bitfield.*
David Rodríguez - dribeas
That means that `unsigned char bits : 10` will not be able to store a 10 bit integer (in a 8bit/char machine), and will be different than `unsigned short int bits : 10`, (assuming 16bit short int). From that I have inferred --I cannot point to the standard--, that if the compiler is not able to bump up to a bigger representation type, it is probably not allowed to go down either. Also it would have to break the representation type from `int` to `short + char` in that particular case (again assuming 8bit char, 16bit short, 32bit int)
David Rodríguez - dribeas
@David, sorry, that is from C99. I haven't heard that C++ is different in this area, but it could be.
Matthew Flaschen
@David, however, C++0x §9.6 says, "Bit-fields are packed into some addressable allocation unit. [ Note: bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ]", which sounds quite similar to C99. To my reading, it doesn't say the "allocation unit" (which seems to be the C++ version of "storage unit") has to be an actual `unsigned int` (using the OP's code again).
Matthew Flaschen
@Matthew: No, in the whole section the only reference to the underlying type is the quote I added above (§9.6/1), which makes it clear that the declared size of the underlying object does have an effect on the semantics of the field (if the declared type is smaller than the number of bits requested, the excess bits do not form part of the value, but serve as padding). It does not explicitly state how the underlying type is defined, besides the fact that the grammar defines the bit-field declaration as *decl-specifier identifier attribute : constant-expression*
David Rodríguez - dribeas
... with *decl-specifier* being defined as (one of the options) *type-specifier*. Now, with no other reference to *type* available in the text, I can only assume that *underlying type* refers to the *type-specifier* in the *decl-specifier*, which in the example is `unsigned int`. That is why I already mentioned before that this part of the standard does not seem really *clear* to me.
David Rodríguez - dribeas
@David, I didn't mean to say the declared type has no effect. You're clearly right that the padding bits clause is based on declared type. My main point is that the storage unit/allocation unit does not have to be the same as the declared type. I agree the standard could be clearer.
Matthew Flaschen
@Matthew: We then agree to disagree :), and agree that the standard is not as clear as it could be. At any rate, even if I accepted your rationale, there is no combination of storage types that the compiler can take for the given input that would maintain the semantics and take only 7 bytes of storage without changing the intended behavior as per §9.6/1. So back to square one, it has nothing to do with padding at the end of the structure.
David Rodríguez - dribeas
+2  A: 

MSVC++ always allocates at least a unit of memory that corresponds to the type you used for your bit-field. You used unsigned int, meaning that an unsigned int is allocated initially, and another unsigned int is allocated when the first one is exhausted. There's no way to force MSVC++ to trim the unused portion of the second unsigned int.

Basically, MSVC++ interprets your unsigned int as a way to express the alignment requirements for the entire structure.

Use smaller types for your bit-fields (unsigned short and unsigned char) and regroup the bit-fields so that they fill the allocated unit entirely - that way you should be able to pack things as tightly as possible.
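
One possible regrouping of the question's fields along these lines is sketched below. The struct name cc_regrouped, the field order, and the helper report_cc_regrouped are illustrative choices, not the layout the asker ended up with. Every group is arranged to fill a 16-bit or 8-bit unit exactly, which is what should let the struct come out at 7 bytes under #pragma pack(1).

```cpp
#include <cassert>
#include <cstdio>

#pragma pack(push, 1)
struct cc_regrouped {
    unsigned short b : 16;  // 16 bits: fills one unsigned short
    unsigned short l : 15;  // 15 + 1 = 16 bits
    unsigned short c :  1;
    unsigned short i :  6;  // 6 + 6 + 4 = 16 bits
    unsigned short j :  6;
    unsigned short k :  4;
    unsigned char  a :  3;  // 3 + 5 * 1 = 8 bits: fills one unsigned char
    unsigned char  d :  1;
    unsigned char  e :  1;
    unsigned char  f :  1;
    unsigned char  g :  1;
    unsigned char  h :  1;
};
#pragma pack(pop)

// Illustrative check: writes the largest value into a few fields, verifies
// that their neighbours stay zero, and prints the size the compiler chose.
void report_cc_regrouped() {
    cc_regrouped s = {};
    s.l = 0x7FFF;  // largest 15-bit value
    s.i = 0x3F;    // largest 6-bit value
    s.a = 7;       // largest 3-bit value
    assert(s.c == 0 && s.j == 0 && s.k == 0 && s.d == 0);
    std::printf("sizeof(cc_regrouped) = %u\n", (unsigned)sizeof(cc_regrouped));
}
```

Bit-field layout remains implementation-defined, so it is worth checking sizeof on the compiler you actually target rather than relying on the arithmetic alone.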

AndreyT
In this case he can't save anything if he needs the 15-bit and 16-bit fields; he would instead end up with at least 9 bytes, because he already uses 56 bits.
Vinzenz
No... With this advice I can indeed pack it into 7 bytes (thank you AndreyT). I have no idea what you mean about 9 bytes.
Rooke
@Rooke: Note that `unsigned char bits : 10` is valid, but probably not what you mean (will only store up to 8 bit values and reserve 2 extra bits for padding). I.e. it is of critical importance to regroup the bitfields so that each bitfield fits in the allocated unit.
David Rodríguez - dribeas
@David Rodríguez - dribeas: Indeed. The solution required reordering the fields.
Rooke
A: 

To give another interesting illustration of what's going on, consider the case where you want to pack a structure that crosses a type boundary. E.g.

struct state {
    unsigned int cost     : 24; 
    unsigned int back     : 21; 
    unsigned int a        :  1; 
    unsigned int b        :  1; 
    unsigned int c        :  1;
};

This structure can't be packed into 6 bytes using MSVC as far as I know. However, we can get the desired packing effect by breaking up the first two fields:

struct state_packed {
    unsigned short cost_1   : 16; 
    unsigned char  cost_2   :  8;
    unsigned short back_1   : 16; 
    unsigned char  back_2   :  5;
    unsigned char  a        :  1; 
    unsigned char  b        :  1; 
    unsigned char  c        :  1; 
};

This can indeed be packed into 6 bytes. However, accessing the original cost field is extremely awkward and ugly. One method is to cast a state_packed pointer to a specialized dummy struct:

struct state_cost {
    unsigned int cost     : 24;
    unsigned int junk     :  8; 
};

struct state_packed  sc;
struct state_packed *p_sc = &sc;

sc.a = 1;
(*(struct state_cost *)p_sc).cost = 12345;
sc.b = 1;

If anyone knows a more elegant way of doing this, I would love to know!
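
One alternative that avoids the pointer cast is to hide the split behind small accessor functions that reassemble the value with shifts and masks. In the sketch below the names get_cost, set_cost, get_back and set_back are hypothetical, and treating cost_1 and back_1 as the low 16 bits is an assumption; any consistent choice works as long as the getter and setter agree.

```cpp
#include <cassert>

#pragma pack(push, 1)
struct state_packed {
    unsigned short cost_1 : 16;  // assumed: low 16 bits of cost
    unsigned char  cost_2 :  8;  // assumed: high 8 bits of cost
    unsigned short back_1 : 16;  // assumed: low 16 bits of back
    unsigned char  back_2 :  5;  // assumed: high 5 bits of back
    unsigned char  a      :  1;
    unsigned char  b      :  1;
    unsigned char  c      :  1;
};
#pragma pack(pop)

// Reassemble the 24-bit cost from its two halves.
unsigned int get_cost(const state_packed *s) {
    return (unsigned int)s->cost_1 | ((unsigned int)s->cost_2 << 16);
}

// Split a 24-bit cost into its two halves.
void set_cost(state_packed *s, unsigned int cost) {
    assert(cost < (1u << 24));
    s->cost_1 = (unsigned short)(cost & 0xFFFFu);
    s->cost_2 = (unsigned char)((cost >> 16) & 0xFFu);
}

// Same idea for the 21-bit back field.
unsigned int get_back(const state_packed *s) {
    return (unsigned int)s->back_1 | ((unsigned int)s->back_2 << 16);
}

void set_back(state_packed *s, unsigned int back) {
    assert(back < (1u << 21));
    s->back_1 = (unsigned short)(back & 0xFFFFu);
    s->back_2 = (unsigned char)((back >> 16) & 0x1Fu);
}
```

With these helpers, the cast in the snippet above would become set_cost(&sc, 12345), and the flag fields a, b and c are still accessed directly.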

Rooke