I have a C++ program representing a TCP header as a struct:

#include "stdafx.h"

/*  TCP HEADER

    0                   1                   2                   3   
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

*/

typedef struct {     // RFC793
    WORD   wSourcePort;
    WORD   wDestPort;
    DWORD   dwSequence;
    DWORD   dwAcknowledgment;
    unsigned int byReserved1:4;
    unsigned int byDataOffset:4;
    unsigned int fFIN:1;
    unsigned int fSYN:1;
    unsigned int fRST:1;
    unsigned int fPSH:1;
    unsigned int fACK:1;
    unsigned int fURG:1;
    unsigned int byReserved2:2;
    unsigned short wWindow;
    WORD   wChecksum;
    WORD   wUrgentPointer;
} TCP_HEADER, *PTCP_HEADER;


int _tmain(int argc, _TCHAR* argv[])
{
    printf("TCP header length: %d\n", sizeof(TCP_HEADER));
    return 0;
}

If I run this program I get the size of this header as 24 bytes, which is not the size I was expecting. If I change the type of the field "wWindow" to "unsigned int wWindow:16", which has the same number of bits as an unsigned short, the program tells me the size of the struct is now 20 bytes, the correct size. Why is this?

I am using Microsoft Visual Studio 2005 with SP1 on a 32-bit x86 machine.

+6  A: 

Because the compiler is packing your bitfield into a 32-bit int, not a 16-bit entity.

In general you should avoid bitfields and instead use manifest constants (enums or #defines) with explicit bit masking and shifting to access the 'sub-fields' of a field.

Here's one reason why bitfields should be avoided - they aren't very portable between compilers, even for the same platform. From the C99 standard (there's similar wording in the C90 standard):

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

You cannot guarantee whether a bitfield will 'span' an int boundary or not, and you can't specify whether a bitfield starts at the low end or the high end of the int (this is independent of whether the processor is big-endian or little-endian).

Michael Burr
A: 

The compiler is padding the non-bitfield struct member to 32-bit native word alignment. To fix this, do #pragma pack(0) before the struct and #pragma pack() after.

Cody Brocious
#pragma pack (0) didn't change the behaviour.
A: 

Struct boundaries in memory can be padded by the compiler depending on the size and order of fields.

Scottie T
A: 

I'm not a C/C++ expert when it comes to packing, but I imagine there is a rule in the spec that says that when a non-bitfield follows a bitfield, it must be aligned on the word boundary regardless of whether or not it fits in the remaining space. By making it an explicit bitfield you are avoiding this problem.

Again this is speculation with a touch of experience.

JaredPar
+2  A: 

See this question: http://stackoverflow.com/questions/119123/why-does-the-sizeof-operator-return-a-size-larger-for-a-structure-than-the-tota

I believe that the compiler takes a hint to disable padding when you use the "unsigned int wWindow:16" syntax.

Also, note that a short is not guaranteed to be 16 bits. The guarantee is: 16 bits <= size of a short <= size of an int.

andy
@andy: +1, could include #pragma push/pop on the pack parameter to help him out.
sixlettervariables
A: 

Interesting - I would think that "WORD" would evaluate to "unsigned short", so you'd have that problem in more than one place.

Also be aware that you'll need to deal with endian issues in any value over 8 bits.

Mark Ransom
A: 

You are seeing different values because of compiler packing rules. You can see rules specific to visual studio here.

When you have a structure that must be packed (or adhere to some specific alignment requirements), you should use the #pragma pack() option. For your code, you can use #pragma pack(0), which will align all structure members on byte boundaries. You can then use #pragma pack() to reset structure packing to its default state. You can see more information on the pack pragma here.

Mark
+4  A: 

Your series of "unsigned int:xx" bitfields uses up only 16 of the 32 bits in an int. The other 16 bits (2 bytes) are there, but unused. This is followed by the unsigned short, which starts on an int boundary, and then a WORD, which is also aligned on an int boundary, which means there are 2 bytes of padding between them.

When you switch to "unsigned int wWindow:16", instead of being a separate short, it occupies the unused part of the previous bitfield's storage unit, so there is no waste, no short, and no padding after the short; hence you save four bytes.

James Curran
A: 

I think Mike B got it right, but it's not perfectly clear. When you ask for "short", it's aligned on a 32-bit boundary. When you ask for int:16, it's not. So int:16 fits right after the bit fields, while short skips 2 bytes and starts at the next 32-bit block.

The rest of what he is saying is perfectly applicable - bitfields must never be used to code an externally visible structure, because there are no guarantees as to how they are allocated. At best, they belong in embedded programs where saving a byte is important. And even there, you can't use them to actually control bits in memory-mapped ports.

Arkadiy