Why does the 'sizeof' operator return a size larger for a structure than the total sizes of the structure's members?

+98  A: 

This is because of structure alignment. Structure alignment refers to the ability of the compiler to insert unused memory into a structure so that data members are optimally aligned for better performance. Many processors perform best when fundamental data types are stored at byte-addresses that are multiples of their sizes.

Here's an example using typical settings for an x86 processor:

struct X
{
    short s; /* 2 bytes */
             /* 2 padding bytes */
    int   i; /* 4 bytes */
    char  c; /* 1 byte */
             /* 3 padding bytes */
};

struct Y
{
    int   i; /* 4 bytes */
    char  c; /* 1 byte */
             /* 1 padding byte */
    short s; /* 2 bytes */
};

struct Z
{
    int   i; /* 4 bytes */
    short s; /* 2 bytes */
    char  c; /* 1 byte */
             /* 1 padding byte */
};

const int sizeX = sizeof(X); /* = 12 */
const int sizeY = sizeof(Y); /* = 8 */
const int sizeZ = sizeof(Z); /* = 8 */

One can minimize the size of structures by putting the largest data types at the beginning of the structure and the smallest data types at the end of the structure (like structure Z in the example above).
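
To see what a particular compiler actually does with the ordering, one option is to print member offsets with offsetof. The following is only a sketch: struct X and struct Z are copied from the example above, while padding_in_X, padding_in_Z and print_layouts are hypothetical helper names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

struct X { short s; int i; char c; };  /* small, large, small */
struct Z { int i; short s; char c; };  /* largest member first */

/* Padding in struct X: total size minus the sum of the member sizes. */
size_t padding_in_X(void)
{
    return sizeof(struct X) - (sizeof(short) + sizeof(int) + sizeof(char));
}

/* Same calculation for struct Z. */
size_t padding_in_Z(void)
{
    return sizeof(struct Z) - (sizeof(int) + sizeof(short) + sizeof(char));
}

/* Prints the offsets and sizes the current compiler actually chose. */
void print_layouts(void)
{
    /* The first member of a struct is always at offset 0. */
    assert(offsetof(struct X, s) == 0 && offsetof(struct Z, i) == 0);

    printf("X: s@%zu i@%zu c@%zu size=%zu padding=%zu\n",
           offsetof(struct X, s), offsetof(struct X, i),
           offsetof(struct X, c), sizeof(struct X), padding_in_X());
    printf("Z: i@%zu s@%zu c@%zu size=%zu padding=%zu\n",
           offsetof(struct Z, i), offsetof(struct Z, s),
           offsetof(struct Z, c), sizeof(struct Z), padding_in_Z());
}
```

Calling print_layouts() from main on a platform with a 2-byte short and a 4-byte int should report 5 padding bytes for X and 1 for Z, matching the sizes of 12 and 8 above; other platforms may choose differently.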

IMPORTANT NOTE: Both the C and C++ standards state that structure alignment is implementation defined. Therefore each compiler may choose to align data differently, resulting in different and incompatible data layouts. For this reason, when dealing with libraries that will be used by different compilers, it is important to understand how the compilers align data. Some compilers have command-line settings and/or special #pragma statements to change the structure alignment settings.
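
As one illustration of such a pragma, GCC, Clang and MSVC all accept #pragma pack(push, n) and #pragma pack(pop). The sketch below assumes one of those compilers; the struct and function names (DefaultLayout, Pack2, default_offset_of_i, pack2_offset_of_i) are hypothetical, and the offsets in the comments assume a 4-byte int with 4-byte default alignment.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

/* Default layout for comparison. */
struct DefaultLayout {
    char c;
    int  i;             /* typically at offset 4 */
};

#pragma pack(push, 2)   /* cap member alignment at 2 bytes from here on */
struct Pack2 {
    char c;
    int  i;             /* typically at offset 2 under pack(2) */
};
#pragma pack(pop)       /* restore the previous setting */

/* Compile-time check that the pragma took effect. */
static_assert(offsetof(struct Pack2, i) <= 2,
              "pack(2) caps member alignment at 2");

/* Byte offset of member i under each setting. */
size_t default_offset_of_i(void) { return offsetof(struct DefaultLayout, i); }
size_t pack2_offset_of_i(void)   { return offsetof(struct Pack2, i); }
```

Wrapping only the affected declarations in push/pop keeps the setting from leaking into other headers, which matters when the same struct has to match a layout produced by a different compiler.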

Kevin
I want to make a note here: Most processors penalize you for unaligned memory access (as you mentioned), but you can't forget that many completely disallow it. Most MIPS chips, in particular, will throw an exception on an unaligned access.
Cody Brocious
The x86 chips are actually rather unique in that they allow unaligned access, albeit penalized; AFAIK *most* chips will throw exceptions, not just a few. PowerPC is another common example.
Dark Shikari
Enabling pragmas for unaligned accesses generally causes your code to balloon in size on processors which throw misalignment faults, as code to fix up every misalignment has to be generated. ARM also throws misalignment faults.
Mike Dimmick
@Dark - totally agree. But *most* desktop processors are x86/x64, so *most* chips don't issue data alignment faults ;)
Aaron
There's a typo at the end of the code. Should be sizeof(Y) and sizeof(Z) for the last two lines.
Dara Kong
Unaligned data access is typically a feature of CISC architectures; most RISC architectures (ARM, MIPS, PowerPC, Cell) do not include it. In actuality, *most* chips are NOT desktop processors: embedded systems dominate by number of chips, and the vast majority of these are RISC architectures.
Adam K. Johnson
+1  A: 

It can do so if you have implicitly or explicitly set the alignment of the struct. A struct that is aligned to 4 bytes will always have a size that is a multiple of 4 bytes, even if the sizes of its members add up to something that is not a multiple of 4 bytes.
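
A small sketch of the explicit case, assuming a C11 compiler (alignas and alignof come from <stdalign.h>). The names Aligned4, Plain and is_multiple_of are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stddef.h>    /* size_t */

/* Five bytes of payload, but the member requests 4-byte alignment,
   which raises the alignment of the whole struct to at least 4. */
struct Aligned4 {
    alignas(4) char data[5];
};

/* Same payload with the default alignment of char, for comparison. */
struct Plain {
    char data[5];
};

/* The size is rounded up so that every element of an array of
   struct Aligned4 stays 4-byte aligned. */
static_assert(sizeof(struct Aligned4) % 4 == 0,
              "size is a multiple of the requested alignment");

/* Returns nonzero if size is a whole multiple of align. */
int is_multiple_of(size_t size, size_t align)
{
    return size % align == 0;
}
```

Here sizeof(struct Aligned4) has to be at least 8, since it must hold 5 bytes and be a multiple of 4, whereas struct Plain has no such requirement.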

Also, a library may be compiled under x86 with 32-bit ints and you may be comparing its components on a 64-bit process, which would give you a different result if you were doing this by hand.

Orion Adrian
+10  A: 

Packing and byte alignment... described in the C FAQ here

EmmEff
+2  A: 

This can be due to byte alignment and padding so that the structure comes out to an even number of bytes (or words) on your platform. For example in C on Linux, the following 3 structures:

#include <stdio.h>


struct oneInt {
  int x;
};

struct twoInts {
  int x;
  int y;
};

struct someBits {
  int x:2;
  int y:6;
};


int main (int argc, char** argv) {
  /* sizeof yields a size_t, so the matching format specifier is %zu */
  printf("oneInt=%zu\n",sizeof(struct oneInt));
  printf("twoInts=%zu\n",sizeof(struct twoInts));
  printf("someBits=%zu\n",sizeof(struct someBits));
  return 0;
}

Have members whose sizes (in bytes) are 4 bytes (32 bits), 8 bytes (2 x 32 bits) and 1 byte (2+6 bits) respectively. The above program (on Linux using gcc) prints the sizes as 4, 8, and 4, where the last structure is padded so that it is a single word (4 x 8-bit bytes on my 32-bit platform).

oneInt=4
twoInts=8
someBits=4
Kyle Burton
+1  A: 

If you want the structure to have a certain size, with GCC for example, use __attribute__((packed)).

On Windows you can set the alignment to one byte when using the cl.exe compiler with the /Zp option.

Usually it is easier for the Operating System to access data that is a multiple of 4 (or 8), depending on the platform and also on the compiler.

So it is a matter of alignment basically.

You need to have good reasons to change it.
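
For the GCC case, a minimal sketch might look like the following. It assumes GCC or Clang, since __attribute__((packed)) is a compiler extension; the names Padded, Packed and packed_saving are hypothetical.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

/* Default layout: the compiler may insert padding after c. */
struct Padded {
    char c;
    int  i;
};

/* GCC/Clang extension: no padding between members. */
struct __attribute__((packed)) Packed {
    char c;
    int  i;   /* starts right after c, so it may be misaligned */
};

/* With packing, the size is exactly the sum of the member sizes. */
static_assert(sizeof(struct Packed) == sizeof(char) + sizeof(int),
              "packed struct has no padding");

/* Bytes saved per object by packing. */
size_t packed_saving(void)
{
    return sizeof(struct Padded) - sizeof(struct Packed);
}
```

Accessing p.i through the struct is fine because the compiler knows the member may be misaligned, but taking a pointer to a packed member and dereferencing it elsewhere can fault on processors that disallow unaligned access.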

Iulian Şerbănoiu
"good reasons" Example: Keeping binary compatibility (padding) consistent between 32-bit and 64-bit systems for a complex struct in proof-of-concept demo code that's being showcased tomorrow. Sometimes necessity has to take precedence over propriety.
Mr.Ree
Everything is ok except when you mention the Operating System. This is an issue for the CPU speed, the OS is not involved at all.
Blaisorblade
Another good reason is if you're stuffing a datastream into a struct, e.g. when parsing network protocols.
ceo
+1  A: 

In addition to the other answers, a struct can (but usually doesn't) have virtual functions, in which case the size of the struct will also include the space for the vtbl.

JohnMcG
Not quite. In typical implementations, what is added to the struct is a vtable *pointer*.
Don Wakefield