Do different data types in C such as char, short, int, long, float, double have different memory alignment boundaries? In a 32 bit word aligned byte addressable operating system, how is accessing a char or short different from accessing an int or float? In both cases, does the CPU read a full 32-bit word? What happens when an int is not at the boundary? How is it able to read a char at any memory address?

A: 

Yes. A typical, but not universal, set of alignments in bytes:

1 char
2 short
4 int
4 float
8 double
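To check the numbers above on your own toolchain, here is a minimal sketch that assumes a C11 compiler (for `alignof` from `<stdalign.h>`). The `SHOW` macro and `show_alignments` function are names made up for this illustration, and the values printed depend on the compiler and target.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof (C11) */
#include <stdio.h>

/* char always has the weakest alignment, whatever the platform. */
static_assert(alignof(char) == 1, "char must be byte-aligned");

/* Print the size and alignment of one type as this compiler sees them. */
#define SHOW(T) printf("%-7s size %2zu  align %2zu\n", #T, sizeof(T), alignof(T))

/* Hypothetical helper: report the basic types from the list above. */
void show_alignments(void)
{
    SHOW(char);
    SHOW(short);
    SHOW(int);
    SHOW(long);
    SHOW(float);
    SHOW(double);
}
```

Calling `show_alignments()` from `main` prints one line per type. The size of a type is always a multiple of its alignment, but the two need not be equal.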

What the CPU does is the business of the CPU and the compiler. On CPUs that impose alignment constraints, compilers take them into account. On a RISC-y chip, the CPU might have to load 32 bits and then shift and mask to get a char.
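The shift-and-mask step can be written out in C as a rough sketch of what such a chip (or the code the compiler emits for it) does after loading the whole word. `byte_from_word` is a hypothetical name, and bytes are numbered here from the least significant end; which memory address each byte corresponds to depends on the machine's endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Extract byte `index` (0 = least significant) from an already-loaded
 * 32-bit word by shifting it down and masking off the rest. */
uint8_t byte_from_word(uint32_t word, unsigned index)
{
    assert(index < 4);  /* a 32-bit word holds four 8-bit bytes */
    return (uint8_t)((word >> (index * 8u)) & 0xFFu);
}
```

For example, `byte_from_word(0x11223344u, 0)` yields `0x44` and `byte_from_word(0x11223344u, 3)` yields `0x11`.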

bmargulies
On a RISC CPU you might get an exception if you try to read a 16-bit value at an odd address.
Robert
Even on a CISC CPU you might get an exception if you try to read a 16-bit value from an odd address (e.g. the 68K had restrictions of this general kind).
Jerry Coffin
Right. It's the compiler's job not to do that.
bmargulies
Note that these are common, but not required. The C standard mandates that `sizeof(char) == 1` and at least eight bits, `short` and `int` at least sixteen bits, `long` at least thirty-two, and things like that. Actual length and alignment aside from that is up to the implementation.
David Thornley
TI TMS320C5510 DSP: 1 char, 1 short, 2 int, 2 float, 4 double. Sweet dreams. ^_-
Mike DeSimone
+5  A: 

It depends on the compiler and the way you defined your variables. The default behavior of most compilers is to align variables in such a way as to yield the fastest access on the given platform. Aligned variables give you the best performance.

However, compilers such as gcc provide compiler-specific directives that can be used to "pack" adjacent variables of different types (and hence sizes), saving memory at the cost of performance. That trade-off is yours to decide when you use the packing directive. See this question.
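As an illustration of the packing trade-off, here is a small sketch using `#pragma pack`, which GCC, Clang and MSVC accept but which is not standard C. The struct names and the two size helpers are invented for this example, and the `static_assert` only holds on compilers that honor the pragma.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>

/* Default layout: the compiler may pad after `tag` so `value` is aligned. */
struct natural_layout { char tag; int value; };

/* Packed layout: members are placed back to back with no padding. */
#pragma pack(push, 1)
struct packed_layout { char tag; int value; };
#pragma pack(pop)

/* Assumption: the compiler honors #pragma pack(1). */
static_assert(sizeof(struct packed_layout) == sizeof(char) + sizeof(int),
              "packed struct should contain no padding");

size_t natural_size(void) { return sizeof(struct natural_layout); }
size_t packed_size(void)  { return sizeof(struct packed_layout); }
```

On a platform with a 4-byte, 4-aligned `int`, `natural_size()` is typically 8 while `packed_size()` is 5. Reading `value` from the packed struct may then be a misaligned access, which is where the performance cost comes from.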

The CPU may read a full 32-bit word (and maybe more to get the whole cacheline) when reading a char/short.

Sudhanshu
The CPU does not necessarily read a full 32-bit word when reading a char/short. Just for example, most fairly recent Intel CPUs only really do physical addressing down to the point of 64-bit words, and then there's a set of lines that say whether to read individual bytes within that 64-bit word. Of course, in the usual case, it's really going to read enough to fill an entire cache line.
Jerry Coffin
+1 @Jerry: Yeah the cacheline bit is correct. I didn't know about the set of lines to address individual bytes though. Corrected. :)
Sudhanshu
A: 

On many platforms, mis-aligned memory access carries a performance penalty or may even result in program interruption.

E.g., on x86, accessing memory through a mis-aligned pointer can result in SIGBUS being raised if both EFLAGS.AC and CR0.AM are set (see this answer).

Christoph
+3  A: 

Lots of questions...

Do different data types in C such as char, short, int, long, float, double have different memory alignment boundaries?

Yes. The exact alignment boundaries are compiler-specific, and some compilers let you change how they pack structs. (It's best to insert padding fields yourself so that it does not become an issue.)

In a 32 bit word aligned byte addressable operating system, how is accessing a char or short different from accessing an int or float?

Actually, it depends on the architecture. I've seen some that have Byte Enable lines on the bus, and will use those to access just the part of memory that they want. On others, non-I/O memory accesses result in reading or writing entire cache lines.

In both cases, does the CPU read a full 32-bit word?

Not necessarily. With Byte Enables, you don't have to read a full 32-bit word. Byte Enables also let you write individual bytes on a >8-bit architecture without performing a read-modify-write.

What happens when an int is not at the boundary?

Some architectures (e.g. x86, IIRC) will perform multiple accesses and join the parts for you. Others (e.g. PowerPC) will generate a Bus Error or similar exception.
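If you have to read an int-sized value from an address that may not be aligned (say, out of a byte buffer), a common portable approach is to copy the bytes with `memcpy` instead of dereferencing a cast pointer. This is a minimal sketch; `load_u32_unaligned` is a hypothetical helper name, and the result is in the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from any address, aligned or not.
 * memcpy has no alignment requirement, so this is safe even on
 * hardware that faults on misaligned word loads. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t value;
    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}
```

Compilers commonly turn this into a single load on hardware that tolerates misaligned access and into byte-wise loads where it does not.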

How is it able to read a char at any memory address?

Because addresses are quantized by bytes on your architecture. This is not true of all architectures. DSPs are famous for having word-aligned pointers, i.e. a pointer is a word address, not a byte address. (I had to write a serial port driver for one of these. sizeof(char) == sizeof(short) == 1 == 16 bits. So you have to choose between simple code which wastes half the RAM, and lots of byte pack/unpack code.)
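The byte pack/unpack code mentioned above looks roughly like this: two 8-bit values share one 16-bit word. This is only a sketch with made-up function names, and it assumes `uint16_t` is available; on a word-addressed DSP that type would simply be the 16-bit `char`/`short`.

```c
#include <assert.h>
#include <stdint.h>

/* Store two 8-bit values in one 16-bit word: `hi` in the upper half,
 * `lo` in the lower half. */
uint16_t pack_octets(unsigned hi, unsigned lo)
{
    assert(hi <= 0xFFu && lo <= 0xFFu);
    return (uint16_t)((hi << 8) | lo);
}

/* Recover the upper 8 bits of a packed word. */
unsigned high_octet(uint16_t word) { return (word >> 8) & 0xFFu; }

/* Recover the lower 8 bits of a packed word. */
unsigned low_octet(uint16_t word) { return word & 0xFFu; }
```

For example, `pack_octets(0x12, 0x34)` gives `0x1234`, from which `high_octet` and `low_octet` return `0x12` and `0x34`.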

Mike DeSimone
+2  A: 

The short answer, as others have pointed out, is the compiler will do what's best for the architecture it's compiling to. It may align them to the native word size. It may not. Here is a sample program demonstrating this point:

#include <cstdint>
#include <iostream>

int main()
{
    using namespace std;

    char c;
    short s;
    int i;

    cout << "sizeof(char): " << sizeof(char) << endl;
    cout << "sizeof(short): " << sizeof(short) << endl;
    cout << "sizeof(int): " << sizeof(int) << endl;

    // Compare addresses as integers wide enough to hold a pointer;
    // casting a pointer to int truncates it on typical 64-bit platforms.
    cout << "short is " << (intptr_t)&s - (intptr_t)&c << " bytes away from a char" << endl;
    cout << "int is " << (intptr_t)&i - (intptr_t)&s << " bytes away from a short" << endl;
}

The output:

sizeof(char): 1
sizeof(short): 2
sizeof(int): 4
short is 1 bytes away from a char
int is 4 bytes away from a short

As you can see, it added some padding between the int and the short, but none between the short and the char. In other cases, the reverse may be true. Optimization rules are complex.

And, a warning: The compiler is smarter than you. Don't play with padding and alignment unless you have a really, really good reason. Just trust that what the compiler is doing is the right thing.

Terry Mahaffey
+1. Nice test program. I think you don't see a pad after short because `c` and `s` together are pushed into a 32-bit memory location, hence no extra alignment is needed here. Were you to remove the `char c` declaration, there would probably be padding for `s`.
Alexander Pogrebnyak
A: 

Yes, they do have different memory alignment requirements. In real life a specific type is usually supposed/required to be aligned at the boundary that is the same as the size of the type, although theoretically the concepts of size and alignment have no connection to each other.

In some specific situations the platform might require a piece of data to be aligned to even stricter (greater) boundary than the size of the corresponding data type. This can be required for performance reasons, for example, or for some other platform-specific reasons.
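As a sketch of requesting a stricter boundary than the element type needs, C11 offers `alignas`. The 16-byte figure below is just an example value (the kind a SIMD unit might prefer), and `struct vec4` and `is_aligned` are names invented for this illustration. Converting a pointer to `uintptr_t` to test its low bits works on common flat-memory platforms but is implementation-defined.

```c
#include <assert.h>
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stddef.h>
#include <stdint.h>

/* Four floats that must start on a 16-byte boundary, even though a
 * float by itself typically only needs 4-byte alignment. */
struct vec4 { alignas(16) float lane[4]; };

/* Return nonzero if `p` is a multiple of `boundary`. */
int is_aligned(const void *p, size_t boundary)
{
    assert(boundary != 0);
    return (uintptr_t)p % boundary == 0;
}
```

With this, `alignof(struct vec4)` is 16, and any `struct vec4` object the compiler allocates satisfies `is_aligned(&v, 16)`.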

If the data is not aligned, the behavior depends on the platform. On some hardware platforms (Sun machines, for example), an attempt to access unaligned data will result in a crash. On other hardware platforms (Intel x86 machines, for example), it might result in a slight loss of efficiency and/or atomicity of access, with no other detrimental effects.

An important detail worth mentioning here is that, from a pedantic point of view, for a C program the term platform refers to the environment provided by the compiler, not by the hardware. The compiler is always free to implement an abstraction layer that isolates the C program from the underlying hardware platform, completely (or almost completely) hiding any hardware-imposed requirements. For example, it is possible to make an implementation that removes any alignment requirements from C programs, even when the underlying hardware platform does impose such requirements. In practice, however, for efficiency considerations important to the C language philosophy, hardware alignment requirements most of the time (if not always) apply to C programs as well.

AndreyT
A: 

Short answer: it depends on your compiler and architecture. Most compilers have some sort of command-line option or #pragma that you can use to manually specify or alter the alignment of variables.

I once used something like this to investigate the data alignment of various types:

union {
  struct {
    char one;
    char two;
    char three;
    char four;
  } chars;
  struct {
    short one;
    short two;
    short three;
    short four;
  } shorts;
  struct {
    int one;
    int two;
    int three;
    int four;
  } ints;
  struct {
    double one;
    double two;
    double three;
    double four;
  } doubles;
  /* etc, etc */
} many_types;

By looking at the addresses of each struct member vs the sizeof() that member, you can get a picture of how your compiler is aligning different data types.
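A variation on the same idea, as a sketch: `offsetof` from `<stddef.h>` reports where each member sits without needing to take addresses. `struct sample` and `show_offsets` are hypothetical names, and the numbers printed depend on the compiler and target.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdio.h>

/* Members of increasing size, so any padding shows up as gaps. */
struct sample { char c; short s; int i; double d; };

/* The first member of a struct always sits at offset 0. */
static_assert(offsetof(struct sample, c) == 0, "no padding before first member");

/* Print each member's offset and size, then the total struct size. */
void show_offsets(void)
{
    printf("c: offset %zu, size %zu\n", offsetof(struct sample, c), sizeof(char));
    printf("s: offset %zu, size %zu\n", offsetof(struct sample, s), sizeof(short));
    printf("i: offset %zu, size %zu\n", offsetof(struct sample, i), sizeof(int));
    printf("d: offset %zu, size %zu\n", offsetof(struct sample, d), sizeof(double));
    printf("sizeof(struct sample) = %zu\n", sizeof(struct sample));
}
```

Wherever a member's offset is larger than the previous member's offset plus its size, the compiler has inserted padding to satisfy alignment.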

bta
A: 

You might care to study the output of this program - compiled for both 32-bit and 64-bit on an Intel Mac running MacOS X 10.6.2.

/*
@(#)File:           $RCSfile: typesize.c,v $
@(#)Version:        $Revision: 1.7 $
@(#)Last changed:   $Date: 2008/12/21 18:25:17 $
@(#)Purpose:        Structure sizes/alignments
@(#)Author:         J Leffler
@(#)Copyright:      (C) JLSS 1990,1997,2004,2007-08
@(#)Product:        :PRODUCT:
*/

#include <stdio.h>
#include <time.h>
#include <stddef.h>
#if __STDC_VERSION__ >= 199901L
#include <inttypes.h>
#endif /* __STDC_VERSION__ */

#define SPRINT(x)   printf("%2u = sizeof(" #x ")\n", (unsigned int)sizeof(x))

int main(void)
{
    /* Basic Types */
    SPRINT(char);
    SPRINT(unsigned char);
    SPRINT(short);
    SPRINT(unsigned short);
    SPRINT(int);
    SPRINT(unsigned int);
    SPRINT(long);
    SPRINT(unsigned long);
#if __STDC_VERSION__ >= 199901L
    SPRINT(long long);
    SPRINT(unsigned long long);
    SPRINT(uintmax_t);
#endif /* __STDC_VERSION__ */
    SPRINT(float);
    SPRINT(double);
    SPRINT(long double);
    SPRINT(size_t);
    SPRINT(ptrdiff_t);
    SPRINT(time_t);

    /* Pointers */
    SPRINT(void *);
    SPRINT(char *);
    SPRINT(short *);
    SPRINT(int *);
    SPRINT(long *);
    SPRINT(float *);
    SPRINT(double *);

    /* Pointers to functions */
    SPRINT(int (*)(void));
    SPRINT(double (*)(void));
    SPRINT(char *(*)(void));

    /* Structures */
    SPRINT(struct { char a; });
    SPRINT(struct { short a; });
    SPRINT(struct { int a; });
    SPRINT(struct { long a; });
    SPRINT(struct { float a; });
    SPRINT(struct { double a; });
    SPRINT(struct { char a; double b; });
    SPRINT(struct { short a; double b; });
    SPRINT(struct { long a; double b; });
    SPRINT(struct { char a; char b; short c; });
    SPRINT(struct { char a; char b; long c; });
    SPRINT(struct { short a; short b; });
    SPRINT(struct { char a[3]; char b[3]; });
    SPRINT(struct { char a[3]; char b[3]; short c; });
    SPRINT(struct { long double a; });
    SPRINT(struct { char a; long double b; });
#if __STDC_VERSION__ >= 199901L
    SPRINT(struct { char a; long long b; });
#endif /* __STDC_VERSION__ */

    return(0);
}

Output from 64-bit compilation:

 1 = sizeof(char)
 1 = sizeof(unsigned char)
 2 = sizeof(short)
 2 = sizeof(unsigned short)
 4 = sizeof(int)
 4 = sizeof(unsigned int)
 8 = sizeof(long)
 8 = sizeof(unsigned long)
 8 = sizeof(long long)
 8 = sizeof(unsigned long long)
 8 = sizeof(uintmax_t)
 4 = sizeof(float)
 8 = sizeof(double)
16 = sizeof(long double)
 8 = sizeof(size_t)
 8 = sizeof(ptrdiff_t)
 8 = sizeof(time_t)
 8 = sizeof(void *)
 8 = sizeof(char *)
 8 = sizeof(short *)
 8 = sizeof(int *)
 8 = sizeof(long *)
 8 = sizeof(float *)
 8 = sizeof(double *)
 8 = sizeof(int (*)(void))
 8 = sizeof(double (*)(void))
 8 = sizeof(char *(*)(void))
 1 = sizeof(struct { char a; })
 2 = sizeof(struct { short a; })
 4 = sizeof(struct { int a; })
 8 = sizeof(struct { long a; })
 4 = sizeof(struct { float a; })
 8 = sizeof(struct { double a; })
16 = sizeof(struct { char a; double b; })
16 = sizeof(struct { short a; double b; })
16 = sizeof(struct { long a; double b; })
 4 = sizeof(struct { char a; char b; short c; })
16 = sizeof(struct { char a; char b; long c; })
 4 = sizeof(struct { short a; short b; })
 6 = sizeof(struct { char a[3]; char b[3]; })
 8 = sizeof(struct { char a[3]; char b[3]; short c; })
16 = sizeof(struct { long double a; })
32 = sizeof(struct { char a; long double b; })
16 = sizeof(struct { char a; long long b; })

Output from 32-bit compilation:

 1 = sizeof(char)
 1 = sizeof(unsigned char)
 2 = sizeof(short)
 2 = sizeof(unsigned short)
 4 = sizeof(int)
 4 = sizeof(unsigned int)
 4 = sizeof(long)
 4 = sizeof(unsigned long)
 8 = sizeof(long long)
 8 = sizeof(unsigned long long)
 8 = sizeof(uintmax_t)
 4 = sizeof(float)
 8 = sizeof(double)
16 = sizeof(long double)
 4 = sizeof(size_t)
 4 = sizeof(ptrdiff_t)
 4 = sizeof(time_t)
 4 = sizeof(void *)
 4 = sizeof(char *)
 4 = sizeof(short *)
 4 = sizeof(int *)
 4 = sizeof(long *)
 4 = sizeof(float *)
 4 = sizeof(double *)
 4 = sizeof(int (*)(void))
 4 = sizeof(double (*)(void))
 4 = sizeof(char *(*)(void))
 1 = sizeof(struct { char a; })
 2 = sizeof(struct { short a; })
 4 = sizeof(struct { int a; })
 4 = sizeof(struct { long a; })
 4 = sizeof(struct { float a; })
 8 = sizeof(struct { double a; })
12 = sizeof(struct { char a; double b; })
12 = sizeof(struct { short a; double b; })
12 = sizeof(struct { long a; double b; })
 4 = sizeof(struct { char a; char b; short c; })
 8 = sizeof(struct { char a; char b; long c; })
 4 = sizeof(struct { short a; short b; })
 6 = sizeof(struct { char a[3]; char b[3]; })
 8 = sizeof(struct { char a[3]; char b[3]; short c; })
16 = sizeof(struct { long double a; })
32 = sizeof(struct { char a; long double b; })
12 = sizeof(struct { char a; long long b; })

You can play all sorts of games with the structures. The key point is that the alignment requirements for different types do vary. Depending on the platform, you may have more or less stringent requirements. SPARC is fussy; Intel tends to do more work if you do misaligned access (so it is slow, but works); the old DEC Alpha chips (and I think the MIPS RISC chips) could be switched to behave differently, either more efficiently by always requiring aligned access or less efficiently to mimic what Intel chips do.
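One such game, sketched here, is the old pre-C11 way of estimating a type's alignment: put a `char` in front of it and ask where the compiler placed it. The `DECLARE_PROBE` and `PROBE_ALIGN` macros are invented for this example. The result relies on the compiler placing the member at the first suitably aligned offset, which is what compilers do in practice but is not spelled out by the standard.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

/* Declare struct probe_<name> { char pad; T member; }. */
#define DECLARE_PROBE(name, T) struct probe_##name { char pad; T member; }

/* Offset of the member after a single char: in practice, T's alignment. */
#define PROBE_ALIGN(name) offsetof(struct probe_##name, member)

DECLARE_PROBE(short, short);
DECLARE_PROBE(int, int);
DECLARE_PROBE(double, double);

/* The leading char itself is never preceded by padding. */
static_assert(offsetof(struct probe_int, pad) == 0, "pad is the first member");
```

This matches the output above: `struct { char a; double b; }` is 16 bytes in the 64-bit build (so `double` is 8-aligned there) but 12 bytes in the 32-bit build (so `double` is only 4-aligned).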

Jonathan Leffler