Do different data types in C such as char, short, int, long, float, double have different memory alignment boundaries? In a 32 bit word aligned byte addressable operating system, how is accessing a char or short different from accessing an int or float? In both cases, does the CPU read a full 32-bit word? What happens when an int is not at the boundary? How is it able to read a char at any memory address?
Yes. A typical (but not universal) set of alignment boundaries, in bytes:

char   1
short  2
int    4
float  4
double 8

(A sketch after this list shows how to query the values on your own platform.)
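You can check the values for your own compiler and platform with C11's _Alignof; a minimal sketch:

#include <stdio.h>

int main(void)
{
    /* _Alignof (C11) reports the alignment requirement, in bytes, of each type */
    printf("char   : size %zu, alignment %zu\n", sizeof(char),   _Alignof(char));
    printf("short  : size %zu, alignment %zu\n", sizeof(short),  _Alignof(short));
    printf("int    : size %zu, alignment %zu\n", sizeof(int),    _Alignof(int));
    printf("long   : size %zu, alignment %zu\n", sizeof(long),   _Alignof(long));
    printf("float  : size %zu, alignment %zu\n", sizeof(float),  _Alignof(float));
    printf("double : size %zu, alignment %zu\n", sizeof(double), _Alignof(double));
    return 0;
}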
What the CPU does is the business of the CPU and the compiler: on CPUs that impose alignment constraints, the compiler takes them into account. On a RISC-y chip, the CPU might have to load a full 32-bit word and then shift and mask to extract a char.
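Here is a rough C rendering of that load/shift/mask sequence, purely as an illustration (it assumes a little-endian machine and that the whole aligned word lies inside readable memory; the function name is made up):

#include <stdint.h>
#include <string.h>

/* Sketch of a byte load on a core that can only issue aligned 32-bit reads:
 * fetch the aligned word containing the byte, then shift and mask.
 * Assumes little-endian byte order and that the enclosing aligned word
 * lies within the buffer being read. */
unsigned char load_byte_via_word(const unsigned char *p)
{
    uintptr_t addr = (uintptr_t)p;
    const unsigned char *word_base = (const unsigned char *)(addr & ~(uintptr_t)3);
    uint32_t word;
    memcpy(&word, word_base, sizeof word);           /* the aligned 32-bit bus read */
    unsigned shift = (unsigned)(addr & 3u) * 8u;     /* byte position within the word */
    return (unsigned char)((word >> shift) & 0xFFu); /* shift and mask */
}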
It depends on the compiler and on the way you defined your variables. The default behavior of most compilers is to align variables in such a way as to yield the fastest access on the given platform; aligned variables give you the best performance.
However, compilers such as gcc provide compiler-specific directives that can be used to "pack" adjacent variables of different types (and hence sizes) to save memory at the cost of performance (that trade-off is yours to make by using the packing directive). See this question.
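With gcc or clang, for example, one such packing directive is the packed attribute; a minimal sketch (an extension, not portable C):

#include <stdio.h>

struct normal_rec { char c; int i; };                          /* compiler pads after c */
struct __attribute__((packed)) packed_rec { char c; int i; };  /* no padding; i may be misaligned */

int main(void)
{
    printf("normal: %zu bytes, packed: %zu bytes\n",
           sizeof(struct normal_rec), sizeof(struct packed_rec));  /* typically 8 vs 5 */
    return 0;
}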
The CPU may read a full 32-bit word (and maybe more, to fetch the whole cache line) when reading a char/short.

On many platforms, misaligned memory access carries a performance penalty or may even result in the program being interrupted. E.g., on x86, accessing memory through a misaligned pointer can result in SIGBUS being raised if both EFLAGS.AC and CR0.AM are set (see this answer).
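As a rough demonstration (a sketch only: it assumes Linux on x86-64 with gcc or clang inline assembly, and that the kernel leaves CR0.AM set, which Linux normally does), turning on EFLAGS.AC and then dereferencing a misaligned pointer typically kills the process with SIGBUS:

#include <stdio.h>

int main(void)
{
    /* Set EFLAGS.AC (bit 18). With CR0.AM also set by the kernel, user-mode
     * misaligned accesses now trap instead of being fixed up silently. */
    __asm__ volatile ("pushfq\n\t"
                      "orq $0x40000, (%%rsp)\n\t"
                      "popfq" ::: "cc", "memory");

    char buf[8] = {0};
    int *p = (int *)(buf + 1);   /* deliberately misaligned; undefined behaviour in ISO C */
    printf("%d\n", *p);          /* typically raises SIGBUS here */
    return 0;
}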
Lots of questions...
Do different data types in C such as char, short, int, long, float, double have different memory alignment boundaries?
Yes. The exact alignment boundaries are compiler-specific, and some compilers let you change how they pack structs. (It's best to insert explicit padding fields so that packing never becomes an issue; a small sketch follows.)
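For instance, spelling the padding out by hand keeps the layout stable no matter how the compiler would otherwise pack it (a sketch; the field names are illustrative):

#include <stdint.h>

/* Explicit padding: the layout no longer depends on the compiler's choices. */
struct record {
    uint16_t id;
    uint8_t  flags;
    uint8_t  pad0;      /* written out by hand instead of left implicit */
    uint32_t payload;
};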
In a 32 bit word aligned byte addressable operating system, how is accessing a char or short different from accessing an int or float?
Actually, it depends on the architecture. I've seen some that have Byte Enable lines on the bus, and will use those to access just the part of memory that they want. On others, non-I/O memory accesses result in reading or writing entire cache lines.
In both cases, does the CPU read a full 32-bit word?
Not necessarily. With Byte Enables, you don't have to read a full 32-bit word. Byte Enables also let you write individual bytes on a >8-bit architecture without performing a read-modify-write.
What happens when an int is not at the boundary?
Some architectures (e.g. x86, IIRC) will perform multiple accesses and join the parts for you. Others (e.g. PowerPC) will generate a Bus Error or similar exception.
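If you ever need to read an int from an address that might be misaligned, the usual portable idiom is to let memcpy (and the compiler) do the splitting and joining, instead of dereferencing a misaligned int pointer; a minimal sketch (the function name is just illustrative):

#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly misaligned address. The compiler will
 * emit an unaligned-capable load where the hardware allows it, or byte loads
 * joined together where it does not. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}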
How is it able to read a char at any memory address?
Because addresses are quantized by bytes on your architecture. This is not true of all architectures. DSPs are famous for having word-aligned pointers, i.e. a pointer is a word address, not a byte address. (I had to write a serial port driver for one of these: sizeof(char) == sizeof(short) == 1 == 16 bits. So you have to choose between simple code that wastes half the RAM and lots of byte pack/unpack code; a sketch of the latter follows.)
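On such a DSP, packing two octets into each 16-bit char is the space-saving option mentioned above; a hypothetical sketch of the pack/unpack helpers (names invented for illustration):

/* Hypothetical helpers for a word-addressed DSP where CHAR_BIT == 16:
 * store two octets per "char" to avoid wasting half the RAM. */
unsigned pack_octets(unsigned hi, unsigned lo)
{
    return ((hi & 0xFFu) << 8) | (lo & 0xFFu);
}

unsigned unpack_octet(unsigned word, int index)   /* index 0 = high octet, 1 = low octet */
{
    return (word >> (index ? 0 : 8)) & 0xFFu;
}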
The short answer, as others have pointed out, is the compiler will do what's best for the architecture it's compiling to. It may align them to the native word size. It may not. Here is a sample program demonstrating this point:
#include <iostream>

int main()
{
    using namespace std;
    char c;
    short s;
    int i;
    cout << "sizeof(char): " << sizeof(char) << endl;
    cout << "sizeof(short): " << sizeof(short) << endl;
    cout << "sizeof(int): " << sizeof(int) << endl;
    // subtract as char* so the result is in bytes and no pointer is truncated on 64-bit
    cout << "short is " << (char*)&s - (char*)&c << " bytes away from a char" << endl;
    cout << "int is " << (char*)&i - (char*)&s << " bytes away from a short" << endl;
}
The output:
sizeof(char): 1
sizeof(short): 2
sizeof(int): 4
short is 1 bytes away from a char
int is 4 bytes away from a short
As you can see, it added some padding between the short and the int, but it didn't bother adding any between the char and the short. In other cases, the reverse may be true. Optimization rules are complex.
And, a warning: The compiler is smarter than you. Don't play with padding and alignment unless you have a really, really good reason. Just trust that what the compiler is doing is the right thing.
Yes, they do have different memory alignment requirements. In practice a given type is usually required to be aligned on a boundary equal to its own size, although in theory the concepts of size and alignment have no connection to each other.
In some specific situations the platform might require a piece of data to be aligned to even stricter (greater) boundary than the size of the corresponding data type. This can be required for performance reasons, for example, or for some other platform-specific reasons.
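If you are writing C11, you can request such stricter alignment yourself with _Alignas / alignas; a minimal sketch, with 64 standing in for an assumed cache-line size:

#include <stdalign.h>
#include <stdio.h>

/* Over-align a buffer to an assumed 64-byte cache line, e.g. to avoid
 * false sharing or to satisfy a DMA engine's requirements. */
static alignas(64) char dma_buffer[256];

int main(void)
{
    printf("buffer at %p, alignment requested: 64\n", (void *)dma_buffer);
    return 0;
}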
If the data is not aligned, the behavior depends on the platform. On some hardware platforms an attempt to access unaligned data will result in a crash (Sun machines, for example), while on others it might only cost some efficiency and/or the atomicity of the access, with no other detrimental effects (Intel x86 machines, for example).

An important detail worth mentioning here is that, from a pedantic point of view, for a C program the term platform refers to the environment provided by the compiler, not by the hardware. The compiler is always free to implement an abstraction layer that isolates the C program from the underlying hardware platform, completely (or almost completely) hiding any hardware-imposed requirements. For example, an implementation could remove all alignment requirements from the C program's point of view even when the underlying hardware does impose them. In practice, however, for the efficiency reasons central to C's philosophy, hardware alignment requirements most of the time (if not always) apply to C programs as well.
Short answer: it depends on your compiler and architecture. Most compilers have some sort of command-line option or #pragma that you can use to manually specify or alter the alignment of variables.
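With gcc, clang and MSVC, for instance, that pragma is #pragma pack; a minimal sketch (struct and field names are just illustrative):

#include <stdio.h>

#pragma pack(push, 1)            /* no padding inside the following struct */
struct wire_header {
    char type;
    int  length;                 /* misaligned on purpose: packed for a wire format */
};
#pragma pack(pop)                /* restore the default alignment rules */

int main(void)
{
    printf("%zu\n", sizeof(struct wire_header));   /* typically 5 instead of 8 */
    return 0;
}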
I once used something like this to investigate the data alignment of various types:
union {
struct {
char one;
char two;
char three;
char four;
} chars;
struct {
short one;
short two;
short three;
short four;
} shorts;
struct {
int one;
int two;
int three;
int four;
} ints;
struct {
double one;
double two;
double three;
double four;
} doubles;
/* etc, etc */
} many_types;
By looking at the address of each struct member versus the sizeof() of that member, you can get a picture of how your compiler aligns different data types.
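A simpler way to get the same picture is offsetof from <stddef.h>, which reports exactly where the compiler placed each member; a small sketch (the struct is just an example):

#include <stdio.h>
#include <stddef.h>

struct probe { char c; double d; short s; int i; };

int main(void)
{
    /* offsetof shows the padding the compiler inserted to keep members aligned */
    printf("c at %zu, d at %zu, s at %zu, i at %zu, total %zu bytes\n",
           offsetof(struct probe, c), offsetof(struct probe, d),
           offsetof(struct probe, s), offsetof(struct probe, i),
           sizeof(struct probe));
    return 0;
}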
You might care to study the output of this program, compiled for both 32-bit and 64-bit on an Intel Mac running Mac OS X 10.6.2.
/*
@(#)File: $RCSfile: typesize.c,v $
@(#)Version: $Revision: 1.7 $
@(#)Last changed: $Date: 2008/12/21 18:25:17 $
@(#)Purpose: Structure sizes/alignments
@(#)Author: J Leffler
@(#)Copyright: (C) JLSS 1990,1997,2004,2007-08
@(#)Product: :PRODUCT:
*/
#include <stdio.h>
#include <time.h>
#include <stddef.h>
#if __STDC_VERSION__ >= 199901L
#include <inttypes.h>
#endif /* __STDC_VERSION__ */
#define SPRINT(x) printf("%2u = sizeof(" #x ")\n", (unsigned int)sizeof(x))
int main(void)
{
    /* Basic Types */
    SPRINT(char);
    SPRINT(unsigned char);
    SPRINT(short);
    SPRINT(unsigned short);
    SPRINT(int);
    SPRINT(unsigned int);
    SPRINT(long);
    SPRINT(unsigned long);
#if __STDC_VERSION__ >= 199901L
    SPRINT(long long);
    SPRINT(unsigned long long);
    SPRINT(uintmax_t);
#endif /* __STDC_VERSION__ */
    SPRINT(float);
    SPRINT(double);
    SPRINT(long double);
    SPRINT(size_t);
    SPRINT(ptrdiff_t);
    SPRINT(time_t);

    /* Pointers */
    SPRINT(void *);
    SPRINT(char *);
    SPRINT(short *);
    SPRINT(int *);
    SPRINT(long *);
    SPRINT(float *);
    SPRINT(double *);

    /* Pointers to functions */
    SPRINT(int (*)(void));
    SPRINT(double (*)(void));
    SPRINT(char *(*)(void));

    /* Structures */
    SPRINT(struct { char a; });
    SPRINT(struct { short a; });
    SPRINT(struct { int a; });
    SPRINT(struct { long a; });
    SPRINT(struct { float a; });
    SPRINT(struct { double a; });
    SPRINT(struct { char a; double b; });
    SPRINT(struct { short a; double b; });
    SPRINT(struct { long a; double b; });
    SPRINT(struct { char a; char b; short c; });
    SPRINT(struct { char a; char b; long c; });
    SPRINT(struct { short a; short b; });
    SPRINT(struct { char a[3]; char b[3]; });
    SPRINT(struct { char a[3]; char b[3]; short c; });
    SPRINT(struct { long double a; });
    SPRINT(struct { char a; long double b; });
#if __STDC_VERSION__ >= 199901L
    SPRINT(struct { char a; long long b; });
#endif /* __STDC_VERSION__ */
    return(0);
}
Output from 64-bit compilation:
1 = sizeof(char)
1 = sizeof(unsigned char)
2 = sizeof(short)
2 = sizeof(unsigned short)
4 = sizeof(int)
4 = sizeof(unsigned int)
8 = sizeof(long)
8 = sizeof(unsigned long)
8 = sizeof(long long)
8 = sizeof(unsigned long long)
8 = sizeof(uintmax_t)
4 = sizeof(float)
8 = sizeof(double)
16 = sizeof(long double)
8 = sizeof(size_t)
8 = sizeof(ptrdiff_t)
8 = sizeof(time_t)
8 = sizeof(void *)
8 = sizeof(char *)
8 = sizeof(short *)
8 = sizeof(int *)
8 = sizeof(long *)
8 = sizeof(float *)
8 = sizeof(double *)
8 = sizeof(int (*)(void))
8 = sizeof(double (*)(void))
8 = sizeof(char *(*)(void))
1 = sizeof(struct { char a; })
2 = sizeof(struct { short a; })
4 = sizeof(struct { int a; })
8 = sizeof(struct { long a; })
4 = sizeof(struct { float a; })
8 = sizeof(struct { double a; })
16 = sizeof(struct { char a; double b; })
16 = sizeof(struct { short a; double b; })
16 = sizeof(struct { long a; double b; })
4 = sizeof(struct { char a; char b; short c; })
16 = sizeof(struct { char a; char b; long c; })
4 = sizeof(struct { short a; short b; })
6 = sizeof(struct { char a[3]; char b[3]; })
8 = sizeof(struct { char a[3]; char b[3]; short c; })
16 = sizeof(struct { long double a; })
32 = sizeof(struct { char a; long double b; })
16 = sizeof(struct { char a; long long b; })
Output from 32-bit compilation:
1 = sizeof(char)
1 = sizeof(unsigned char)
2 = sizeof(short)
2 = sizeof(unsigned short)
4 = sizeof(int)
4 = sizeof(unsigned int)
4 = sizeof(long)
4 = sizeof(unsigned long)
8 = sizeof(long long)
8 = sizeof(unsigned long long)
8 = sizeof(uintmax_t)
4 = sizeof(float)
8 = sizeof(double)
16 = sizeof(long double)
4 = sizeof(size_t)
4 = sizeof(ptrdiff_t)
4 = sizeof(time_t)
4 = sizeof(void *)
4 = sizeof(char *)
4 = sizeof(short *)
4 = sizeof(int *)
4 = sizeof(long *)
4 = sizeof(float *)
4 = sizeof(double *)
4 = sizeof(int (*)(void))
4 = sizeof(double (*)(void))
4 = sizeof(char *(*)(void))
1 = sizeof(struct { char a; })
2 = sizeof(struct { short a; })
4 = sizeof(struct { int a; })
4 = sizeof(struct { long a; })
4 = sizeof(struct { float a; })
8 = sizeof(struct { double a; })
12 = sizeof(struct { char a; double b; })
12 = sizeof(struct { short a; double b; })
12 = sizeof(struct { long a; double b; })
4 = sizeof(struct { char a; char b; short c; })
8 = sizeof(struct { char a; char b; long c; })
4 = sizeof(struct { short a; short b; })
6 = sizeof(struct { char a[3]; char b[3]; })
8 = sizeof(struct { char a[3]; char b[3]; short c; })
16 = sizeof(struct { long double a; })
32 = sizeof(struct { char a; long double b; })
12 = sizeof(struct { char a; long long b; })
You can play all sorts of games with the structures. The key point is that the alignment requirements for different types do vary. Depending on the platform, you may have more or less stringent requirements. SPARC is fussy; Intel tends to do more work when you make a misaligned access (so it is slow, but it works); and the old DEC Alpha chips (and, I think, the MIPS RISC chips) could be switched to behave either way: more efficiently, always requiring aligned access, or less efficiently, mimicking what Intel chips do.