I recently got thinking about alignment... It's something that we don't ordinarily have to consider, but I've realized that some processors require objects to be aligned along 4-byte boundaries. What exactly does this mean, and which specific systems have alignment requirements?

Suppose I have an arbitrary pointer:

unsigned char* ptr;

Now, I'm trying to retrieve a double value from a memory location:

double d = *((double*)ptr);

Is this going to cause problems?

A: 

One example of an alignment requirement is vectorization (SIMD) instructions. (They can be used without alignment, but they are much faster if you use the kind of instruction that requires alignment.)

Artur Soler
+10  A: 

It can definitely cause problems on some systems.

For example, on ARM-based systems you cannot address a 32-bit word that is not aligned to a 4-byte boundary. Doing so will result in an access violation exception. On x86 you can access such non-aligned data, though the performance suffers a little since two words have to be fetched from memory instead of just one.

laalto
Some ARM systems even silently access the corresponding aligned address where the lower bits are zero, which can lead to hard to find bugs.
starblue
It definitely is a problem on ARM if arbitrary byte locations are used, as laalto and starblue point out. But allocated memory blocks will always have a sufficient (i.e. 16-byte) alignment, even if they are used for character arrays. Also watch out for MSB/LSB byte order when crossing platforms with this technique.
Adriaan
+1  A: 

Alignment affects the layout of structs. Consider this struct:

struct S {
  char a;
  long b;
};

On a 32-bit CPU the layout of this struct will often be:

a _ _ _ b b b b

The requirement is that a 32-bit value has to be aligned on a 32-bit boundary. If the struct is changed like this:

struct S {
  char a;
  short b;
  long c;
};

the layout will be this:

a _ b b c c c c

The 16-bit value is aligned on a 16-bit boundary.

Sometimes you want to pack a struct, for example to match it to an external data format. A compiler option or a #pragma lets you remove the padding:

a b b b b
a b b c c c c

However, accessing an unaligned member of a packed struct will often be much slower on modern CPUs, or may even result in an exception.
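To see the padding on your own compiler, here is a minimal sketch using offsetof and sizeof. The names S1, S2, check_layout and print_layout are made up for illustration, and fixed-width integer types stand in for short and long so the member sizes don't vary by platform. The exact offsets are still up to the implementation; only the "offset is a multiple of the member's alignment" property is checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Illustrative stand-ins for the two structs above (names are assumptions).
struct S1 {
    char         a;
    std::int32_t b;
};

struct S2 {
    char         a;
    std::int16_t b;
    std::int32_t c;
};

// Each member must start at an offset that is a multiple of its alignment,
// which is why the compiler inserts padding after 'a'.
void check_layout() {
    assert(offsetof(S1, b) % alignof(std::int32_t) == 0);
    assert(offsetof(S2, b) % alignof(std::int16_t) == 0);
    assert(offsetof(S2, c) % alignof(std::int32_t) == 0);
    // The struct is at least as large as the sum of its members.
    assert(sizeof(S1) >= sizeof(char) + sizeof(std::int32_t));
}

// Prints the offsets and sizes chosen by the current compiler.
void print_layout() {
    std::printf("S1: b at %zu, sizeof %zu\n", offsetof(S1, b), sizeof(S1));
    std::printf("S2: b at %zu, c at %zu, sizeof %zu\n",
                offsetof(S2, b), offsetof(S2, c), sizeof(S2));
}
```

On a typical 32-bit or 64-bit desktop compiler this should report b at offset 4 in S1, matching the first diagram, but that is a property of the ABI rather than something the language guarantees.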

Martin Liversage
For good cross-platform programming you probably would not want to "match the struct with a data format". Unless the data format has been designed so all members are aligned (e.g. TCP/IP protocols, so I've heard), but then you still have endianness issues.
Craig McQueen
+2  A: 

Yes, that could cause problems.

4-alignment simply means that the pointer, when considered as a numeric address, is a multiple of 4. If the pointer is not a multiple of the required alignment, then it is unaligned. There are two reasons why compilers place alignment restrictions on certain types:

  1. Because the hardware cannot load that datatype from an unaligned pointer (at least, not using the instructions which the compiler wants to emit for loads and stores).
  2. Because the hardware loads that datatype more quickly from aligned pointers.

If you're in case (1), and double is 4-aligned, and you try your code with a char * pointer which is not 4-aligned, then you'll most likely get a hardware trap. Some hardware does not trap. It just loads a nonsense value and continues. However, the C++ standard doesn't define what can happen (undefined behavior), so this code could set your computer on fire.

On x86, you're never in case (1), because the standard load instructions can handle unaligned pointers. On ARM, there are no unaligned loads, and if you attempt one then your program crashes if you're lucky; some ARMs silently fail.

Coming back to your example, the question is why you're trying this with a char * that isn't 4-aligned. If you successfully wrote a double there via a double *, then you'll be able to read it back. So if you originally had a "proper" pointer to double, which you cast to char * and you're now casting back, you don't have to worry about alignment.

But you said arbitrary char *, so I guess that's not what you have. If you read a chunk of data out of a file, which contains a serialized double, then you must ensure that the alignment requirements for your platform are met in order to do this cast. If you have 8 bytes representing a double in some file format, then you cannot just read it willy-nilly into a char* buffer at any offset and then cast to double *.

The easiest way to do this is to make sure that you read the file data into a suitable struct. You're also helped by the fact that memory allocations are always aligned to the maximum alignment requirement of any type they're big enough to contain. So if you allocate a buffer big enough to contain a double, then the start of that buffer has whatever alignment is required by double. So then you can read the 8 bytes representing the double into the start of the buffer, cast (or use a union) and read the double out.

Alternatively, you could do something like this:

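#include <algorithm>  // std::copy
#include <cstring>    // std::memcpy

double readUnalignedDouble(char *un_ptr) {
    double d;
    // Either of the next two lines is enough on its own; both copy the bytes.
    std::memcpy(&d, un_ptr, sizeof(d));
    std::copy(un_ptr, un_ptr + sizeof(d), reinterpret_cast<char *>(&d));
    return d;
}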

This is guaranteed to be valid (assuming un_ptr really points to the bytes of a valid double representation for your platform), because double is POD and hence can be copied byte-by-byte. It may not be the fastest solution, if you have a lot of doubles to load.
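As a usage sketch of the same memcpy idea, the following round-trips a double through a deliberately offset byte address. The helper names storeUnalignedDouble, loadUnalignedDouble and roundTrip are hypothetical, not from any library.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: write the bytes of 'value' to any byte address.
void storeUnalignedDouble(unsigned char* dst, double value) {
    std::memcpy(dst, &value, sizeof value);
}

// Hypothetical helper: read a double back from any byte address.
double loadUnalignedDouble(const unsigned char* src) {
    double d;
    std::memcpy(&d, src, sizeof d);
    return d;
}

double roundTrip(double value) {
    unsigned char buffer[sizeof(double) + 1] = {};
    // Offset by one byte: if 'buffer' is aligned for double, 'odd' is not.
    unsigned char* odd = buffer + 1;
    storeUnalignedDouble(odd, value);
    return loadUnalignedDouble(odd);
}
```

No double * ever points at the odd address here, so there is no unaligned access and no aliasing problem; only bytes are copied.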

If you are reading from a file, there's actually a bit more to it than that if you're worried about platforms with non-IEEE double representations, or with 9-bit bytes, or some other unusual properties, where there might be non-value bits in the stored representation of a double. But you didn't actually ask about files; I just made that up as an example. In any case those platforms are much rarer than the issue you're asking about, which is double having an alignment requirement.

Finally, nothing at all to do with alignment, you also have strict aliasing to worry about if you got that char * via a cast from a pointer which is not alias-compatible with double *. Aliasing is valid between char * itself and anything else, though.

Steve Jessop
char* ptr will be aligned, since it is a pointer.
Ryan Fox
Consider for example, `char *p = new char[100]; char *ptr = p + 1;` ptr is now unaligned if double is 4-aligned. Casting ptr to `double *` then reading a double is undefined behaviour (even if you've set `p[1]` through `p[sizeof(double)]` to 0).
Steve Jessop
+1  A: 

On the x86 it's always going to run, of course more efficiently when aligned.

But if you're multithreading, watch out for read/write tearing. With a 64-bit value you need an x64 machine to give you atomic reads and writes between threads.
If, say, one thread is incrementing the value from 0x00000000.FFFFFFFF to 0x00000001.00000000, then another thread reading it at the same time might in theory see either 0 or 1FFFFFFFF, especially if the value straddles a cache-line boundary.
I recommend Duffy's "Concurrent Programming on Windows" for its nice discussion of memory models; it even mentions alignment gotchas on multiprocessors when .NET does a GC. You want to stay away from the Itanium!
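For portable code, one way to avoid tearing is std::atomic, assuming a C++11 or later compiler (this is an addition, not something the book recommendation above prescribes). A minimal sketch, with counter, bump and snapshot as made-up names:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Starts at the value from the example above, one below the 32-bit carry.
std::atomic<std::uint64_t> counter{0x00000000FFFFFFFFull};

// Increments as one indivisible step and returns the new value.
// fetch_add returns the value held before the addition.
std::uint64_t bump() {
    return counter.fetch_add(1) + 1;
}

// A reader always sees a whole value, never half of the old one
// combined with half of the new one.
std::uint64_t snapshot() {
    return counter.load();
}
```

Whether this compiles down to plain loads and stores or to something heavier depends on the target; counter.is_lock_free() reports whether the implementation had to fall back to a lock.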

pngaz
Aligned 64-bit values can be atomically accessed on 32-bit x86 with CMPXCHG8B, actually. There's a corresponding 128-bit CMPXCHG16B on 64-bit, too.
bdonlan
+1  A: 

Here's what the Intel x86/x64 Reference Manual says about alignments:

4.1.1 Alignment of Words, Doublewords, Quadwords, and Double Quadwords

Words, doublewords, and quadwords do not need to be aligned in memory on natural boundaries. The natural boundaries for words, double words, and quadwords are even-numbered addresses, addresses evenly divisible by four, and addresses evenly divisible by eight, respectively. However, to improve the performance of programs, data structures (especially stacks) should be aligned on natural boundaries whenever possible. The reason for this is that the processor requires two memory accesses to make an unaligned memory access; aligned accesses require only one memory access. A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles for access.

Some instructions that operate on double quadwords require memory operands to be aligned on a natural boundary. These instructions generate a general-protection exception (#GP) if an unaligned operand is specified. A natural boundary for a double quadword is any address evenly divisible by 16. Other instructions that operate on double quadwords permit unaligned access (without generating a general-protection exception). However, additional memory bus cycles are required to access unaligned data from memory.

Don't forget, reference manuals are the ultimate source of information of the responsible developer and engineer, so if you're dealing with something well documented such as Intel CPUs, just look up what the reference manual says about the issue.

DrJokepu
If you're writing code for x86 only.
Steve Jessop
@onebyone: true, but other architectures have their own reference manuals as well.
DrJokepu
Yes, I just mean that sometimes you want to write code which is not for any particular architecture (in fact, that happens to have been the usual case for me so far). In that situation, CPU reference manuals don't help, you can only rely on the C++ standard.
Steve Jessop
+1  A: 

Yes, that can cause a number of problems. The C++ standard doesn't actually guarantee that it'll work. You can't just arbitrarily cast between pointer types.

When you cast a char pointer to a double pointer, it uses a reinterpret_cast, which applies an implementation-defined mapping. You're not guaranteed that the resulting pointer will contain the same bit pattern, or that it will point to the same address or, well, anything else. In more practical terms, you're also not guaranteed that the value you're reading is aligned properly. If the data was written as a series of chars, then they will use char's alignment requirements.

As for what alignment means, it is essentially just that the starting address of the value should be divisible by the alignment size. Address 16 is aligned on 1, 2, 4, 8 and 16-byte boundaries, for example, so on typical CPUs, values of these sizes can be stored there.

Address 6 isn't aligned on a 4-byte boundary, so we should not store 4-byte values there.
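The divisibility rule can be written as a small check. This is a sketch: is_aligned and demo_alignment are hypothetical names, and it assumes the usual flat address model where converting a pointer to std::uintptr_t yields its numeric address (the conversion is implementation-defined).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if the numeric address of p is a multiple of 'alignment'.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

void demo_alignment() {
    // alignas forces the array to start on a 16-byte boundary.
    alignas(16) unsigned char buf[32];
    assert(is_aligned(buf, 16));       // like "address 16" above
    assert(is_aligned(buf, 4));        // 16-aligned implies 4-aligned
    assert(is_aligned(buf + 6, 2));    // like "address 6": fine for 2 bytes
    assert(!is_aligned(buf + 6, 4));   // but not on a 4-byte boundary
}
```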

It's worth noting that even on CPUs that don't enforce or require alignment, you typically still get a significant slowdown from accessing unaligned values.

jalf
A: 

Enforced memory alignment is much more common in RISC-based architectures such as MIPS.
The main thinking for these types of processors, AFAIK, is really a speed issue.
RISC methodology was all about having a set of simple and fast instructions (usually one memory cycle per instruction). This does not necessarily mean that it has fewer instructions than a CISC processor, more that it has simpler, faster instructions.
Many MIPS processors, although 8 byte addressable, would do a word-aligned access (32 bits typically, but not always) and then mask off the appropriate bits.
The idea is that it is faster to do an aligned load plus a bit mask than to try to do an unaligned load. Typically (and of course this really depends on the chipset), doing an unaligned load would generate a bus error, so RISC processors would offer an 'unaligned load/store' instruction, but this would often be much slower than the corresponding aligned load/store.
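To make the extra work concrete, here is a software model of what an unaligned 32-bit read costs when only aligned word accesses are available: two aligned loads plus shifts and an OR. It is only an illustration, not how any particular MIPS core implements it; load_unaligned_le is a made-up name, and it treats byte 0 of each word as the least significant byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads the 32-bit value that starts 'byte_offset' bytes into an array of
// aligned 32-bit words, using only whole-word accesses.
// Note: when the offset is not a multiple of 4, words[idx + 1] is also read,
// so the array must have one more word after the starting one.
std::uint32_t load_unaligned_le(const std::uint32_t* words,
                                std::size_t byte_offset) {
    const std::size_t idx   = byte_offset / 4;
    const unsigned    shift = static_cast<unsigned>(byte_offset % 4) * 8;
    const std::uint32_t lo  = words[idx];           // first aligned load
    if (shift == 0) {
        return lo;                                  // aligned: one load only
    }
    const std::uint32_t hi  = words[idx + 1];       // second aligned load
    return static_cast<std::uint32_t>((lo >> shift) | (hi << (32 - shift)));
}
```

The aligned case returns after a single load, while every other offset pays for a second load and the bit shuffling, which is the cost the answer is describing.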

Of course this still doesn't answer the question of why they do this, i.e. what advantage does having memory word-aligned give you? I'm no hardware expert and I'm sure someone on here can give a better answer, but my two best guesses are:
1. It can be much faster to fetch from the cache when word-aligned, because many caches are organised into cache lines (anything from 8 to 512 bytes), and as cache memory is typically much more expensive than RAM, you want to make the most of it.
2. It may be much faster to access each memory address, as it allows you to read through 'burst mode' (i.e. fetching the next sequential address before it's needed).

Note that none of the above is strictly impossible with non-aligned stores. I'm guessing (though I don't know) that a lot of it comes down to hardware design choices and cost.

zebrabox
+2  A: 

The easy answer is to use dynamic memory allocation to allocate ptr.
If you allocate memory via ::operator new then the standard provides some guarantees about alignment.

18.5.1.1 Single-object forms [new.delete.single]
void* operator new(std::size_t size) throw(std::bad_alloc);

1 Effects: The allocation function (3.7.3.1) called by a new-expression (5.3.4) to allocate size bytes of storage suitably aligned to represent any object of that size.

18.5.1.2 Array forms [new.delete.array]
void* operator new[](std::size_t size) throw(std::bad_alloc);

1 Effects: The allocation function (3.7.3.1) called by the array form of a new-expression (5.3.4) to allocate size bytes of storage suitably aligned to represent any array object of that size or smaller.

Then the section on alignment:

3.11.5 Alignments have an order from weaker to stronger or stricter alignments. Stricter alignments have larger alignment values. An address that satisfies an alignment requirement also satisfies any weaker valid alignment requirement.

So when you allocate memory for char* just make sure that it is larger than the size of a double and the standard allocation routines will make sure that it is aligned correctly for an object of your size.

std::vector<char>    data(sizeof(double) * 5);
                     // assuming sizeof(double) == 4
                     // This will allocate enough space for 20 bytes and the memory will
                     // be aligned so that any type of size 20 can use it.
                     // Because of alignment rules this means any object smaller than
                     // 20 will also be correctly aligned.
char*                p  = &data[0];
double*              x  = reinterpret_cast<double*>(p);
double               d0 = x[0];
double               d1 = x[1];
double               d2 = x[2];
double               d3 = x[3];
double               d4 = x[4];

Note this does not hold for static objects.
So the following is not guaranteed to work:

char  data[sizeof(double) * 5];  // No guarantee on alignment.
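If a C++11 compiler is available (an assumption; the standard text quoted above predates it), alignas can request the alignment explicitly for such an array. A minimal sketch, where sum_in_aligned_buffer is a made-up name and placement new is used to create the double objects in the buffer rather than casting the pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

double sum_in_aligned_buffer() {
    // alignas(double) makes the byte array suitably aligned for double.
    alignas(double) unsigned char data[sizeof(double) * 5];
    assert(reinterpret_cast<std::uintptr_t>(data) % alignof(double) == 0);

    double total = 0.0;
    for (std::size_t i = 0; i < 5; ++i) {
        // Construct a double in each slot, then read it through the
        // pointer that placement new returns.
        double* slot = new (data + i * sizeof(double))
            double(static_cast<double>(i));
        total += *slot;
    }
    return total;  // 0 + 1 + 2 + 3 + 4
}
```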

Note: Why std::vector works is documented here:
http://stackoverflow.com/questions/1229433/manual-invocation-of-constructor/1229589#1229589

Martin York
+1  A: 

SPARC (Solaris machines) is another architecture that will choke (give a SIGBUS error) if you try to use an unaligned value, or at least some versions did in times past.

An addendum to Martin York's answer: malloc is also aligned for the largest possible type, i.e. it's safe for everything, like 'new'. In fact, 'new' frequently just uses malloc.

JDonner