I admit I don't get it. Say you have a memory with a word length of 1 byte. Why can't you access a 4-byte variable in a single memory access at an unaligned address (i.e. one not divisible by 4), as you can at an aligned address?

+5  A: 

It's a limitation of many underlying processors. It can usually be worked around by doing four inefficient single-byte fetches rather than one efficient word fetch, but many language specifiers decided it would be easier just to outlaw unaligned accesses and force everything to be aligned.
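
As a rough illustration of that workaround, here is a minimal C sketch. The function names are made up for this example, and the byte-wise version assumes the value is stored little-endian:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a 32-bit value from four single-byte loads.
   Each access is one byte wide, so it works at any address.
   Assumes the value is stored in little-endian byte order. */
static uint32_t load_u32_bytewise(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: memcpy has no alignment requirement, so the
   compiler is free to pick whatever load sequence the target allows.
   The result is in the host's byte order. */
static uint32_t load_u32_memcpy(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Either helper can be called with an odd address such as `buf + 1`, whereas casting `buf + 1` to `uint32_t *` and dereferencing it is undefined behaviour in C.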

There is much more information in this link that the OP discovered.

Paul Tomblin
A: 

On PowerPC you can load an integer from an odd address with no problems.

SPARC and (I think) Itanium raise hardware exceptions when you try this. x86 normally tolerates unaligned integer loads unless alignment checking has been enabled.

One 32-bit load vs. four 8-bit loads isn't going to make a lot of difference on most modern processors. Whether the data is already in cache or not will have a far greater effect.

James Anderson
+1  A: 

You can with some processors (Nehalem can do this), but previously all memory access was aligned on a 64-bit (or 32-bit) line. Because the bus is 64 bits wide, you had to fetch 64 bits at a time, and it was significantly easier to fetch these in aligned 'chunks' of 64 bits.

So, if you wanted to get a single byte, you fetched the 64-bit chunk and then masked off the bits you didn't want. Easy and fast if your byte was at the right end, but if it was in the middle of that 64-bit chunk, you'd have to mask off the unwanted bits and then shift the data over to the right place. Worse, if you wanted a 2-byte variable that was split across 2 chunks, it took twice as many memory accesses.
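
To make the mask-and-shift idea concrete, here is a small C model of it. It is only a sketch: `ChunkMemory`, `fetch_chunk` and `read_u16` are invented for this illustration, and byte 0 of each chunk is assumed to be its least significant byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES 8u

/* Hypothetical memory that can only be read one aligned 64-bit chunk
   at a time. Chunk i holds bytes 8*i .. 8*i+7, lowest byte first. */
typedef struct {
    const uint64_t *chunks;
    size_t          fetches;  /* counts chunk fetches performed so far */
} ChunkMemory;

static uint64_t fetch_chunk(ChunkMemory *m, size_t index)
{
    assert(m != NULL);
    m->fetches++;
    return m->chunks[index];
}

/* Read a 16-bit value at an arbitrary byte address. */
static uint16_t read_u16(ChunkMemory *m, size_t addr)
{
    size_t   index  = addr / CHUNK_BYTES;
    size_t   offset = addr % CHUNK_BYTES;
    uint64_t lo     = fetch_chunk(m, index);

    if (offset <= CHUNK_BYTES - 2) {
        /* Both bytes are in one chunk: shift down, then mask. */
        return (uint16_t)((lo >> (8 * offset)) & 0xFFFFu);
    }

    /* offset == 7: the value straddles two chunks, so a second
       fetch is needed and the two halves are stitched together. */
    uint64_t hi = fetch_chunk(m, index + 1);
    return (uint16_t)(((lo >> 56) & 0xFFu) | ((hi & 0xFFu) << 8));
}
```

In this model a read at address 0 or 3 costs one fetch, while a read at address 7 costs two, which is the "double the accesses" case described above.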

So, as everyone thinks memory is cheap, they just made the compiler align the data on the processor's chunk sizes so your code runs faster and more efficiently at the cost of wasted memory.
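
You can see the "wasted memory" side of that trade-off in struct layout. The following is a sketch with made-up struct names; the exact sizes are implementation-defined, though 12 and 8 bytes are what mainstream 32- and 64-bit compilers typically produce:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the compiler typically inserts 3 padding bytes
   after 'tag' so that 'value' starts on a 4-byte boundary, plus
   trailing padding after 'flag'. */
struct Padded {
    uint8_t  tag;
    uint32_t value;
    uint8_t  flag;
};

/* Same members, widest first: usually less padding is needed. */
struct Reordered {
    uint32_t value;
    uint8_t  tag;
    uint8_t  flag;
};

/* Bytes in struct Padded that hold no member data (1 + 4 + 1 = 6 used). */
static size_t padding_in_padded(void)
{
    assert(sizeof(struct Padded) >= 6);
    return sizeof(struct Padded) - 6;
}

/* Same calculation for the reordered layout. */
static size_t padding_in_reordered(void)
{
    assert(sizeof(struct Reordered) >= 6);
    return sizeof(struct Reordered) - 6;
}
```

Printing `offsetof(struct Padded, value)` and the two `sizeof` values on your own compiler shows where the padding actually went.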

gbjbaanb
+5  A: 

After doing some additional Googling I found this great link that explains the problem really well.

Daniel
This doesn't really describe why a processor requires memory alignment, it merely affirms that processors see aligned addresses and the performance penalties/advantages to adhering to these boundaries.
joshperry
@Daniel: +1 thanks for the excellent link!
Lazer
A: 

This has to do with cache, system bus, and supported transaction types.

n-alexander
+4  A: 

The actual memory subsystem on a modern processor is restricted to a certain memory access granularity for a number of reasons.

One is speed: modern processors have multiple levels of cache memory that data must be pulled through, and supporting single-byte reads all the way down would severely limit the performance of the processor (think PIO mode for hard drives).

Another is size: imagine that your processor uses 32-bit addresses. If the processor can assume that the 2 least significant bits (LSBs) are always 0, then it can address 4 times as much memory, or the same amount of memory with two bits free for use as flags or something else in the processor's internal data structures. Dropping the two LSBs from an address gives you 4-byte alignment: each time you increment the stored address you are really changing the third bit (bit 2), which counts in 4s.
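
Both uses of the two spare bits can be sketched in C. This is an illustration only: the function names are invented, and it assumes a platform where `uint32_t` objects are 4-byte aligned and where converting a pointer to `uintptr_t` yields its address (true on mainstream platforms, but implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x3)

/* Use 1: store word addresses instead of byte addresses.
   A 32-bit word address then covers 4x as many bytes. */
static uint32_t byte_to_word_addr(uint64_t byte_addr)
{
    assert((byte_addr & 0x3u) == 0);   /* must be 4-byte aligned */
    return (uint32_t)(byte_addr >> 2);
}

static uint64_t word_to_byte_addr(uint32_t word_addr)
{
    return (uint64_t)word_addr << 2;
}

/* Use 2: keep the byte address, but hide two flag bits in the
   low bits that alignment guarantees are zero. */
static uintptr_t pack(const uint32_t *p, unsigned flags)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);    /* relies on 4-byte alignment */
    assert(flags <= 3);
    return bits | (uintptr_t)flags;
}

static const uint32_t *unpack_ptr(uintptr_t packed)
{
    return (const uint32_t *)(packed & ~TAG_MASK);
}

static unsigned unpack_flags(uintptr_t packed)
{
    return (unsigned)(packed & TAG_MASK);
}
```

Neither trick works if the pointed-to data may sit at an arbitrary byte address, which is part of why an alignment guarantee is useful to the hardware.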

When you do an unaligned read, the processor is always going to provide its granularity's worth of data, so it actually has to read both blocks, shift off the unwanted bits, and save the result to a register for your program to access.

It is really a lot more complex and involved than this inside the processor. If you are curious about how an x86 actually addresses memory, take a look at this article.

There are a number of benefits to adhering to memory alignment, which you can read about in this IBM article.

Another alignment you may want to take advantage of for performance is alignment on cache lines, which are typically 64 bytes, but that's a topic for another question!

joshperry