Dear All,

Pardon me if you feel this has been answered numerous times, but I need answers to the following queries!

  1. Why does data have to be aligned (on 2-byte, 4-byte, or 8-byte boundaries)? My doubt is this: when the CPU has address lines Ax, Ax-1, Ax-2, ... A2, A1, A0, it is quite possible to address memory locations sequentially. So why is there a need to align data at specific boundaries?

  2. How do I find the alignment requirements when I am compiling my code and generating the executable?

  3. If, for example, the data alignment is a 4-byte boundary, does that mean each consecutive byte is located at modulo-4 offsets? In other words, if data is 4-byte aligned and a byte is at 1004, is the next byte at 1008 or at 1005?

Your thoughts are much welcome.

Thanks in advance! /MS

+1  A: 

In general, the one answer to all three of those questions is "it depends on your system". Some more details:

  1. Your memory system might not be byte-addressable. Besides that, you might incur a performance penalty to have your processor access unaligned data. Some processors (like older ARM chips, for example) just can't do it at all.

  2. Read the manual for your processor and whatever ABI specification your code is being generated for.

  3. Usually when people refer to data being at a certain alignment, it refers only to the first byte. So if the ABI spec said "data structure X must be 4-byte aligned", it means that X should be placed in memory at an address that's divisible by 4. Nothing is implied by that statement about the size or internal layout of structure X.

    As far as your particular example goes, if the data is 4-byte aligned starting at address 1004, the next byte will be at 1005.

Carl Norum
+3  A: 

CPUs are word oriented, not byte oriented. In a simple CPU, memory is generally configured to return one word (32bits, 64bits, etc) per address strobe, where the bottom two (or more) address lines are generally don't-care bits.

Intel CPUs can perform accesses on non-word boundaries for many instructions; however, there is a performance penalty, as internally the CPU performs two memory accesses and a math operation to load one word. If you are doing byte reads, no alignment applies.

Some CPUs (ARM, or Intel SSE instructions) require aligned memory and have undefined operation when doing unaligned accesses (or throw an exception). They save significant silicon space by not implementing the much more complicated load/store subsystem.

Alignment depends on the CPU word size (16, 32, 64bit) or in the case of SSE the SSE register size (128 bits).

For your last question, if you are loading a single data byte at a time there is no alignment restriction on most CPUs (some DSPs don't have byte-level instructions, but it's likely you won't run into one).

Yann Ramin
Hi theatrus, here is my doubt again! Why is it that the lower 2 address lines are eliminated? With this setup I can only access data from addresses 0, 4, 8, and so on. So how are byte manipulations taken care of in such a situation? You mentioned there is no alignment restriction for a single data byte; how is this achieved when the bottom 2 address lines are don't-care? Thank you for your reply!
MS
Mostly I am concerned about why the address lines are don't-care when byte manipulation may be intended in my code (and why it has to go all the way round to do the same).
MS
+2  A: 

Very little data "has" to be aligned. It's more that certain types of data may perform better or certain cpu operations require a certain data alignment.

First of all, let's say you're reading 4 bytes of data at a time. Let's also say that your CPU has a 32-bit data bus. Let's also say your data is stored at byte 2 in the system memory.

Now since you can load 4 bytes of data at once, it doesn't make too much sense to have your Address register to point to a single byte. By making your address register point to every 4 bytes you can manipulate 4 times the data. So in other words your CPU may only be able to read data starting at bytes 0, 4, 8, 12, 16, etc.

So here's the issue. If you want the data starting at byte 2 and you're reading 4 bytes, then half your data will be in address position 0 and the other half in position 1.

So basically you'd end up hitting the memory twice to read your one 4 byte data element. Some CPUs don't support this sort of operation (or force you to load and combine the two results manually).

Go here for more details: http://en.wikipedia.org/wiki/Data_structure_alignment

Timothy Baldridge
+1 for the link, but you should note that only some processors tolerate misaligned data. Intel does for the IA-32 and x86-64 architectures, but not for Itanium (IA-64). Your explanation is true only for processors that tolerate misaligned data, such as IA-32/x86-64. Alpha AXP would generate a fault, and I think MIPS would as well. Some OSs would handle the misaligned data in the fault handler, but the performance penalty for that is huge. And if the OS doesn't handle it, the misaligned data doesn't work _at all_ on those systems.
John Knoeller
+2  A: 

1.) Some architectures do not have this requirement at all, some encourage alignment (there is a speed penalty when accessing non-aligned data items), and some may enforce it strictly (misalignment causes a processor exception).
Many of today's popular architectures fall into the speed-penalty category. The CPU designers had to make a trade-off between flexibility/performance and cost (silicon area/number of control signals required for bus cycles).

2.) What language, which architecture? Consult your compiler's manual and/or the CPU architecture documentation.

3.) Again, this is totally architecture dependent (some architectures may not permit access to byte-sized items at all, or have bus widths which are not even a multiple of 8 bits). So unless you are asking about a specific architecture, you won't get any useful answers.

Durandal
A: 

"Now since you can load 4 bytes of data at once, it doesn't make too much sense to have your Address register to point to a single byte."

Why? Why can't I read positions 1, 2, 3, 4 at a time? I would think doing this would not degrade performance or add much complexity to the circuitry?

wbb
This isn't an answer. I do note that you've made it into a full question (http://stackoverflow.com/questions/3903164/why-misaligned-address-access-incur-2-or-more-accesses), which was the right thing to do.
Ben Voigt
A: 

It completely depends on the CPU you are using!

Some architectures deal only in 32 (or 36!) bit words, and you need special instructions to load single characters or half words.

Some CPUs (notably PowerPC and other IBM RISC chips) don't care about alignment and will load integers from odd addresses.

For most modern architectures you need to align integers to word boundaries and long integers to double-word boundaries. This simplifies the circuitry for loading registers and speeds things up ever so slightly.

James Anderson