ansaurus

Question

Why does CPU access memory on a word boundary?

Answer 1

+3 A:

Because it is more efficient.

In your example, the CPU would have to do two reads: it has to read in the first half, then read in the second half separately, then reassemble them together to do the computation. This is much more complicated and slower than doing the read in one go if the data was properly aligned.
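To make the two-read case concrete, here is a minimal Python sketch. It is only a model, not how any particular CPU is built: the names `WORD`, `read_aligned_word` and `read_u32` are made up for illustration, memory is a plain `bytes` object, and a 4-byte word with little-endian byte order is assumed.

```python
WORD = 4  # bytes per memory word (assumption: 32-bit wide memory)


def read_aligned_word(mem, addr):
    """Fetch one whole word; only word-aligned addresses are accepted."""
    if addr % WORD != 0:
        raise ValueError(f"misaligned bus access at {addr:#x}")
    return int.from_bytes(mem[addr:addr + WORD], "little")


def read_u32(mem, addr):
    """Return (value, word_reads) for a 32-bit load at any byte address."""
    offset = addr % WORD
    base = addr - offset
    low = read_aligned_word(mem, base)
    if offset == 0:
        return low, 1  # aligned: a single read is enough

    # Misaligned: the value straddles two words, so fetch both
    # and stitch the two halves together.
    high = read_aligned_word(mem, base + WORD)
    shift = 8 * offset
    value = ((low >> shift) | (high << (32 - shift))) & 0xFFFFFFFF
    return value, 2
```

With `mem = bytes(range(16))`, `read_u32(mem, 4)` needs one word read, while `read_u32(mem, 2)` needs two plus the shift-and-merge step.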

Some processors, like x86, can tolerate misaligned data access (so you would still need all 32 bits) - others like Itanium absolutely cannot handle misaligned data accesses and will complain quite spectacularly.

In silico 2010-09-07 05:11:03

Thanks for your reply. I just added something to my post.

smwikipedia 2010-09-07 05:17:55

Answer 2

+10 A:

The meaning of "can" (in "...CPU can access...") in this case depends on the hardware platform.

On the x86 platform, the CPU can access data aligned on absolutely any boundary, not only on a "word boundary". A misaligned access might be less efficient than an aligned one, but the reasons for that have absolutely nothing to do with the CPU. They have everything to do with how the underlying low-level memory access hardware works. It is quite possible that in this case the memory-related hardware will have to make two accesses to the actual memory, but that's something the CPU doesn't know about and doesn't need to know about. As far as the CPU is concerned, it can access any data on any boundary.

On hardware platforms like Sun SPARC, CPU cannot access misaligned data (in simple words, your program will crash if you attempt to), which means that if for some reason you need to perform this kind of misaligned access, you'll have to split it into two (or more) accesses at CPU level.

As for why it is so... well, that's just how modern computer memory hardware works. The data has to be aligned. If it is not aligned, the access either is less efficient or does not work at all.

A very simplified model of modern memory would be a grid of cells (rows and columns), each cell storing a word of data. A programmable robotic arm can put a word into a specific cell and retrieve a word from a specific cell. One at a time. If your data is spread across several cells, you have no other choice but to make several consecutive trips with that robotic arm. On some hardware platforms the task of organizing these consecutive trips is hidden from CPU (meaning that the arm itself knows what to do to assemble the necessary data from several pieces), on some other platforms it is visible to the CPU (meaning that it is the CPU who's responsible for organizing these consecutive trips of the arm).
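The grid-of-cells model can be put into a few lines of Python. This is just a sketch of the analogy: `cells_touched` is a hypothetical helper, and the 4-byte cell size is an assumption. It counts how many cells a piece of data spans, which is the number of trips the arm has to make.

```python
def cells_touched(addr, size, word_bytes=4):
    """Number of word cells (arm trips) needed to cover `size` bytes at `addr`."""
    if size <= 0:
        raise ValueError("size must be positive")
    first = addr // word_bytes               # cell holding the first byte
    last = (addr + size - 1) // word_bytes   # cell holding the last byte
    return last - first + 1
```

A 4-byte value at address 0 sits in one cell, so one trip. The same value at address 2 is spread over two cells, so two trips, whoever ends up organizing them.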

AndreyT 2010-09-07 05:15:00

Thanks for pointing out the difference between CPU and the memory access hardware. It's refreshing.

smwikipedia 2010-09-07 05:18:24

It seems that the boundary setting **is** hardwired and it is hardwired **by the memory access hardware**. CPU is just innocent as far as this is concerned.

smwikipedia 2010-09-07 05:21:54

@smwikipedia: Well, yes. The word boundaries are actually implemented in the actual RAM chips installed in your computer. Inside these chips the bits of data are organized into words. So, the words are pre-determined, implemented in the actual hardware. They are absolutely fixed for that reason. In order to access data you select a specific word using so-called "wordlines" inside the chip and then read or write bits using so-called "bitlines".

AndreyT 2010-09-07 05:27:30

Nice answer. I particularly enjoyed the robotic arm analogy.

jschmier 2010-09-07 16:00:07

Answer 3

+1 A:

It saves silicon in the addressing logic if you can make certain assumptions about the address (like "bottom n bits are zero"). Some CPUs (x86 and their work-alikes) will put logic in place to turn misaligned data into multiple fetches, concealing some nasty performance hits from the programmer. Most CPUs outside of that world will instead raise a hardware error explaining in no uncertain terms that they don't like this.
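The "bottom n bits are zero" assumption can be sketched in Python like this. Real hardware does it with wires rather than code, and `is_aligned` and `word_index` are hypothetical names used only for illustration. The point is that an aligned address carries no information in its low bits, so the addressing logic can drop them.

```python
def is_aligned(addr, size):
    """True when addr is a multiple of size (size must be a power of two)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    return (addr & (size - 1)) == 0  # the bottom log2(size) bits are all zero


def word_index(addr, size):
    """Drop the always-zero low bits of an aligned address."""
    if not is_aligned(addr, size):
        raise ValueError(f"address {addr:#x} is not {size}-byte aligned")
    return addr >> (size.bit_length() - 1)
```

For 4-byte words, address 12 becomes word index 3, so the hardware needs two fewer address bits per access. Address 6 is rejected, which is the software analogue of the hardware error mentioned above.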

All the arguments you're going to hear about "efficiency" are bollocks or, more precisely, are begging the question. The real reason is simply that it saves silicon in the processor core if the number of address bits can be reduced for operations. Any inefficiency that arises from misaligned access (like in the x86 world) is a result of the hardware design decisions, not intrinsic to addressing in general.

Now that being said, for most use cases the hardware design decision makes sense. If you're accessing data in two-byte words, most common use cases have you access offset, then offset+2, then offset+4 and so on. Being able to increment the address byte-wise while accessing two-byte words is typically (as in 99.44% certainly) not what you want to be doing. As such it doesn't hurt to require address offsets to align on word boundaries (it's a mild, one-time inconvenience when you design your data structures) but it sure does save on your silicon.

As a historical aside, I worked once on an Interdata Model 70 -- a 16-bit minicomputer. It required all memory access to be 16-bit aligned. It also had a very small amount of memory by the time I was working on it by the standards of the time. (It was a relic even back then.) The word-alignment was used to double the memory capacity since the wire-wrapped CPU could be easily hacked. New address decode logic was added that took a 1 in the low bit of the address (previously an alignment error in the making) and used it to switch to a second bank of memory. Try that without alignment logic! :)

JUST MY correct OPINION 2010-09-07 05:21:55

Answer 4

A:

Word alignment is not only a feature of CPUs.

At the hardware level, most RAM modules have a given word size, meaning the number of bits that can be accessed per read/write cycle.

On a module I had to interface with on an embedded device, addressing was implemented through three parameters: the module was organized in four banks, which could be selected prior to the read/write operation. Each of these banks was essentially a large table of 32-bit words, which could be addressed through a row and column index.

In this design, access was only possible per cell, so every read operation returned 4 bytes, and every write operation expected 4 bytes.

A memory controller hooked up to this RAM chip could be designed in two ways: either allowing unrestricted access to the memory chip, using several cycles to split/merge unaligned data to/from several cells (with additional logic), or imposing some restrictions on how memory can be accessed, with the gain of reduced complexity.

As complexity can impede maintainability and performance, most designers chose the latter [citation needed].
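The two controller designs can be compared with a small Python model. Everything here is an assumption made for illustration, not the actual module or controller: the class names, the 8x8 bank geometry, the little-endian byte order, and the idea of counting one cycle per cell access.

```python
WORD_BYTES = 4        # each cell holds one 32-bit word
MASK32 = 0xFFFFFFFF


class WordRam:
    """Hypothetical RAM: banks x rows x cols of 32-bit cells, one cell per cycle."""

    def __init__(self, banks=4, rows=8, cols=8):
        self.banks, self.rows, self.cols = banks, rows, cols
        self.cells = {}   # (bank, row, col) -> 32-bit word, default 0
        self.cycles = 0   # counts every cell read or write

    def locate(self, addr):
        """Split an aligned byte address into (bank, row, col)."""
        cell = addr // WORD_BYTES
        bank, rest = divmod(cell, self.rows * self.cols)
        row, col = divmod(rest, self.cols)
        if not 0 <= bank < self.banks:
            raise IndexError("address outside the module")
        return bank, row, col

    def read_cell(self, bank, row, col):
        self.cycles += 1
        return self.cells.get((bank, row, col), 0)

    def write_cell(self, bank, row, col, word):
        self.cycles += 1
        self.cells[(bank, row, col)] = word & MASK32


class StrictController:
    """Restricted design: unaligned accesses are rejected, one cycle each."""

    def __init__(self, ram):
        self.ram = ram

    def read_u32(self, addr):
        if addr % WORD_BYTES:
            raise ValueError("unaligned access")
        return self.ram.read_cell(*self.ram.locate(addr))

    def write_u32(self, addr, value):
        if addr % WORD_BYTES:
            raise ValueError("unaligned access")
        self.ram.write_cell(*self.ram.locate(addr), value)


class SplittingController(StrictController):
    """Unrestricted design: unaligned writes are split over two cells."""

    def write_u32(self, addr, value):
        offset = addr % WORD_BYTES
        if offset == 0:
            super().write_u32(addr, value)
            return
        value &= MASK32
        base = addr - offset
        shift = 8 * offset
        keep_low = (1 << shift) - 1  # bytes of the first cell to preserve
        # Read-modify-write: fetch both cells, merge in the new bytes, store both.
        low = self.read_u32(base)
        high = self.read_u32(base + WORD_BYTES)
        super().write_u32(base, (low & keep_low) | (value << shift))
        super().write_u32(base + WORD_BYTES,
                          (high & ~keep_low) | (value >> (32 - shift)))
```

In this model an aligned write costs one cycle, while an unaligned write through `SplittingController` costs four (two reads and two writes) and needs the extra masking logic. `StrictController` avoids both by refusing the access.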

sum1stolemyname 2010-09-07 06:24:40

Thanks for your concise answer.

smwikipedia 2010-09-07 13:43:20
