I have a basic question about assembly.

Why do we bother doing arithmetic operations only on registers if they can work on memory as well?

For example, both of the following compute (essentially) the same value:

Snippet 1

.data
    var dd 00000400h

.code

    Start:
        add var,0000000Bh
        mov eax,var
        ;breakpoint: var = 0000040B
    End Start


Snippet 2

.code

    Start:
        mov eax,00000400h
        add eax,0000000bh
        ;breakpoint: eax = 0000040B
    End Start



From what I can see, most texts and tutorials do arithmetic operations mostly on registers. Is it just faster to work with registers?

Edit: That was fast :)

A few great answers were given; the accepted answer went to the first good one.

+2  A: 

Registers are accessed way faster than RAM, since you don't have to go over the "slow" memory bus!

naivists
A: 

Yes, it's much, much faster to use registers. Even if you only consider the physical distance from the processor to a register compared to the distance from the processor to memory, you save a lot of time by not sending signals as far, and that means the chip can run at a higher clock rate.

Rob Lourens
A: 

Yes - you can also typically push/pop registers easily when calling procedures, handling interrupts, etc.

Jeff
+7  A: 

Registers are much faster, and the operations you can perform directly on memory are far more limited.

Tronic
Right on! Also, while values do "eventually" get moved back into main memory, as long as things take place in registers the buses are available for other [parallel] functions, such as reading ahead into cache, etc.
mjv
And register-register instructions are much shorter, therefore faster. They don't have to calculate effective addresses.
Mike Dunlavey
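To make the length point concrete, here are the raw encodings of two forms of `add` on 32-bit x86 (byte counts come from the standard opcode tables; the absolute address is made up):

    01 D8                add eax,ebx                    ; register-register: 2 bytes
    01 1D 34 12 00 40    add dword ptr [40001234h],ebx  ; register-memory: 6 bytes

The memory form has to carry a full 32-bit displacement, which accounts for most of the extra length.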
A: 

It's just that the instruction set will not allow you to do such complex operations:

add [0x40001234],[0x40002234]

You have to go through the registers.
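A sketch of the register detour for the illegal example above (same made-up addresses):

    mov eax,[0x40002234]    ; load the source operand into a register
    add [0x40001234],eax    ; memory += register is allowed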

Nicolas Viennot
There are lots of CPU architectures that will permit exactly those kinds of instructions. The issue is speed, not what operations are permitted. The limited operations come about because nobody in their right mind would do them RAM to RAM anyway.
JUST MY correct OPINION
The question is about the IA32 instruction set, and in IA32 it doesn't exist. You just cannot do it.
Nicolas Viennot
A: 

We use registers because they are fast. Usually, they operate at the CPU's speed.
Registers and CPU cache are made with different technology / fabrication processes and
they are expensive. RAM, on the other hand, is cheap and a hundred times slower.

Nick D
+14  A: 

If you look at computer architectures, you find a series of levels of memory. Those close to the CPU are fast, expensive (per bit), and therefore small, while at the other end you have big, slow, and cheap memory devices. In a modern computer, these are typically something like:

 CPU registers (slightly complicated, but on the order of 1KB per core - there
                are different types of registers. You might have 16 64-bit
                general purpose registers plus a bunch of registers for special
                purposes)
 L1 cache (64KB per core)
 L2 cache (256KB per core)
 L3 cache (8MB)
 Main memory (8GB)
 HDD (1TB)
 The internet (big)

Over time, more and more levels of cache have been added - I can remember a time when CPUs didn't have any onboard caches, and I'm not even old! These days, HDDs come with onboard caches, and the internet is cached in any number of places: in memory, on the HDD, and maybe on caching proxy servers.

There is a dramatic (often orders-of-magnitude) decrease in bandwidth and increase in latency at each step away from the CPU. For example, an HDD might be read at 100MB/s with a latency of 5ms (these numbers may not be exactly right), while your main memory can be read at 6.4GB/s with a latency of 9ns (nearly six orders of magnitude less!). Latency is a very important factor, as you don't want to keep the CPU waiting any longer than it has to (this is especially true for architectures with deep pipelines, but that's a discussion for another day).

The idea is that you will often reuse the same data over and over again, so it makes sense to put it in a small, fast cache for subsequent operations. This is referred to as temporal locality. Another important principle is spatial locality, which says that memory locations near each other will likely be read at about the same time. It is for this reason that reading from RAM causes a much larger block of RAM to be read and put into the on-CPU cache.

If it weren't for these principles of locality, any location in memory would be equally likely to be read at any one time, there would be no way to predict what will be accessed next, and all the levels of cache in the world would not improve speed. You might as well just use a hard drive, but I'm sure you know what it's like to have the computer come to a grinding halt when paging (which is basically using the HDD as an extension of RAM). It is conceptually possible to have no memory except for a hard drive (and many small devices have a single memory), but this would be painfully slow compared to what we're familiar with.
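As a small illustration of temporal locality, a loop's running total is kept in a register rather than written back to memory on every iteration. A sketch in the question's MASM style (esi is assumed to point at an array of 1000 dwords, and total is an assumed dd variable):

        xor eax,eax         ; the running total lives in a register
        mov ecx,1000
    SumLoop:
        add eax,[esi]       ; sequential reads - good spatial locality too
        add esi,4
        dec ecx
        jnz SumLoop
        mov total,eax       ; one store at the end instead of a thousand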

One other advantage of having registers (and only a small number of them) is that it lets you have shorter instructions. If you have instructions that contain two (or more) 64-bit addresses, you are going to have some long instructions!

David Johnstone
+1. Very helpful, thanks.
Cam
+1 for including the Internet. Really makes the storage hierarchy complete.
Michael E
+1  A: 

Generally speaking, register arithmetic is much faster and much preferred. However, there are some cases where direct memory arithmetic is useful. If all you want to do is increment a number in memory (and nothing else, at least for a few million instructions), then a single direct memory arithmetic instruction is usually slightly faster than load/add/store.
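For instance, with var declared dd as in the question's snippets, the two forms look like this (a sketch, not a benchmark):

        inc var             ; read-modify-write in a single instruction

        mov eax,var         ; load
        inc eax             ; modify
        mov var,eax         ; store - three instructions plus a clobbered register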

Also, if you are doing complex array operations, you generally need a lot of registers to keep track of where you are and where your arrays end. On older architectures you could run out of registers really quickly, so the option of adding two bits of memory together without zapping any of your current registers was really useful.

James Anderson