Studying a compilers course, I am left wondering why we use registers at all. It is often the case that the caller or callee must save the register's value and then restore it.

In a way they always end up using the stack anyway. Is creating additional complexity by using registers really worth it?

Excuse my ignorance.

Update: Please, I know that registers are faster than RAM and other types of cache. My main concern is that one has to "save" the value that is in the register and then "restore" it to the register afterwards. In both cases we are accessing some kind of cache. Would it not be better to use the cache in the first place?

A: 

It can make a huge difference. I once directed the compiler on PowerPC/Macintosh to put the right local variables into registers and got a 2 times speed-up of the application's main processing task. The task was essentially CPU bound, and eliminating the memory accesses by keeping those variables in registers accounted for the whole 2 times speed-up. The speed-up can be much more dramatic in other circumstances.

This was in a leaf function. It did not call other functions.

Peter Mortensen
A two times speed-up is nothing; you get that just by switching from Java to C++. The actual differences are orders of magnitude (if we are talking about not using registers at all).
Tronic
+5  A: 

In the speed/latency hierarchy, registers are fastest (usually zero cycle latency), L1 cache is next (typically 1 or more cycles of latency), and then it goes downhill rapidly after that. So in general register accesses are "free" whereas there is always some cost involved in memory accesses, even when that access is cached.

Saving and restoring registers typically only happens (a) at the beginning/end of a function call or a context switch, or (b) when the compiler runs out of registers for temporary variables and needs to "spill" one or more registers back to memory. In general, well-optimised code will keep the majority of frequently accessed ("hot") variables in registers, at least within the innermost loop(s) of a function.
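
As a rough, hypothetical illustration of that last point (the function name and details are mine, not from the answer): with optimization enabled, a typical compiler keeps all of this loop's hot variables in registers, so the only memory traffic inside the loop is the unavoidable load of each array element.

    /* Hypothetical example: 'sum', 'i' and 'n' normally live in registers
       for the whole loop, so the only memory access per iteration is the
       load of data[i] itself. */
    long sum_array(const long *data, unsigned long n)
    {
        long sum = 0;                 /* kept in a register, never spilled */
        for (unsigned long i = 0; i < n; i++)
            sum += data[i];           /* one memory load per iteration */
        return sum;                   /* result returned in a register */
    }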

Paul R
Yes, I understand that. But we oftentimes must "save" and "restore" values to and from registers. That is my main concern. Updated the question.
drozzy
See the paragraph added to the answer above.
Paul R
+1  A: 

Accessing RAM is generally MUCH slower than accessing a register, both in terms of latency and bandwidth. There are CPUs that have a hardware stack of limited size - this allows pushing registers to the stack and popping them back - but they still use registers directly for calculations. Working with a pure stack machine (of which there are many academic examples) is rather difficult too, and adds complexity of its own.
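
(A hypothetical C-level sketch of the save/restore concern from the question, not part of the original answer: even when a value has to be pushed and popped around call boundaries, that cost is paid once per call, while the register itself is used on every iteration.)

    long process(long item);              /* defined elsewhere (hypothetical) */

    /* 'total' must survive the calls to process(), so a typical compiler
       places it in a callee-saved register: that register is saved once on
       entry and restored once on return, yet it is read and written at
       register speed on every one of the n iterations. */
    long process_all(const long *items, long n)
    {
        long total = 0;
        for (long i = 0; i < n; i++)
            total += process(items[i]);
        return total;
    }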

Tronic
+1  A: 

The difference between stack- and register-based machines is fuzzy. A lot of modern register machines use register renaming to hide the stack, only dumping data to the actual stack when they run out of internal registers. Some old stack machines did something similar, pipelining several instructions and peephole optimizing a push-modify-pop sequence into an in-place modify.

Given the way modern CPUs use fancy parallel execution of instructions, there is probably not much difference between stack and register machines. Register machines may have a slight edge because the compiler can give better hints about data reuse, but a billion-dollar Lisp stack machine would be blazing fast if Intel bothered to design one.

Daniel Newby
+1  A: 

I'd say it's not so much an issue with compilers as it is with CPUs. Compilers have to work with the target architecture.

Here's what the other answers are glossing over: it depends on the architecture of the CPU at the level of the actual circuitry. Machine instructions boil down to: get data from somewhere, modify the data, and fetch or branch to the next instruction.

Analogy

Think of the problem like a woodworker building or repairing a chair for you. His questions will be "Where is the chair?" and "What needs to be done to the chair?". He might be able to fix it at your house, or he might need to take the chair back to his shop to work on it. Either way will work, but which is better depends on how prepared he is to work outside of a fixed location. It could slow him down, or it could be his specialty.

Now, back to the CPU.

Explanation

Regardless of how parallel a CPU may be, like having several adders or instruction decode pipelines, those circuits are located in specific locations on the chip, and the data must be loaded into the places where the operation can be performed. The program is responsible for moving the data into and out of those locations. In a stack-based machine, the architecture might provide instructions that appear to modify data directly, but it may be doing the housekeeping in microcode. An adder works the same way regardless of whether the data came from the stack or from the heap. The difference is in the programming model available to the programmer. Registers are basically a defined place to work on data.
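
To make the programming-model difference concrete (this sketch is mine, not from the answer), here is the same expression a*b + c*d written both ways in C: in the register model the temporaries simply get named places, while in the stack model the housekeeping of pushes and pops is explicit.

    #include <stdio.h>

    /* register-style: temporaries are named places to work on data */
    static int eval_registers(int a, int b, int c, int d)
    {
        int r0 = a * b;           /* r0, r1 stand in for machine registers */
        int r1 = c * d;
        return r0 + r1;
    }

    /* stack-style: operands live on an explicit stack, and the program
       spends extra work shuffling them on and off it */
    static int stack[16], sp;
    static void push(int v) { stack[sp++] = v; }
    static int  pop(void)   { return stack[--sp]; }

    static int eval_stack(int a, int b, int c, int d)
    {
        push(a); push(b); push(pop() * pop());   /* a*b */
        push(c); push(d); push(pop() * pop());   /* c*d */
        push(pop() + pop());                     /* sum */
        return pop();
    }

    int main(void)
    {
        /* both compute 2*3 + 4*5 = 26 */
        printf("%d %d\n", eval_registers(2, 3, 4, 5), eval_stack(2, 3, 4, 5));
        return 0;
    }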

Kelly French
A: 

Well, well, it seems the answer to this was also in the book (Modern Compiler Implementation in Java). The book presents four answers:

  1. Some procedures don't call other procedures. If you draw the diagram of procedure calls, and assume that each procedure calls on average 1-2 other procedures, you end up with a tree in which the "leaves" (the procedures that don't call others) outnumber the non-leaf nodes. So you win that way. Some compilers don't allocate a stack frame at all for these leaf procedures (see the sketch after this list).
  2. Some optimizing compilers use "interprocedural register allocation" - which basically means they analyse all of your source code and work out ahead of time smart ways of passing arguments to procedures, thus minimizing writes to the stack.
  3. Some procedures are done with a variable before they call another function - in which case that register can simply be overwritten without being saved.
  4. Some architectures use "register windows", so that each function invocation can allocate a fresh set of registers without memory traffic.
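
To make point 1 concrete (a hypothetical example of mine, not from the book): with optimization, a typical compiler can emit a leaf function like the one below with no stack frame at all - the arguments arrive in registers, the result leaves in a register, and nothing has to be saved or restored.

    /* Hypothetical leaf function: it calls no other procedure, so under a
       typical calling convention the arguments arrive in registers, the
       result is returned in a register, and the compiler can omit the
       stack frame and any register save/restore entirely. */
    int midpoint(int lo, int hi)
    {
        return lo + (hi - lo) / 2;   /* all the work happens in registers */
    }
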
drozzy
+1  A: 

I believe you're asking why use registers, since variables eventually go to the stack anyway.

The answer is: think of registers as a cache for the top 5 or 6 (or however many) items at the top of the stack. If the top of the stack is accessed much, much more than the bottom (which is true in a lot of programs), then having this cache will speed things up.

I guess you could ask why have user-visible registers at all, instead of a transparent cache of the top few stack slots. I'm not sure, but I suspect that letting the compiler know which values will be cached lets it optimize storage allocation more. After all, if there's some cutoff in the stack after which variable accesses become much more expensive, you'd arrange your variables to work with that.

Noah Lavine