There are many different conceptual stores of data in a computing machine, as well as many different actual stores of memory in the various concrete implementations of such a machine.
In the beginning there was tape, and lots of it.
As an example, consider the conceptual Turing Machine, where the storage is the TAPE, the TABLE and the STATE REGISTER.
These are all different forms of storage:
The TAPE is mutable and unbounded.
The TABLE may be mutable (though often isn't) but is bounded.
The STATE REGISTER is mutable and bounded.
The HEAD POSITION is simple, but very important in that it allows free movement, bounded only by the tape itself and the logic within the TABLE.
Further, you can see that in the idealised model the TABLE and the STATE REGISTER are assumed to be 'fast' to access (it takes some constant time to access or modify any entry), whereas the cost of accessing some arbitrary element of the tape depends on the distance from the current location on the tape to wherever the data is.
This allows considerable analysis of various aspects of a computing system and, crucially, models the idea that an unbounded store of data will inevitably exhibit non-linear access costs (since the representation of how to get to the data will itself grow as the data stored grows).
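To make those four stores concrete, here is a minimal C++ sketch of the Turing machine's storage (the states, symbols and rules are invented purely for illustration): the TABLE and STATE REGISTER lookups take constant time, while reaching a distant tape cell costs one move per cell of distance.

    #include <iostream>
    #include <map>
    #include <utility>

    int main() {
        // TAPE: mutable and unbounded; a map lets a cell exist at any index.
        std::map<long long, char> tape;
        // HEAD POSITION: free to move, bounded only by the tape and the TABLE's logic.
        long long head = 0;
        // STATE REGISTER: mutable but bounded (one of a fixed set of states).
        int state = 0;
        // TABLE: (state, symbol) -> (next state, symbol to write, head move).
        // Bounded, and usually fixed once the machine is built.
        struct Action { int next_state; char write; int move; };
        std::map<std::pair<int, char>, Action> table = {
            {{0, '_'}, {0, '1', +1}},   // invented rule: write 1, move right
            {{0, '1'}, {1, '1', -1}},   // invented rule: change state, move left
        };

        // One step: TABLE and STATE REGISTER access is constant time...
        char symbol = tape.count(head) ? tape[head] : '_';
        Action a = table.at({state, symbol});
        tape[head] = a.write;
        state = a.next_state;
        head += a.move;

        // ...whereas reaching tape cell N costs |N - head| single-cell moves.
        long long target = 1000;
        std::cout << "moves needed to reach cell " << target << ": "
                  << (target > head ? target - head : head - target) << "\n";
    }

The map-backed tape is only a convenience of the sketch; the point is that nothing in the model promises constant-time access to an arbitrary tape cell.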
A Turing machine's model of memory was not that far removed from actual physical machines in the early era of mainframe computers, with most permanent storage based on (admittedly finite) tapes and the STATE REGISTER held in a small, simple pool of 'memory' in the machine with an equal cost to access any entry. Many of the first computers had no 'programs' in the sense you or I think of. Instead they could have their TABLE altered (often literally by hand), after which all further behaviour was fixed. This was the separation of DATA from INSTRUCTIONS.
As modern computers have changed, this link with one of the first formal descriptions of a computing device has become more and more distant.
Random considered not harmful
Most modern machines are based on a Von Neumann Architecture. These explicitly place instructions and data within the same 'pool' of addressable memory. Their initial TABLE is simply sufficient to start reading instructions from memory, and those instructions can decide, based on data, which instructions to use in future.
Many differing ways were designed to maintain the STATE: some kept it all in the main memory pool; others created specific regions of memory, either as arbitrarily addressable named 'registers' separate from main memory, or as 'stacks', often implemented by 'taking' a section of main memory.
These are not exclusive: many machines have the concept of both a stack and registers, some providing separate memory for the stack and others placing it within main memory. Several machines mandated (or encouraged through convention) that certain registers refer to the conceptual stack; a common name for such a register in many architectures is the Stack Pointer (SP).
Almost all had the concept of a particular storage location (typically a register) which indicated the next instruction to execute. Alteration of this value (beyond the basic 'add one to the last value') provided the primary means for the computer to be general purpose. In this way the Instruction Pointer (or IP, as this special location came to be known) represents something similar to the position of the HEAD in the Turing machine, except that it tracks the instructions rather than the data. By moving all data to a randomly addressable form, the concept of the tape for data was removed from 'memory'.
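As a sketch of that idea, here is a toy machine in C++ (with an invented three-instruction set, chosen only for illustration): code and data sit in one addressable pool, the IP is the only 'special' location, and a jump that rewrites the IP is what makes the machine general purpose.

    #include <iostream>
    #include <vector>

    int main() {
        // Invented opcodes for a toy machine; instructions and data share
        // one pool of addressable memory, in the von Neumann style.
        enum Op { DEC = 0, JNZ = 1, HALT = 2 };

        std::vector<int> mem = {
            DEC, 7,      // addresses 0-1: mem[7] -= 1
            JNZ, 7, 0,   // addresses 2-4: if mem[7] != 0, jump to address 0
            HALT,        // address  5
            0,           // address  6: unused padding
            3,           // address  7: DATA, a counter, in the same pool as the code
        };

        // The Instruction Pointer: a single special location, analogous to
        // the Turing machine's HEAD but tracking instructions, not data.
        int ip = 0;
        while (true) {
            int op = mem[ip];
            if (op == DEC)      { mem[mem[ip + 1]] -= 1; ip += 2; }
            else if (op == JNZ) { ip = (mem[mem[ip + 1]] != 0) ? mem[ip + 2] : ip + 3; }
            else /* HALT */     break;
        }
        std::cout << "counter ended at " << mem[7] << "\n";  // prints 0
    }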
That which was once the primary memory of a computer was instead relegated to the role of an IO device: storage, but no longer what most people would term Memory in the sense in which we use it now.
In early computers of this kind the registers and the RAM normally operated at the same 'speed'; though there was a cost to moving data from RAM into a register and back again, this was a simple instruction like any other.
As the speed with which the logic units could operate increased, the speed with which (larger and larger) RAM could be accessed fell in relative terms. Part of this was that, from a cost perspective, it quickly became easier to separate the logic units from the memory units.
As such, registers became 'special' storage 'close' to the logic units, typically operating at the same speed as them, while main memory became 'further away' from the logic elements and began to cost more cycles to access.
Messages in a Bottleneck
Code came to be designed to execute in a manner which reduced the number of times that data had to be copied into a register to be operated on, then copied back to main memory. Computer designers realised that, in general, a particular 'style' was in fact quite common in many programs: a smaller portion of main memory was accessed far more frequently than other parts.
The ability of a compiler or human to use registers effectively to manage this was limited. Trying to arrange the computation to make effective use of this faster storage involved complex dependency determination. Adding registers was not a simple solution either: since each was a 'special name', adding more involved recoding every application and forced a rewrite for each 'upgrade'.
It was realised that it was possible to put some of the main memory 'closer' to the logic: perhaps not as close as registers, but much closer than main memory. Since it was neither cost effective (nor, in many cases, possible) to put all such memory close, this would need to be a 'window' onto the main memory. Early designers realised that placing this window under the control of the programmer would lead to many of the same problems associated with the fixed number of registers. They therefore attempted to use this closer memory in a manner which was transparent to the programmer and the program, except in so far as it affected performance.
Cache memory was born, and thus temporal locality made for faster computers with no code changes.
Various forms were tried (and many co-exist even now), such as exclusive and inclusive caches (now something could be in two places at once). It was also observed that, if memory access tended to have temporal locality, it also tended towards spatial locality; as such, data could be speculatively placed in the cache before you had even requested it.
Some split the cache between data and instructions, others did not, but almost all computers moved to having a cache.
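Spatial locality is easy to see even on a modern machine. The following C++ sketch (sizes are illustrative, not tuned to any particular hardware) sums the same grid twice: the row-order walk touches consecutive addresses, so the speculatively fetched cache lines are fully used, while the column-order walk strides across them; on most machines the first loop is markedly faster even though both do identical work.

    #include <chrono>
    #include <iostream>
    #include <vector>

    int main() {
        const int N = 4096;
        std::vector<int> grid(N * N, 1);  // one flat block of 'main memory'

        auto time_sum = [&](bool row_major) {
            auto start = std::chrono::steady_clock::now();
            long long sum = 0;
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j)
                    // Row-major order walks consecutive addresses, so each
                    // speculatively fetched cache line is fully used; the
                    // column-major walk strides N ints per access instead.
                    sum += row_major ? grid[i * N + j] : grid[j * N + i];
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                          std::chrono::steady_clock::now() - start).count();
            std::cout << (row_major ? "row" : "column") << "-order: "
                      << ms << " ms (sum " << sum << ")\n";
        };

        time_sum(true);   // friendly to spatial locality
        time_sum(false);  // hostile to it: same work, far more cache misses
    }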
It's turtles all the way down
It wasn't long before memory became so much slower than the cache that it was worth putting a slower but bigger cache between main memory and the little cache. Level 1 and Level 2 caches were created, and even more besides.
Bigger better faster stronger
As processors got faster it became clear that a lot of code was in fact capable of executing at the same time, either through pipelining or through independent parallel execution. Running multiple logic units 'hidden' behind the simple view seen by the programmer and compiler required more 'hidden' registers to account for the multiple in-flight operations. Register renaming was designed to allow multiple copies of the same 'register' to be in use at once in different execution units, hiding changes in one from another.
'Where' a piece of state was stored became even more complex, though thankfully still hidden from most programmers.
At the same time this was happening, we began to experience the problems inherent in making one CPU faster, largely down to the very bottleneck the cache was trying to mitigate. As such, transistor budget began to be spent on other, entirely independent sections of logic, and it was hoped that programmers would start to write code expecting multiple units rather than one.
This led to some problems. Previously, all the 'hidden' ways in which memory latency was mitigated never made a difference to the programmer. Now, however, the 'shared main memory' model of this new SMP world order worked against you: an 'address' in memory could be in many places at once, and indeed it could have several different values in those places.
Now it became important not only where memory was but how stable it was: whether a piece of memory was in a cache or not mattered, and programming models had to change. At this point the architecture's 'Memory Model' became very important (and highly complex).
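To see why the Memory Model matters to a programmer, consider this small C++ sketch. With plain non-atomic variables, the writer's two stores could become visible to the reader in either order, or be read from a stale cached copy, and C++ would call it a data race; std::atomic with release/acquire ordering states the required stability explicitly.

    #include <atomic>
    #include <iostream>
    #include <thread>

    // Message-passing between two cores. With plain ints this would be a
    // data race (undefined behaviour in C++): each core might be looking
    // at its own cached copy of the 'same' address, holding a different value.
    std::atomic<int>  payload{0};
    std::atomic<bool> ready{false};

    void writer() {
        payload.store(42, std::memory_order_relaxed);
        // release: everything written above becomes visible to any thread
        // that later observes ready == true with an acquire load.
        ready.store(true, std::memory_order_release);
    }

    void reader() {
        while (!ready.load(std::memory_order_acquire)) { /* spin */ }
        std::cout << "payload = " << payload.load(std::memory_order_relaxed) << "\n";
    }

    int main() {
        std::thread t1(writer), t2(reader);
        t1.join();
        t2.join();  // prints "payload = 42", never 0
    }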
Divide and Conquer
Since this complex interaction of 'hidden', opaque memory mangling did not sit well with the new model of multiple separate units, various possible solutions began to present themselves.
One is obvious: if no caching is required on a value, request that it not happen, simplifying cache coherency protocols.
Another is less obvious (and its suitability is still unclear): stop having one big shared pool of memory and instead give each unit its own chunk of memory, with explicit interaction between the units and the old Main Memory.
This splitting of memory becomes even more complex as devices previously not considered in the same class as general-purpose central processing units are bent into this model.
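A minimal sketch of that second approach, in plain C++ (the Unit type and its copy_in/copy_out operations are invented to model the idea): each unit computes only on its own private chunk, and all interaction with main memory is an explicit copy, so nothing is implicitly shared and no coherency protocol is needed between the units. This mirrors the explicit transfers that GPU-style devices demand.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // Each unit owns a private chunk; all interaction with 'main memory'
    // is an explicit copy, never a transparent shared-cache access.
    struct Unit {
        std::vector<int> local;  // the unit's own memory
        void copy_in(const std::vector<int>& main_mem, size_t off, size_t n) {
            local.assign(main_mem.begin() + off, main_mem.begin() + off + n);
        }
        void compute() {         // works purely on local memory
            for (int& x : local) x *= 2;
        }
        void copy_out(std::vector<int>& main_mem, size_t off) const {
            std::copy(local.begin(), local.end(), main_mem.begin() + off);
        }
    };

    int main() {
        std::vector<int> main_mem(8, 3);  // the old shared pool
        Unit a, b;
        a.copy_in(main_mem, 0, 4);        // explicit transfer in
        b.copy_in(main_mem, 4, 4);
        a.compute();                      // no coherency protocol needed:
        b.compute();                      // nothing is implicitly shared
        a.copy_out(main_mem, 0);          // explicit transfer back out
        b.copy_out(main_mem, 4);
        for (int x : main_mem) std::cout << x << ' ';  // prints 6 eight times
        std::cout << "\n";
    }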
Don't forget the future
This is of course just the beginning of more changes; in future you may find memory being described with even more complexity, as qubits or as something else entirely unknown.