Hi, I know this is a rather "heavy" question, but I think it's interesting too. It was part of my previous question about compiler functions, but back then I explained it very badly, and many people answered only my first question, so here it is:

So, if my knowledge is correct, modern Windows systems use paging as a way to switch tasks and to ensure that each task has its own appropriate place in memory. So every process gets its own address space starting from 0.

When multitasking comes into effect, the kernel has to save all important registers to the task's stack (I believe), then save the current stack pointer, change the page entry to switch to another process's physical address space, load the new process's stack pointer, pop the saved registers and continue by a call to the popped instruction pointer address.

Because of this nice feature (paging), every process thinks it has nice flat memory within reach. So there are no far jumps, far pointers, memory segments or data segments. All is nice and linear.

But when there is no more segmentation for the process, why do compilers still create variables on the stack, or, when they are global, directly in another memory space, rather than directly in the program code?

Let me give an example. I have this C code:

int a=10;

which gets translated into (Intel syntax):

mov [position of a],#10

But then you actually occupy more bytes in RAM than needed, because first a few bytes are taken by the actual instruction, and after that instruction is done, there is a new byte containing the value 10.

Why, instead of this, when there is no need to switch any segment (which would slow the process down), isn't the value 10 just coded directly into the program like this:

xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction

Because the compiler knows the exact position of every instruction, when operating on that variable it would just use its address.

I know that const variables do this, but they are not really variables when you cannot change them.

I hope I explained my question well, but I am still learning English, so forgive my syntactic and even semantic errors.

EDIT:

I have read your answers, and it seems that based on those I can modify my question:

So, someone here said that a global variable is actually a piece of data attached directly to the program. I mean, when a variable is global, is it attached to the end of the program, or is it just created like a local one at execution time, but on the heap instead of on the stack?

If it is the first case (attached to the program itself), why do local variables even exist? I know you will tell me it is because of recursion, but that is not the case. When you call a function, you can push any memory space onto the stack, so there is no program there.

I hope you understand me: there is always an inefficient use of memory when some value (even 0) is created on the stack by some instruction, because you need space in the program for that instruction and then for the actual variable. Like so:

push #5 //instruction that says to create local variable with integer 5

And then this instruction just puts the number 5 on the stack. Please help me, I really want to know why it's this way. Thanks.

A: 

Not quite sure what your confusion is.

int a = 10; means make a spot in memory, and put the value 10 at that memory address

If you want a to simply be the constant 10:

#define a 10

though more typically

#define TEN 10
Keith Nicholas
It's worth looking at his earlier question. He seems to think that the people who write compilers are being gratuitously stupid about how they use memory.
dmckee
+5  A: 

Consider:

  • local variables may have more than one simultaneous existence if a routine is called recursively (even indirectly in, say, a recursive descent parser) or from more than one thread, and these cases occur in the same memory context
  • marking the program memory non-writable and the stack+heap as non-executable is a small but useful defense against certain classes of attacks (stack smashing...) and is used by some OSs (I don't know if Windows does this, however)

Your proposal doesn't allow for either of these cases.
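
To make the first point concrete, here is a minimal sketch in C. The function name count_distinct_slots and the variable slot are made up for illustration, and it only assumes a conforming C compiler. Each activation compares the address of its own local with the address of its caller's local while both are still alive:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo: counts activations whose local `slot` is a
   different object from the caller's `slot` while both are alive. */
int count_distinct_slots(int depth, const int *caller_slot)
{
    int slot = depth;   /* automatic storage: one copy per activation */
    int distinct = 0;

    assert(depth >= 0);
    if (caller_slot != NULL && caller_slot != &slot
        && *caller_slot == depth + 1) {
        distinct = 1;   /* caller's copy is separate and still intact */
    }
    if (depth > 0) {
        distinct += count_distinct_slots(depth - 1, &slot);
    }
    return distinct;
}
```

Calling count_distinct_slots(3, NULL) returns 3: four copies of slot exist at the same time, and each of the three nested calls sees a caller copy at a different address with its own value. A single fixed location embedded in the code could hold only one of them.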

dmckee
"may have more than one simultaneous existence" - and this is true even for modifiable globals, if the same executable code is running in multiple processes. The code need only exist in physical RAM once (since it's RO, your other point), and might be mapped to the same or different virtual addresses in the process (I don't know which in the case of Windows). If variables were embedded in the middle of it then the page would have to be duplicated, losing that optimisation.
Steve Jessop
Also: instance variables of a class
BlueRaja - Danny Pflughoeft
It's also useful that stack-based variables only exist (and take up memory) while the function is running. If they were all stored at absolute addresses, they'd all take up memory for the whole time that the process was running, which is vastly inefficient.
caf
@Steve Jessop: On Win32, code is typically stored in read-only shareable executable segments. When you run two instances of your app, there will be only one physical copy of such segments. You can see this behavior with SysInternals' process viewer.
MSalters
Well, first, my way allows it. When a function is called, you can simply push that variable onto the stack from any position, even from some memory cell directly in the program code. Second, why this defense? Because virtual memory is in place, you cannot overwrite another process, and in your own, the compiler isn't dumb enough to let you overwrite a piece of the program itself.
B.Gen.Jack.O.Neill
*"compiler isnt dumb to let you overwrite piece of program itself."* Don't kid yourself. This depends on a *lot* of things, but processors support marking pages as executable/non-executable and writable/non-writable *because* these attacks have been very, very common in the past.
dmckee
@B-gen: Don't feel you have to take our word for it though. A simple, non-optimizing compiler for a small language is a week's project (maybe two). Perhaps we haven't understood you, or you see what others don't. Read the Crenshaw tutorial or something and show us how it's done. It's a lot easier to debate the merits when you have working code.
dmckee
+3  A: 

What you are talking about is optimization, and that is the compiler's business. If nothing ever changes that value, and the compiler can figure that out, then the compiler is perfectly free to do just what you say (unless a is declared volatile).

Now if you are saying that you are seeing that the compiler isn't doing that, and you think it should, you'd have to talk to your compiler writer. If you are using Visual Studio, their address is One Microsoft Way, Redmond WA. Good luck knocking on doors there. :-)

T.E.D.
+4  A: 

So there are no far jumps, far pointers, memory segments or data segments. All is nice and linear.

Yes and no. Different program segments have different purposes, despite the fact that they reside within flat virtual memory. E.g. the data segment is readable and writable, but you can't execute data. The code segment is readable and executable, but you can't write into it.

why do compilers still create variables on the stack, [...] rather than directly in the program code?

Simple.

  1. The code segment isn't writable, first of all for safety reasons. Second, most CPUs do not like having the code segment written into, as it breaks many existing optimizations used to accelerate execution.
  2. The state of a function has to be private to each invocation of that function due to things like recursion and multi-threading.

isn't the value 10 just coded directly into the program like this

Modern CPUs prefetch instructions to allow things like parallel execution and out-of-order execution. Putting garbage (to the CPU it is garbage) into the code segment would simply diminish (or flat out cancel) the effect of these techniques. And they are responsible for the lion's share of the performance gains CPUs have shown in the past decade.

when there is no need to switch any segment

So if there is no overhead of switching segments, why then put the value into the code segment? There is no problem with keeping it in the data segment.

Especially in the case of a read-only data segment, it makes sense to put all read-only data of the program into one place, since it can be shared by all instances of the running application, saving physical RAM.

Because the compiler knows the exact position of every instruction, when operating on that variable it would just use its address.

No, not really. Most code is relocatable or position independent. Relocatable code is patched with real memory addresses when the OS loads it into memory. In fact, special techniques are used to avoid patching the code, so that the code segment too can be shared by all running instances of the application.

The ABI is responsible for defining what the compiler and linker are supposed to do for a program to be executable by a complying OS. I haven't seen the Windows ABI, but the ABIs used by Linux are easy to find: search for "AMD64 ABI". Even reading the Linux ABI might answer some of your questions.

Dummy00001
There is no code and data segment protected mode in existing x86 operating systems. Access control is handled per page, not per segment. Data execution prevention was actually introduced not so long ago with the "NX bit", which only works in PAE and long mode.
Axel Gneiting
@Axel. "no code and data segment" - check process map on Linux. look for text, data, rodata segments. Haven't seen any similar tool for Windows. "Access control is handled per page not per segment." - you can micromanage it, yes; but OS does that already for your process segments automatically. The "NX bit": other CPUs had that long before. And yes, I'm using the notion of "segment" as in "data segment of the program", not as in "the gimmick of 8086 memory organization".
Dummy00001
He was talking about *memory segmentation* of x86 (which is theoretically still supported in protected mode). That has *nothing* to do with a "data segment" in an ELF file. I know that the OS does it automatically, but it uses pages, not segments for that.
Axel Gneiting
+3  A: 

Why isn't the value 10 just coded directly into the program like this:

xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction

That is how global variables are stored. However, instead of being stuck in the middle of executable code (which is messy, and not even possible nowadays), they are stored just after the program code in memory (in Windows and Linux, at least), in what's called the .data section.

When it can, the compiler will move variables to the .data section to optimize performance. However, there are several reasons it might not:

  • Some variables cannot be made global, including instance variables for a class, parameters passed into a function (obviously), and variables used in recursive functions.
  • The variable still exists in memory somewhere, and still must have code to access it. Thus, memory usage will not change. In fact, on the x86 ("Intel"), according to this page the instruction to reference a local variable:

    mov eax, [esp+8]
    

    and the instruction to reference a global variable:

    mov eax, [0xb3a7135]
    

    both take 1 (one!) clock cycle.

    The only advantage, then, is that if every local variable is global, you wouldn't have to make room on the stack for local variables.

  • Adding a variable to the .data section may actually increase the size of the executable, since the variable's initial value is contained in the file itself.

  • As caf mentions in the comments, stack-based variables only exist while the function is running - global variables take up memory during the entire execution of the program.
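
A small C sketch of the lifetime difference described in the list above. The names global_total and add_to_total are invented for this example, and the exact section a compiler picks for the global is an assumption (an initialized global typically ends up in .data):

```c
/* Hypothetical names, for illustration only. */
int global_total = 10;       /* static storage: exists for the whole run,
                                typically placed in the .data section */

int add_to_total(int amount)
{
    int local_copy = amount; /* automatic storage: exists only during
                                this call, usually on the stack or in
                                a register */
    local_copy *= 2;
    global_total += local_copy;
    return local_copy;
}
```

After add_to_total(1) and add_to_total(3), global_total is 18: the global keeps its value (and its memory) between calls, while each call gets a fresh local_copy whose space is released when the function returns.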

BlueRaja - Danny Pflughoeft
A: 

Variables have storage space and can be modified. It makes no sense to stick them in the code segment, where they cannot be modified.

If you have code with int a=10 or even const int a=10, the compiler cannot always convert code which references 'a' to use the constant 10 directly, because it may have no way of knowing whether 'a' will be changed behind its back (even const variables can be changed through a cast, although doing so is undefined behavior in C). For example, one way 'a' can be changed without the compiler knowing is if you have a pointer which points to 'a'. Pointer values are not fixed until runtime, so the compiler cannot in general determine at compile time whether there will be a pointer which points to and modifies 'a'.
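
A minimal sketch of the pointer case, with made-up names store_through and read_after_store. It deliberately modifies only a non-const variable, since writing to a const object through a pointer is undefined behavior in C. Note also that a compiler which can see both functions at once may still work out the result itself; the point is only that 'a' needs real, writable storage as soon as its address escapes:

```c
/* Hypothetical helper: writes through whatever pointer it is given. */
void store_through(int *target, int value)
{
    *target = value;
}

int read_after_store(void)
{
    int a = 10;
    int *p = &a;          /* the address is only known at run time */

    store_through(p, 42); /* modifies `a` without naming it */
    return a;             /* the current value, not the initializer */
}
```

read_after_store() returns 42, not 10, so replacing every use of 'a' with the literal 10 would have been wrong here.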

5ound