I am trying to put together a high-level description of the different stages in a program's lifetime, from source code to execution.


Points:

  1. Preprocessing: Macros, include files and compiler directives are processed in this phase.
  2. Compilation: Source files are compiled into obj files.
  3. Linker: The different obj files are linked into a single executable. At this stage virtual addresses are assigned to the functions, variables and data in the executable. On a 32-bit machine each process has 4 GB of address space, of which 1-2 GB is reserved for the OS, so the remaining 2-3 GB of address space can be assigned to the process.
  4. Execution: During program execution the loader comes into the picture. It basically loads the program from virtual address space into physical memory. So when the process starts executing, the OS allocates memory for the process and calls its main function.


Questions:

  1. If a program's binary image is 2 MB, does the complete image have to be loaded into physical memory for the program to execute? My understanding is that the program has to be fully loaded into physical memory for execution, so it would not be possible to run a 512 MB program on a machine with 256 MB of physical memory. Virtual memory and paging only help when the program's memory requirement grows.

  2. When program asks for more memory, i.e. when it allocates heap memory using new/malloc, then memory gets reserved in virtual address space. It will not get committed until it is referenced.


Please point out wherever you feel my understanding is wrong.
Is there any article or blog that could give me a one or two page high-level description of the whole process?

A: 

1) So if a program's binary image is 2 MB, does the complete image have to be loaded into physical memory for the program to execute? As per my understanding the program has to be fully loaded into physical memory for execution, so it would not be possible to run a 512 MB program on a machine with 256 MB of physical memory. Virtual memory and paging only help when the program's memory requirement grows.

No, most (all?) modern operating systems load pages on demand. If a page is not used it won't be loaded.

2) When a program asks for more memory, i.e. when it allocates heap memory using new/malloc, the memory gets reserved in virtual address space. It will not get committed until it is referenced.

Not necessarily - the runtime could request a big chunk up front, commit it immediately, and then parcel out committed memory. I'm not aware of anything that actually does this, but the whole area is implementation dependent.

anon
Okay, so it means that when the program is initially loaded into memory, it is loaded page by page and not as a complete binary image?
Alien01
Yes, that's right.
anon
A: 

Under number 4, I think you may be missing that this is where the program is copied from some physical storage, e.g. a disk or a server, into the operating system's memory. That isn't quite the same as what you state since the physical and virtual memory are part of the operating system to my mind.

On the first question, not necessarily, I think. Consider how, when I start running a game, there is an initial period spent loading files that are part of the same executable, so something is already running to tell the O/S to load those files.

JB King
A: 

This doesn't look all that language-agnostic to me, since lots of languages don't have anything corresponding to the preprocessing phase. However, it's reasonably accurate for a start.

You seem to be confusing virtual address space with disk file storage. Actually, it's an adjunct to physical memory, and works the same way (except for performance). It uses the disk, but not in the same way as using a file.

You know how physical memory works. Virtual memory is a way of faking a larger memory in a way that's usually transparent. The program's address space is divided into "pages", and the pages are read in from the disk as needed. Physical memory is divided into "page frames", and a page frame's physical address doesn't have anything to do with the virtual address it's currently representing. Obviously, if the program uses more memory than is physically available, page frames will have to be reused, so the contents of a page frame will have to be written back (if changed since it was read) and a new page loaded.
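
To make the page / page frame distinction concrete, here is a rough sketch in Python. It is a toy model, not how any real MMU or OS is implemented: the 4 KiB page size, the `translate` function and the dictionary used as a page table are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size; 4 KiB is common but not universal


def translate(virtual_address, page_table):
    """Map a virtual address to a physical one through a toy page table.

    page_table maps virtual page number -> physical frame number.
    A missing entry stands in for a page fault.
    """
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise LookupError(f"page fault on virtual page {page_number}")
    return page_table[page_number] * PAGE_SIZE + offset


# Hypothetical mapping: neighbouring virtual pages sit in unrelated frames.
page_table = {0: 7, 1: 2}
print(hex(translate(0x0010, page_table)))  # 0x7010 (page 0 -> frame 7)
print(hex(translate(0x1010, page_table)))  # 0x2010 (page 1 -> frame 2)
```

The offset within the page survives the translation unchanged; only the page number is swapped for a frame number, which is why the physical address tells you nothing about the virtual one.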

If the program uses only part of its address space at a time (the "working set"), and that part is few enough pages so they can all sit in physical memory at once, this works well. If it's constantly referring to more pages than can fit in physical memory, pages constantly have to be read in from the disk ("thrashing"), performance drops drastically, and the disk is under constant load.
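
The working-set effect can be shown with a small simulation. This is only a sketch: it assumes strict least-recently-used (LRU) replacement, which real operating systems only approximate, and `count_page_faults` is a name made up for the example.

```python
from collections import OrderedDict


def count_page_faults(references, frame_count):
    """Count faults for a sequence of page numbers under LRU replacement."""
    frames = OrderedDict()  # resident pages, least recently used first
    faults = 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)  # mark as most recently used
            continue
        faults += 1  # page is not resident, so it must be read in
        if len(frames) >= frame_count:
            frames.popitem(last=False)  # evict the least recently used page
        frames[page] = True
    return faults


# A program that cycles through 4 pages, 25 times over (100 references).
references = [0, 1, 2, 3] * 25
print(count_page_faults(references, 4))  # 4: the working set fits
print(count_page_faults(references, 3))  # 100: one frame short, every access faults
```

With four frames only the first touch of each page faults. With three frames, the page about to be needed is always the one that was just evicted, which is thrashing in miniature.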

Therefore, when I have compiled and linked a program, there's an executable file on the disk, in the file system. When I execute it, it gets assigned an address space, and then it gets more complicated. Effectively, it's loaded into memory, and how much physical and how much virtual memory is irrelevant to the user (except that if it doesn't have enough physical it's going to run awfully slow).

Therefore, it is possible to run a 512M program with 256M of physical memory.

When memory is requested from the heap, it is assigned to memory locations. At least the C and C++ standards require that it be usable, unless the request failed, so "committed" looks to me to be an odd choice of words. It doesn't have to be in physical memory until it's used.
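
As a rough illustration, the sketch below uses Python's `mmap` module to ask for an anonymous region and then writes to only a few of its pages. The anonymous mapping stands in for a large heap request, which is an assumption: `malloc`/`new` may or may not be implemented this way. Whether physical frames are really assigned only on first touch depends on the OS, and this code cannot observe that directly; `reserve_and_touch` is a name invented for the example.

```python
import mmap


def reserve_and_touch(size, pages_to_touch):
    """Map `size` bytes anonymously, then write to the first few pages."""
    region = mmap.mmap(-1, size)  # -1 means anonymous memory, no named file
    try:
        for i in range(pages_to_touch):
            # On a lazily allocating OS, the first write to a page is the
            # point at which a physical frame has to be found for it.
            region[i * mmap.PAGESIZE] = 1
        # The whole region is usable; untouched bytes read back as zero.
        return region[0], region[size - 1]
    finally:
        region.close()


print(reserve_and_touch(16 * 1024 * 1024, 4))  # (1, 0)
```

The request succeeds and every byte is usable right away, which matches the point about the standards; how much of it occupies physical memory at any moment is the OS's business.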

David Thornley
A: 

Point 2 is incomplete. The compiler generates assembly that is assembled into binary files.

Point 3 is wrong. Kernel virtual memory space reservation has got nothing to do with the linker. The kernel space is OS dependent. In Windows it's even configurable (the infamous /3GB switch).

Point 4 is wrong. The executable image is mapped into virtual memory. It's not actually "loaded" per se.

The answers to your questions:

  1. The program is mapped into virtual memory, not physical memory. The Virtual Memory Manager (VMM) is then responsible for making sure the memory is in physical memory when needed.
  2. new/malloc request heap memory. The heap is an abstraction over virtual memory that minimizes the number of switches into the kernel a memory allocation may incur. If the heap is too small to satisfy the allocation request, the heap will grow, resulting in a virtual memory allocation. A page is committed at the discretion of the heap manager.
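
A toy model of the heap idea in point 2 might look like the sketch below. Everything in it is hypothetical (the `ToyHeap` class, the 64 KiB growth granularity, the absence of any `free`); it only shows why many small allocations need few requests to the OS.

```python
class ToyHeap:
    """Hypothetical bump allocator that grows in fixed-size chunks."""

    CHUNK = 64 * 1024  # assumed growth granularity

    def __init__(self):
        self.capacity = 0     # bytes obtained from the (simulated) OS
        self.used = 0         # bytes handed out to callers
        self.os_requests = 0  # number of times the heap had to grow

    def allocate(self, size):
        """Return the offset of a new block, growing the heap if needed."""
        if self.used + size > self.capacity:
            needed = self.used + size - self.capacity
            chunks = -(-needed // self.CHUNK)  # ceiling division
            self.capacity += chunks * self.CHUNK
            self.os_requests += 1  # stands in for a switch into the kernel
        offset = self.used
        self.used += size
        return offset


heap = ToyHeap()
for _ in range(1000):
    heap.allocate(100)
print(heap.os_requests)  # 2: a thousand allocations, two requests to the OS
```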
Edouard A.
Point 3: When the linker creates a binary executable, what addresses are assigned to the functions, data and variables in the image? I think they are virtual addresses.
Alien01
Point 3: Even with the /3GB switch, applications on Windows will not use the 3 GB address space unless this is specified explicitly. The linker has a switch that tells the application to use the 3 GB address space.
Alien01
The linker just records where pieces of data are relative to one another, then sets flags in the PE header for the OS. But it's the OS's job to map the image into virtual address space; the linker just assembles binaries into one file: the executable.
Edouard A.
The reason binaries don't get 3 GB even with the boot switch is efficiency and backward compatibility. But really the 3GB stuff is a hack. It also sets up a virtual translation table used for DLLs that might be loaded at a virtual address different from the one requested.
Edouard A.
A: 

The DLLs and executables are loaded using memory-mapped files, so they're not actually copied to RAM up front; they are still on disk, with their bytes mapped into the process's virtual address space.
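
The same mechanism is available to ordinary programs. Here is a minimal sketch using Python's `mmap` module; the helper names `make_demo_file` and `read_byte_via_mapping` are invented for the example, and the temporary file merely stands in for an executable or DLL image.

```python
import mmap
import os
import tempfile


def make_demo_file(size):
    """Create a temporary file of `size` bytes ending in a marker byte."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.seek(size - 1)  # everything before the marker reads back as zeros
        f.write(b"*")
    return path


def read_byte_via_mapping(path, offset):
    """Map the file read-only and fetch one byte through the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset]  # no explicit read() of the whole file


size = 8 * 1024 * 1024
path = make_demo_file(size)
try:
    print(read_byte_via_mapping(path, size - 1))  # 42, the "*" marker
finally:
    os.remove(path)
```

The program indexes into the mapping as if the file were an in-memory byte array, and the OS decides which parts of the file actually need to be brought in from disk.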

zvolkov