I have taken a course on operating system design and concepts, and now I am trying to study the Linux kernel thoroughly. There is one question I cannot get rid of. In modern operating systems each process has its own virtual address space (VAS) (e.g., 0 to 2^32-1 on 32-bit systems). This provides many advantages, but some points of the implementation confuse me. Let me explain with an example:

Let's say we have two processes, p1 and p2, each with its own VAS. The address 0x023f4a54 is mapped to a different physical address (PA) in each of them. How can that be? How is the translation done in this case? I know the translation mechanism in general, but I cannot understand how the same virtual address ends up mapped to different physical addresses depending on which process's address space it belongs to.

0x023f4a54 in p1's VAS => PA 0x12321321
0x023f4a54 in p2's VAS => PA 0x23af2341 # (random addresses)
+1  A: 

Your question confuses a virtual address with using an address as a way of identification, so the first step to understanding is to separate the concepts.

A working example is the C runtime library function sprintf(). When properly declared and called, it is incorporated into a program as part of a shared object module, along with all the subfunctions it needs. The address of sprintf varies from program to program because the library is loaded at whatever address range is free. For a simple hello world program, sprintf might be loaded at address 0x101000. For a complex program which calculates taxes, it might be loaded at 0x763f8000 (because all the yucky logic the main program contains goes before the libraries it references). From a system perspective, the shared library is loaded into physical memory in one place only, but the address window (range of addresses) through which each process sees that memory is specific to that process.

Of course, this is complicated further by address space layout randomization (ASLR), a kernel security feature which randomizes the addresses at which different program sections are loaded into memory, including shared library mappings.

--- clarification --- As someone correctly points out, the virtual address mapping of each process is specific to that process, not unlike its set of file descriptors, socket connections, process parent and children, etc. That is, p1 might map address 0x1000 to physical 0x710000, while in p2 address 0x1000 is unmapped and causes a page fault, and in p3 it maps to some shared library at physical 0x9f32a000. The virtual address mapping is carefully supervised by the operating system, partly to provide features such as swapping and paging, but also to provide features like shared code and data, and interprocess shared data.

wallyk
My problem is not about sharing. The problem is this: p1 accesses VA 0x04 and that VA is translated to a physical address. Then a context switch happens, and another process p2 accesses the same VA 0x04, which is translated to another physical address that must not be the same as p1's. How does the OS provide this difference? Both processes can use the same VAs, but those VAs are not mapped to the same physical address. I am asking about the mechanism behind that translation. How does the OS prevent these processes from clashing with each other?
Dirtybit
@Pushdown As mentioned, the OS keeps a separate mapping per process, and it swaps those mappings in and out when it switches between processes.
nos
+6  A: 

A CPU that provides virtual memory lets you set up a mapping from the memory addresses as the CPU sees them to physical memory addresses. Typically this is done by a hardware unit called the MMU.

The OS kernel can program that MMU, typically not down to individual addresses but in units of pages (4096 bytes is common). This means the MMU can be programmed to translate, e.g., virtual addresses 0x1000-0x1fff to physical addresses 0x20000-0x20fff.
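
To make the page-granular translation concrete, here is a minimal sketch in Python. It is only a toy model: the dictionary `page_table` and the function `translate` are names made up for illustration, and a real MMU walks hardware-defined tables rather than a dictionary. It assumes 4096-byte pages, so the low 12 bits of an address are the offset within the page and pass through unchanged.

```python
PAGE_SIZE = 4096      # 4 KiB pages, as in the example above
OFFSET_BITS = 12      # log2(PAGE_SIZE)

def translate(page_table, vaddr):
    """Translate vaddr using a {virtual page number: physical frame number} dict."""
    vpn = vaddr >> OFFSET_BITS            # which virtual page
    offset = vaddr & (PAGE_SIZE - 1)      # position inside the page
    if vpn not in page_table:
        # A real MMU would raise a page fault exception to the OS here.
        raise LookupError(f"page fault at {vaddr:#x}")
    return (page_table[vpn] << OFFSET_BITS) | offset

# Hypothetical mapping: virtual page 0x1 -> physical frame 0x20
page_table = {0x1: 0x20}
print(hex(translate(page_table, 0x1a54)))   # 0x20a54
```

Only the page number is looked up; every address inside the page 0x1000-0x1fff lands in the frame 0x20000-0x20fff at the same offset.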

The OS keeps one set of these mappings per process, and when it schedules a process to run, it loads that process's mapping into the MMU before it hands control back to the process. This enables different mappings for different processes, and nothing stops those mappings from mapping the same virtual address to different physical addresses.
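
The per-process mapping can be sketched as follows. This is a toy model under stated assumptions, not how any kernel is written: `ToyMMU`, `load`, and the two dictionaries are hypothetical names, and the frame numbers are invented. Because paging preserves the low 12 bits, the physical addresses below end in a54 rather than in the random values used in the question.

```python
PAGE_SHIFT = 12   # assuming 4096-byte pages

class ToyMMU:
    """Toy stand-in for the MMU: it translates with whichever table is loaded."""
    def __init__(self):
        self.active_table = None

    def load(self, page_table):
        # What the OS does on a context switch: point the MMU at another table.
        self.active_table = page_table

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        return (self.active_table[vpn] << PAGE_SHIFT) | offset

# One table per process (hypothetical frame numbers).
p1_table = {0x023f4: 0x12321}
p2_table = {0x023f4: 0x23af2}

mmu = ToyMMU()
mmu.load(p1_table)                      # schedule p1
print(hex(mmu.translate(0x023f4a54)))   # 0x12321a54
mmu.load(p2_table)                      # context switch to p2
print(hex(mmu.translate(0x023f4a54)))   # 0x23af2a54
```

The virtual address never changes; only the table the MMU consults does, which is why the two processes cannot clash.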

All this is transparent as far as the program is concerned. It just executes instructions on the CPU, and because the CPU has been set to virtual memory mode (paged mode), every memory access is translated by the MMU before it goes out on the physical bus to the memory.

The actual implementation details are complicated, but here are some references that might provide more insight:

nos
Btw, thanks for the Gorman book. It is very good, indeed.
Dirtybit
+1  A: 

This mapping (virtual address to physical address) is handled by the OS and the MMU (see @nos' answer); the point of this abstraction is so p1 "thinks" it's accessing 0x023f4a54 when in reality it's accessing 0x12321321.

If you go back to your class on how programs work at the machine code level, p1 will expect some variable/function/whatever to be at the same place (e.g. 0x023f4a54) every time it's loaded. The OS's mapping of virtual addresses to physical addresses provides this abstraction. In reality, it won't always be loaded at the same physical address, but your program doesn't care as long as it's at the same virtual address.

NullUserException
The magic *internals* are the MMU (hardware actually, not the OS), see nos's answer. The OS does decide the mapping and programs the MMU to actually implement the translation.
Wim
A: 

There are two important data structures dealing with paging: the page table and the TLB. The OS maintains different page tables per process. The TLB is just a cache of the page table.

Now, different CPUs are, well, different. x86 accesses page tables directly, using a special register called CR3 which points to the page table in use. MIPS processors don't know anything about the page table, so the OS must work directly with the TLB.

Some CPUs (e.g. MIPS) keep an identifier in each TLB entry to tell different processes apart, so the OS can just change a control register when doing a context switch (unless it needs to reuse an identifier). Other CPUs require a full TLB flush on every context switch. So, basically, the OS needs to change some control registers and possibly needs to clear the TLB (do a TLB flush) to allow virtual addresses from different processes to map to whatever physical addresses they should.
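
The difference between the two approaches can be sketched with a toy model. The class and method names here (`TaggedTLB`, `FlushingTLB`, `switch_to`, `insert`, `lookup`) are made up for illustration and do not correspond to any real hardware interface; a real TLB is a small fixed-size hardware cache, not a dictionary.

```python
class TaggedTLB:
    """Entries carry a process identifier, so stale entries can never match."""
    def __init__(self):
        self.entries = {}          # (identifier, virtual page) -> physical frame
        self.current_id = None

    def switch_to(self, ident):
        self.current_id = ident    # just change the "control register"

    def insert(self, vpn, pfn):
        self.entries[(self.current_id, vpn)] = pfn

    def lookup(self, vpn):
        return self.entries.get((self.current_id, vpn))   # None means a miss


class FlushingTLB:
    """No identifier in the entries, so everything is dropped on each switch."""
    def __init__(self):
        self.entries = {}          # virtual page -> physical frame

    def switch_to(self, ident):
        self.entries.clear()       # full TLB flush

    def insert(self, vpn, pfn):
        self.entries[vpn] = pfn

    def lookup(self, vpn):
        return self.entries.get(vpn)


tlb = TaggedTLB()
tlb.switch_to(1)
tlb.insert(0x023f4, 0x12321)       # p1's translation is cached
tlb.switch_to(2)
print(tlb.lookup(0x023f4))         # None: p2 misses and must consult its own page table
tlb.switch_to(1)
print(hex(tlb.lookup(0x023f4)))    # 0x12321: p1's entry survived the switches
```

In both variants a process can never hit another process's cached translation; the tagged version simply avoids refilling the cache after every switch.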

ninjalj
A: 

Thanks for all the answers. What I actually did not understand was how the same virtual address in different processes avoids clashing with the other process's physical address. I found the answer in the link below: each process has its own page table.

http://tldp.org/LDP/tlk/mm/memory.html

Dirtybit