For processors that use memory (most of them) there is a memory interface of some sort; some have names (AMBA, AXI, Wishbone), some do not. From the processor's perspective it is an address, data, and a request to either read or write whatever is at that address. In the good old days you would have a single bus, and your flash, ram, and peripherals would sit on that bus watching certain (usually upper) address bits to determine if they were being addressed; if so they would read from or drive the data bus, otherwise remain tristated. Today, depending on the chip, some of that memory decoding happens in or close to the core, and your public interface to the core or chip might be several busses: perhaps a specific flash bus, a specific sram bus, a specific dram bus, etc.
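As a rough sketch of that old single-bus decode, where each device watches the upper address bits (the address map here is made up purely for illustration):

```python
def decode(address):
    """Return which device responds to this bus address, based on bits [31:28]."""
    top = (address >> 28) & 0xF
    if top == 0x0:
        return "flash"        # hypothetical: flash at 0x00000000
    if top == 0x2:
        return "ram"          # hypothetical: ram at 0x20000000
    if top == 0x4:
        return "peripheral"   # hypothetical: peripherals at 0x40000000
    return "none"             # nobody drives the data bus; it stays tristated
```

Each "device" only compares a few upper bits; the lower bits select a location within the device.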
So the first problem you have is a flat linear address space: even if divided up into flash and ram, the ram portion is flat, address 0 to N-1 for N bytes. For a non-embedded operating system, life would be much easier if every program could simply assume it started at address 0, or 0x100, or 0x8000, instead of having to be somehow compiled for whatever the next free memory space happens to be, or forcing the operating system to completely move one program out of lower memory and replace it with another whenever task switching. An old, easy way was Intel's segment:offset scheme. Programs always started at the same place because the code segment was adjusted before launching the program and the offset was used for execution (a very simplified view of this model); when task switching among programs you just change the code segment and restore the pc for the next program. One program could be at address 0x1100 and another at 0x8100, but both programs think they are at address 0x0100. Easy for all the developers. MMUs provide the same functionality by taking the address on the processor bus and calling it a virtual address; the mmu normally sits up close to the processor, between the processor's memory interface and the rest of the chip/world. So the mmu could see address 0x0100, look it up in a table, and go to physical address 0x0100; then when you task switch you change the table so the next fetch of 0x0100 goes to 0x1100. Each program thinks it is operating at address 0x0100, and linking, compiling, developing, loading, and executing code is less painful.
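Both schemes can be sketched in a few lines. The addresses are the ones from the paragraph above; everything else is illustrative:

```python
# Intel real-mode style: physical address = segment * 16 + offset.
# The loader picks the segment; the program always runs at the same offset.
def seg_offset(segment, offset):
    return (segment << 4) + offset

# Program A loaded at 0x1100, program B at 0x8100, both "at" offset 0x0100:
# seg_offset(0x0100, 0x0100) -> 0x1100
# seg_offset(0x0800, 0x0100) -> 0x8100

# MMU style: a per-task lookup table maps virtual to physical, and a
# task switch just swaps which table the mmu consults.
task_a_table = {0x0100: 0x0100}   # task A: virtual 0x0100 -> physical 0x0100
task_b_table = {0x0100: 0x1100}   # task B: virtual 0x0100 -> physical 0x1100

def translate(table, vaddr):
    return table[vaddr]
```

Either way, every program sees the same starting address; only the loader or the table changes.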
The next features are caching, memory protection, etc. The processor and its memory controller may decode some addresses before reaching the mmu, perhaps certain core registers and perhaps the mmu controls themselves. But other things like memory and peripherals are addressed on the other side of the mmu, on the other side of the cache, which is often the next layer of the onion outside the mmu. When polling your serial port to see if another byte is available, for example, you don't want the data access to be cached such that the first read of the serial port status register actually goes out on the physical bus and touches the serial port, while all subsequent reads return the stale version in the cache. You do want this for ram values, that is the purpose of the cache, but for volatile things like status registers it is very bad. So depending on your system you are likely not able to turn on the data cache until the mmu is enabled. The memory interface on an ARM, for example, has control bits that indicate what type of access it is: non-cacheable or cacheable, part of a burst, that sort of thing. So you can enable instruction caching independent of data caching, and with the mmu off those control signals pass straight through to the cache controller, which in turn is connected to the outside world (if it didn't handle the transaction itself). So your instruction fetches can be cached, everything else not cached. But to cache data ram accesses while not caching status registers from the serial port, what you need to do is set up the tables for the mmu. In your embedded environment you may choose to simply map the ram one to one, meaning virtual address 0x1000 becomes physical address 0x1000, but you can now set the data cache enable bit for that chunk of memory. Then for your serial port you also map virtual to physical addresses, but you clear the data cache enable bit for that chunk of memory space.
Now you can enable the data cache: memory reads are cached (because the control signals, as they pass through the mmu, are marked as such), but your register accesses carry control signals indicating non-cacheable.
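A minimal sketch of such a table, assuming 4-kbyte chunks, a one-to-one mapping, and a made-up serial port address:

```python
PAGE = 0x1000  # assume 4-kbyte chunks

# One entry per virtual chunk: physical base plus a cacheable control bit.
# ram at 0x1000 is mapped one to one and cacheable; a hypothetical serial
# port register page at 0x40000000 is mapped one to one, non-cacheable.
page_table = {
    0x1000: {"phys": 0x1000, "cacheable": True},
    0x40000000: {"phys": 0x40000000, "cacheable": False},
}

def translate(vaddr):
    """Return (physical address, cacheable) for this access."""
    entry = page_table[vaddr & ~(PAGE - 1)]
    return entry["phys"] | (vaddr & (PAGE - 1)), entry["cacheable"]
```

The cacheable flag is what travels on with the transaction, so the cache controller caches ram reads but lets every serial port read go all the way out to the device.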
You certainly do not have to map virtual to physical one to one; it depends on embedded or not embedded, operating system or not, etc. But this is where your protection comes in, easiest to see in an operating system. An application at the application layer should not be allowed to get at protected system memory, the kernel, etc., and should not be able to clobber a fellow application's memory space. So when the application is switched in, the mmu tables reflect what memory it is allowed to access and what memory it is not. Any address not permitted to the program is caught by the mmu, an exception/fault (interrupt) is generated, and the kernel/supervisor gets control and can deal with that program. You may remember the term "general protection fault" from the earlier Windows days, before marketing and other interest groups in the company decided to change the name. It came straight out of the Intel manual: that interrupt was fired when you had a fault that didn't fall into the other categories, like a multiple choice question on a test, A bob, B ted, C alice, D none of the above. The general protection fault was the none-of-the-above category, yet the most widely hit, because that is what you got when your program tried to access memory or i/o outside its allocated memory space.
Another benefit from mmus is malloc. Before mmus, the memory allocator had to use schemes to re-arrange memory to keep large empty blocks available for the next big malloc, to minimize the "with 4meg free why did my 1kbyte alloc fail?" problem. Now, like a disk, you chop the memory space up into chunks of 4kbytes or some such size. For a malloc that is one chunk or less in size, take any free chunk in memory, use an mmu table entry to point at it, and give the caller the virtual address tied to that mmu entry. If you want 4096*10 bytes, the trick is not finding that much linear memory but finding 10 linear mmu table entries: take any 10 chunks of memory (not necessarily adjacent) and put their physical addresses in the 10 mmu entries.
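A toy version of that allocator, assuming 4096-byte chunks and a made-up free list of scattered physical chunks:

```python
PAGE = 4096
free_pages = [7, 3, 12, 0, 9, 5, 11, 2, 8, 14]  # scattered free physical chunks
mmu_table = {}      # virtual chunk number -> physical chunk number
next_vpage = [0]    # next unused virtual chunk number

def mmu_malloc(nbytes):
    npages = -(-nbytes // PAGE)          # round up to whole chunks
    base = next_vpage[0]
    for i in range(npages):
        # any free physical chunk will do; adjacency does not matter
        mmu_table[base + i] = free_pages.pop()
    next_vpage[0] += npages
    return base * PAGE                   # caller sees one linear virtual range
```

A 4096*10-byte request succeeds so long as 10 free chunks exist anywhere in physical memory, which is the whole point.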
The bottom line, "how" it does it: it sits, usually, between the processor and the cache, or if there is no cache, the physical memory bus. The mmu logic looks at the address and uses it to look into a table. The bits in the table include the physical address plus some control signals (including cacheable) plus some way of indicating whether this is a valid entry or a protected region. If the address is protected, the mmu fires an interrupt/event back to the core. If valid, it modifies the virtual address to become the physical address on the other/outside of the mmu, and bits like the cacheable bit are used to tell whatever is on the other side of the mmu what type of transaction this is: instruction, data, cacheable, burst, etc. For an embedded, non-os, single-tasking system you may only need a single mmu table. In an operating system, a quick way to perform protection would be to have a table per application, or a subset of the table (tree-like, similar to a directory structure), such that when you task switch you only have to change one thing, the start of the table or the start of one branch of the tree, to change the virtual to physical addresses and allocated memory (protection) for that branch of the tree.
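Putting those pieces together, the lookup described above can be sketched as follows (the entry layout and addresses are made up; real table formats are architecture specific):

```python
PAGE = 0x1000

class MmuFault(Exception):
    """Stands in for the exception/interrupt fired back at the core."""

# Each entry: valid bit, protected bit, physical base, control bits (cacheable).
table = {
    0x0000: {"valid": True, "protected": False, "phys": 0x8000, "cacheable": True},
    0x4000: {"valid": True, "protected": True,  "phys": 0xC000, "cacheable": False},
}

def mmu(vaddr, user_mode=True):
    entry = table.get(vaddr & ~(PAGE - 1))
    if entry is None or not entry["valid"]:
        raise MmuFault("no valid mapping for %#x" % vaddr)
    if entry["protected"] and user_mode:
        raise MmuFault("protection violation at %#x" % vaddr)
    # physical address plus control signals for whatever sits outside the mmu
    return entry["phys"] | (vaddr & (PAGE - 1)), entry["cacheable"]
```

Swapping `table` (or one branch of a tree of tables) on a task switch is all it takes to give the next program its own mappings and its own protection.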