I'm working on kernel design, and I've got some questions concerning paging.

The basic idea that I have so far is this: Each program gets its own (or so it thinks) 4G of memory, minus a section somewhere that I reserve for kernel functions that the program can call. So, the OS needs to figure out some way to load the pages in memory that the program needs to use during its operation.

Now, assuming that we had infinite amounts of memory and processor time, I could load/allocate any page the program wrote to or read from as it happened, using page faults for pages that didn't exist (or were swapped out) so the OS could quickly allocate them or swap them in. In the real world, though, I need to optimize this process so that a program doesn't permanently hold on to every page it has ever touched.

So I guess my question is, how does an OS generally go about this? My initial thought is to create a function that the program calls to set/free pages, which it can then manage on its own, but does a program generally do this, or does the compiler assume it has free rein? Also, how does the compiler handle situations where it needs to allocate a fairly large segment of memory? Do I need to provide a function that tries to give it X pages in order?

This is obviously not a language-specific question, but I'm partial to standard C and good with C++, so I'd like any code examples to be in either of those or assembly. (Assembly shouldn't be necessary; I fully intend to make it work with as much C code as possible, and optimize as a last step.)

Another thing that should be easier to answer as well: How does one generally handle kernel functions that a program needs to call? Is it OK just to have a set area of memory (I was thinking toward the end of virtual space) that contains the most basic functions/process-specific memory that the program can call? My thought from there would be to have the kernel functions do something very fancy and swap the pages out (so that programs couldn't see sensitive kernel functions in their own space) when programs needed to do anything major, but I'm not really focusing on security at this point.

So I guess I'm more worried about the general design ideas than the specifics. I'd like to make the kernel completely compatible with GCC (somehow) and I need to make sure that it provides everything that a normal program would need.

Thanks for any advice.

+5  A: 

The answer to this question is highly architecture-dependent. I'm going to assume you're talking about x86. With x86, a kernel generally provides a set of system calls, which are predetermined entry points into the kernel. User code can only enter the kernel at those specific points, so the kernel has careful control over how it interacts with user code.

In x86, there are two ways to implement system calls: with interrupts, and with the sysenter/sysexit instructions. With interrupts, the kernel sets up an interrupt descriptor table (IDT), which defines the possible entry points into the kernel. User code can then use the int instruction to generate a soft interrupt to call into the kernel. Interrupts can also be generated by hardware (so-called hard interrupts); those interrupts should generally be distinct from soft interrupts, but they don't have to be.

The sysenter and sysexit instructions are a faster way of performing system calls, since handling interrupts is slow; I'm not that familiar with using them, so I can't comment on whether or not they're a better choice for your situation.

Whichever you use, you'll have to define the system call interface. You'll probably want to pass system call arguments in registers rather than on the stack, since generating an interrupt causes a switch to the kernel stack. This means you'll almost certainly have to write some assembly-language stubs, both on the user-mode end to make the system call and on the kernel end to gather the system call arguments and save the registers.
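To make that concrete, here is a rough sketch of what the user-mode stub could look like in C with GCC inline assembly, assuming a Linux-style convention: call number in EAX, up to three arguments in EBX/ECX/EDX, return value in EAX, entry via soft interrupt 0x80. The vector number, the register assignment, and the SYS_WRITE number are just assumptions for the example, not something your kernel has to copy.

    /* User-mode system call stub for 32-bit x86 (GCC inline assembly).
     * Assumed convention: call number in EAX, arguments in EBX/ECX/EDX,
     * return value back in EAX, entry through soft interrupt 0x80. */
    static inline long syscall3(long num, long a1, long a2, long a3)
    {
        long ret;
        __asm__ volatile ("int $0x80"
                          : "=a" (ret)                            /* EAX -> return value */
                          : "a" (num), "b" (a1), "c" (a2), "d" (a3)
                          : "memory");
        return ret;
    }

    /* Hypothetical wrapper: a write-like call with call number 4. */
    enum { SYS_WRITE = 4 };

    long sys_write(int fd, const void *buf, unsigned long len)
    {
        return syscall3(SYS_WRITE, fd, (long)buf, (long)len);
    }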

Once you have all of that in place, you can start thinking about handling page faults. Page faults are effectively just another type of interrupt - when user code tries to access a virtual address for which there is no page table entry, the CPU generates interrupt 14, pushes an error code describing the fault, and stores the faulting address in the CR2 register. The kernel can take this information and then decide to read in the missing page from disk, add the page table mapping, and jump back into user code.
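The kernel-side handler for that fault might be structured roughly like this; the helpers (read_cr2(), find_vma(), alloc_phys_page(), and so on) are hypothetical names standing in for whatever your memory manager provides, so treat it as a sketch of the control flow rather than working code.

    #include <stdint.h>

    /* Hypothetical kernel helpers -- placeholders for your own memory manager. */
    struct vm_area;                                   /* one mapped region of a process   */
    extern uint32_t read_cr2(void);                   /* CR2 holds the faulting address   */
    extern struct vm_area *find_vma(uint32_t addr);   /* look up the region, NULL if none */
    extern uint32_t alloc_phys_page(void);            /* grab a free physical frame       */
    extern void page_in(struct vm_area *v, uint32_t virt, uint32_t phys);
    extern void map_page(uint32_t virt, uint32_t phys, struct vm_area *v);
    extern void signal_segfault(void);                /* kill/signal the offending process */

    /* Error-code bits the CPU pushes for interrupt 14. */
    #define PF_PRESENT 0x1   /* 1 = protection violation, 0 = page not present */
    #define PF_WRITE   0x2   /* 1 = write access                               */
    #define PF_USER    0x4   /* 1 = fault happened in user mode                */

    void page_fault_handler(uint32_t error_code)
    {
        uint32_t fault_addr = read_cr2();
        struct vm_area *vma = find_vma(fault_addr);

        /* No mapping for this address, or a genuine protection violation:
         * that is a program error, not a page to bring in. */
        if (vma == 0 || (error_code & PF_PRESENT)) {
            signal_segfault();
            return;
        }

        /* The page is simply missing: allocate a frame, fill it (from disk
         * or with zeroes), install the mapping, and return so the faulting
         * instruction re-executes. */
        uint32_t phys = alloc_phys_page();
        page_in(vma, fault_addr & ~0xFFFu, phys);
        map_page(fault_addr & ~0xFFFu, phys, vma);
    }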

I highly recommend you take a look at some of the materials from the MIT Operating Systems class. Check out the references section; it's got loads of good stuff.

Adam Rosenfield
+8  A: 

A good starting point for all these questions is to look at how Unix does it. As a famous quote says, "Those who don't understand UNIX are doomed to reinvent it, poorly."

First, about calling kernel functions. It is not enough to simply have the functions somewhere a program can call, since the program is most probably running in "user mode" (ring 3 on IA-32) and the kernel has to run in "kernel mode" (usually ring 0 on IA-32) to do its privileged operations. You have to somehow perform the transition between the two modes, and this is very architecture specific.

On IA-32, the traditional way is to use a gate in the IDT together with a software interrupt (Linux uses int 0x80). Newer processors have other (faster) ways to do it, and which ones are available depends on whether the CPU is from AMD or Intel, and on the specific CPU model. To accommodate this variation, recent Linux kernels use a page of code mapped by the kernel at the top of the address space of every process. So, on recent Linux, to do a system call you call a function on this page, which will in turn do whatever is needed to switch to kernel mode (the kernel has more than one copy of that page, and chooses which copy to use at boot depending on your processor's features).
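For the traditional route, installing that gate in C looks roughly like the sketch below. The descriptor layout is the hardware-defined IDT entry format, but the KERNEL_CS selector value, the syscall_entry_stub symbol, and the load_idt() helper are assumptions; you would supply your own.

    #include <stdint.h>

    /* One IA-32 interrupt/trap gate descriptor (hardware-defined layout). */
    struct idt_entry {
        uint16_t offset_low;     /* handler address, bits 0..15  */
        uint16_t selector;       /* kernel code segment selector */
        uint8_t  zero;           /* always 0                     */
        uint8_t  type_attr;      /* gate type, DPL, present bit  */
        uint16_t offset_high;    /* handler address, bits 16..31 */
    } __attribute__((packed));

    static struct idt_entry idt[256];

    /* Assumed to exist elsewhere: the assembly entry stub that saves
     * registers and calls into C, and a helper that executes lidt. */
    extern void syscall_entry_stub(void);
    extern void load_idt(struct idt_entry *table, uint16_t limit);

    #define KERNEL_CS 0x08       /* assumed kernel code segment selector */

    static void set_gate(int vector, void (*handler)(void), uint8_t type_attr)
    {
        uint32_t addr = (uint32_t)handler;
        idt[vector].offset_low  = addr & 0xFFFF;
        idt[vector].selector    = KERNEL_CS;
        idt[vector].zero        = 0;
        idt[vector].type_attr   = type_attr;
        idt[vector].offset_high = (addr >> 16) & 0xFFFF;
    }

    void install_syscall_gate(void)
    {
        /* 0xEF = present, DPL 3, 32-bit trap gate.  DPL 3 is what allows
         * ring-3 code to execute "int 0x80" without faulting. */
        set_gate(0x80, syscall_entry_stub, 0xEF);
        load_idt(idt, sizeof(idt) - 1);
    }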

Now, the memory management. This is a huge subject; you could write a large book about it and not exhaust it.

Be sure to keep in mind that there are at least two views of the memory: the physical view (the real order of the pages, visible to the hardware memory subsystem and often to external peripherals) and the logical view (the order of the pages seen by programs running on the CPU). It's quite easy to confuse the two. You will be allocating physical pages and assigning them to logical addresses in the program's or the kernel's address space. A single physical page can have several logical addresses, and it can be mapped to different logical addresses in different processes.
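To make the two views concrete, here is a sketch of installing a single logical-to-physical mapping with the IA-32 two-level page tables. The flag values are the hardware-defined bits; alloc_phys_page() is a hypothetical allocator, and the sketch ignores TLB invalidation and assumes page tables are reachable through their physical addresses.

    #include <stdint.h>

    /* IA-32 page table entry flag bits (hardware-defined). */
    #define PTE_PRESENT 0x001    /* mapping is valid                      */
    #define PTE_WRITE   0x002    /* writable                              */
    #define PTE_USER    0x004    /* accessible from ring 3; without this  */
                                 /* bit the page is kernel-only           */

    extern uint32_t alloc_phys_page(void);   /* hypothetical frame allocator */

    /* Map one 4 KiB logical page to one physical frame in the given page
     * directory.  A real kernel would also zero a freshly allocated page
     * table and invalidate the TLB entry for `virt`. */
    void map_page(uint32_t *page_dir, uint32_t virt, uint32_t phys, uint32_t flags)
    {
        uint32_t dir_index   = virt >> 22;           /* top 10 bits: which page table   */
        uint32_t table_index = (virt >> 12) & 0x3FF; /* next 10 bits: which entry in it */

        if (!(page_dir[dir_index] & PTE_PRESENT)) {
            /* No page table covers this 4 MiB region yet: allocate one. */
            uint32_t table_phys = alloc_phys_page();
            page_dir[dir_index] = table_phys | PTE_PRESENT | PTE_WRITE | PTE_USER;
        }

        uint32_t *table = (uint32_t *)(page_dir[dir_index] & ~0xFFFu);
        table[table_index] = (phys & ~0xFFFu) | (flags & 0xFFFu) | PTE_PRESENT;
    }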

The kernel memory (reserved for the kernel) is usually mapped at the top of the address space of every process. However, it is set up so it can only be accessed in kernel mode. There is no need for fancy tricks to hide that portion of memory; the hardware does all the work of blocking the access (on IA-32, it is done via page flags or segment limits).

The programs allocate memory on the rest of the address space in several ways:

  • Part of the memory is allocated by the kernel's program loader. This includes the program code (or "text"), the program initialized data ("data"), the program uninitialized data ("bss", zero-filled), the stack, and several odds and ends. How much to allocate, where, what should be the initial contents, which protection flags to use, and several other things, are read from the headers on the executable file to be loaded.
  • Traditionally on Unix, there is an area of memory which can grow and shrink (its upper limit can be changed via the brk() system call). This is traditionally used for the heap (the memory allocator in the C library, of which malloc() is one of the interfaces, is responsible for the heap).
  • You can often ask the kernel to map a file into an area of the address space. Reads and writes to that area are (via paging magic) directed to the backing file. This is usually called mmap(). With an anonymous mmap, you can allocate new areas of the address space which are not backed by any file, but otherwise act the same way. The kernel's program loader will often use mmap to allocate parts of the program (for instance, the program code can be backed by the executable itself). A short user-space sketch of both brk() and mmap() follows this list.
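
Here is the user-space sketch promised above, showing both the traditional break and an anonymous mapping on a Unix-style system (standard sbrk() and mmap() calls; the 4 KiB sizes are arbitrary):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>      /* sbrk() */
    #include <sys/mman.h>    /* mmap(), munmap() */

    int main(void)
    {
        /* Grow the traditional heap area by one 4 KiB page; sbrk() is a
         * thin wrapper around the brk() system call. */
        void *old_break = sbrk(4096);
        if (old_break == (void *)-1) {
            perror("sbrk");
            return 1;
        }
        strcpy(old_break, "this memory came from extending the break");
        printf("brk:  %s\n", (char *)old_break);

        /* Ask the kernel for an anonymous mapping: a fresh, zero-filled
         * region of address space not backed by any file. */
        void *region = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (region == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        strcpy(region, "this memory came from an anonymous mmap");
        printf("mmap: %s\n", (char *)region);

        munmap(region, 4096);
        return 0;
    }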

Accessing areas of the address space which are not allocated in any way (or are reserved for the kernel) is considered an error, and on Unix it will cause a signal (typically SIGSEGV) to be sent to the program.

The compiler either allocates memory statically (by recording it in the executable file headers, so the kernel's program loader allocates it when loading the program) or dynamically (by calling a function in the language's standard library, which usually calls a function in the C standard library, which in turn asks the kernel for memory and subdivides it if necessary).
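As an illustration of the dynamic path, here is a deliberately naive malloc-style allocator that grabs large chunks from the kernel with sbrk() and hands out pieces of them. A real allocator also tracks freed blocks, alignment, and thread safety; toy_malloc() and the 64 KiB chunk size are just names and numbers chosen for the sketch.

    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>              /* sbrk() */

    #define CHUNK_SIZE (64 * 1024)   /* ask the kernel for 64 KiB at a time */

    static char  *chunk_cur  = NULL; /* next free byte in the current chunk  */
    static size_t chunk_left = 0;    /* bytes remaining in the current chunk */

    /* Naive bump allocator: never frees, rounds requests up to 16 bytes. */
    void *toy_malloc(size_t size)
    {
        size = (size + 15) & ~(size_t)15;

        if (size > chunk_left) {
            /* Out of room: ask the kernel to extend the data segment. */
            size_t request = size > CHUNK_SIZE ? size : CHUNK_SIZE;
            void *mem = sbrk((intptr_t)request);
            if (mem == (void *)-1)
                return NULL;         /* the kernel refused to give us more */
            chunk_cur  = mem;
            chunk_left = request;
        }

        void *result = chunk_cur;
        chunk_cur  += size;
        chunk_left -= size;
        return result;
    }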

The best way to learn the basics of all this is to read one of the several books on operating systems, in particular the ones which use a Unix variant as an example. They go into far more detail than I can in an answer on Stack Overflow.

CesarB
Oh... so basically the C language assumes that its "heap" is one long address space, and will manage that memory for itself, and all the kernel has to do is change how big that memory is? That sounds fairly simple actually, and makes it easy to measure the memory consumed by a program as well.
Nicholas Flynt
Not the C language, but a traditional implementation of its standard library. The C standard also allows for more complicated implementations; how the standard library manages its memory is mostly an implementation detail, as far as the C standard is concerned.
CesarB