It makes sense that something like an operating system would be written in C. But how much of it, and what kind of C? I mean, in C, if you needed some heap memory, you would call malloc. But, does an OS even have a heap? As far as I know, malloc asks the operating system for memory and then adds it to a linked list, or binary tree, or something. What about a call stack? The OS is responsible for setting up all of this stuff that other applications use, but how does it do that? When you want to open or create a file in C, the appropriate functions ask the operating system for that file. so... What kind of C is on the other side of that call? Or on the other end of a memory allocation?

Also, how much of an operating system would actually be written in C? All of it? What about architecture dependent code? What about the higher levels of abstraction--does that ever get written in higher level languages, like C++?

I mean, I'm just asking this out of sheer curiosity. I'm downloading the latest linux kernel now but it's taking forever. I'm not sure if I'll wind up being able to follow the code--or if I'll be caught in an inescapably complex web of stuff I've never seen before.

A: 

Traditionally, C is mostly needed for the kernel and device drivers because they interact with hardware. However, languages such as C++ and Java could be used for the rest of the operating system.

For more information, I've found Operating Systems: Design and Implementation by Andrew Tanenbaum particularly useful, with LOTS of code samples.

Bo Tian
Explain how you would compile Java, which has no concept of memory addresses, to use hardware devices. How would you compile it down to machine code without an intermediate step that compiles something to a native language?
Spence
There exist Java processors.
swegi
Java can definitely be used for the user-level API, but for various reasons, including a lack of low-level constructs like "raw" memory pointers and its garbage-collected nature, it makes a poor choice for implementing the lowest level, the Java Virtual Machine itself. Sun's JVM is written, interestingly enough, in C++. Supposedly the Maxine project (https://maxine.dev.java.net/design.html) uses pure Java with a bit of assembler, and suggests that systems programming is now possible with the features of Java 5, but I suspect some extensions and hackery were required.
+1  A: 

I wouldn't start by reading the Linux kernel; it's too complicated for a beginner.

OSDev is an excellent place to start reading. I wrote a small OS for a school course using information from OSDev. It runs on VMware, Bochs, and QEMU, so it's easy to test. Here is the source code.

Macarse
Pintos from Stanford is also a good starting point (it replaced Nachos).
LB
+22  A: 

What kind of C?

Mostly ANSI C, with a lot of time looking at the machine code it generates.

But, does an OS even have a heap?

malloc asks the operating system for a pointer to some memory the program is allowed to use. If a program running on an OS (in user mode) tries to access memory it doesn't own, it gets a segmentation fault. The OS itself is allowed to access all the physical memory on the system directly: no malloc needed, and no segfaults on any address that exists.
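
To make the user-mode side concrete, here is a minimal sketch of asking the OS for memory directly, assuming a POSIX system that provides mmap (Linux, for example). The helper names os_alloc and os_free are made up for this example; a real malloc does something along these lines and then hands out smaller pieces of what it gets back.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the kernel for `size` bytes of fresh, zeroed memory.
   Returns NULL if the kernel refuses. */
void *os_alloc(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: hand the region back to the kernel. Returns 0 on success. */
int os_free(void *p, size_t size)
{
    return munmap(p, size);
}
```

Touching the region after os_free is exactly the kind of access that would earn a user-mode program a segmentation fault.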

What about a call stack?

The call stack actually often works at the hardware level, with a link register.

For file access, the OS needs access to a disk driver, which needs to know how to read the file system that's on the disk (there are a lot of different kinds). Sometimes the OS has one built in, but I think it's more common that the boot loader hands it one to start with, and it then loads another (bigger) one. The disk driver has access to the hardware I/O of the physical disk, and builds from that.

Michael Sofaer
Very informative answer.
Gordon Mackie JoanMiro
+4  A: 

But how much of it, and what kind of C?

Some parts must be written in assembly.

I mean, in C, if you needed some heap memory, you would call malloc. But, does an OS even have a heap? As far as I know, malloc asks the operating system for memory and then adds it to a linked list, or binary tree, or something.

Some OSes have a heap. At the lowest level, memory is doled out in fixed-size slabs called pages. Your C library then partitions those pages into variable-sized pieces with its own scheme, which is what malloc gives you. You should learn about virtual memory, which is a common memory scheme in modern OSes.
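
As a rough sketch of that partitioning step, here is about the simplest scheme there is: a bump allocator that carves variable-sized chunks out of a single page. The 4096-byte page size, the static array standing in for a page from the OS, and the name bump_alloc are all assumptions for illustration; real malloc implementations are far more involved and can also free memory.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u   /* assumed page size; real systems vary */

/* One "page" standing in for memory the OS handed to the C library. */
static _Alignas(16) unsigned char page[PAGE_SIZE];
static size_t used = 0;   /* bytes already handed out from the page */

/* Hypothetical bump allocator: carve variable-sized, 16-byte-aligned
   chunks out of the page. Returns NULL when the page is exhausted. */
void *bump_alloc(size_t size)
{
    assert(used <= PAGE_SIZE);
    size_t rounded = (size + 15u) & ~(size_t)15u;   /* round up to 16 */
    if (rounded < size || rounded > PAGE_SIZE - used)
        return NULL;                                /* overflow or no room */
    void *p = &page[used];
    used += rounded;
    return p;
}
```

When the page runs out, a real allocator would go back to the OS for more pages instead of returning NULL.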

When you want to open or create a file in C, the appropriate functions ask the operating system for that file. so... What kind of C is on the other side of that call?

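You call into assembly routines that talk to hardware with port instructions like IN and OUT. With raw memory access, you sometimes also have regions of memory that are dedicated to communicating to and from hardware. This is called memory-mapped I/O. (DMA is related but different: there the device itself reads and writes main memory.)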
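
The memory-region style of hardware access can be sketched in plain C. This is only an illustration: the device, its register layout, and the names fake_uart and uart_try_send are invented, and an ordinary struct stands in for the device so the code can run in user space. In a kernel the pointer would instead be a fixed address taken from the hardware documentation. Port I/O with IN and OUT needs assembly and is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for 32-bit device registers. `volatile` forces a real
   load or store on every access instead of letting the compiler cache it. */
uint32_t mmio_read32(const volatile uint32_t *reg)
{
    return *reg;
}

void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* Made-up register layout for an imaginary serial device. */
struct fake_uart {
    volatile uint32_t data;     /* byte to transmit */
    volatile uint32_t status;   /* bit 0 set = ready to accept a byte */
};

#define UART_READY 0x1u

/* Send one byte if the device reports ready. Returns 1 if sent, 0 if busy. */
int uart_try_send(struct fake_uart *dev, uint8_t byte)
{
    assert(dev != 0);
    if ((mmio_read32(&dev->status) & UART_READY) == 0)
        return 0;
    mmio_write32(&dev->data, byte);
    return 1;
}
```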

I'm not sure if I'll wind up being able to follow the code--or if I'll be caught in an inescapably complex web of stuff I've never seen before.

Yes, you will. You should pick up a book on hardware and OSes first.

Unknown
+2  A: 

I mean, in C, if you needed some heap memory, you would call malloc. But, does an OS even have a heap? As far as I know, malloc asks the operating system for memory and then adds it to a linked list, or binary tree, or something. What about a call stack?

A lot of what you say in your question is actually done by the runtime library in userspace.

All the OS needs to do is load the program into memory and jump to its entry point; most details after that can be handled by the user-space program. The heap and the stack are just areas of the process's virtual memory. The stack is tracked by nothing more than a pointer register in the CPU.

Allocating physical memory is something that is done at the OS level. The OS usually allocates fixed-size pages, which are then mapped into a user-space process.
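
Fixed-size page allocation can be sketched with a bitmap, one bit per physical page frame. This is a toy under stated assumptions: a machine with only 64 frames, and made-up names frame_alloc and frame_free. Mapping a frame into a process needs page tables and is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FRAMES 64u   /* assumed: a tiny machine with 64 physical pages */

static uint64_t frame_bitmap = 0;   /* bit i set = frame i is in use */

/* Hypothetical: return the index of a free frame and mark it used,
   or -1 when physical memory is exhausted. */
int frame_alloc(void)
{
    for (unsigned i = 0; i < NUM_FRAMES; i++) {
        uint64_t bit = (uint64_t)1 << i;
        if ((frame_bitmap & bit) == 0) {
            frame_bitmap |= bit;
            return (int)i;
        }
    }
    return -1;
}

/* Hypothetical: mark a frame free again. */
void frame_free(int frame)
{
    assert(frame >= 0 && (unsigned)frame < NUM_FRAMES);
    frame_bitmap &= ~((uint64_t)1 << frame);
}
```

The physical address of frame i would be i times the page size, and that address is what the OS writes into a process's page table.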

abababa22
+10  A: 

C is a very low-level language, and you can do a lot of things directly. Any of the C library functions (like malloc, printf, clrscr etc.) need to be implemented first before you can call them from C (have a look at libc concepts, for example). I'll give an example below.

Let us see how such library functions are implemented under the hood. We'll go with a clrscr example. When you implement functions like this, you access system devices directly. For example, for clrscr (clearing the screen) we know that on a PC the text-mode video memory is resident at 0xB8000. Hence, to write to the screen or to clear it, we start by assigning a pointer to that location.

In video.c

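/* Clear the 80x25 text-mode screen by writing directly to video memory. */
void clrscr(void)
{
   /* volatile keeps the compiler from optimizing away the stores */
   volatile unsigned char *vidmem = (volatile unsigned char *)0xB8000;
   const long size = 80*25;   /* number of character cells */
   long loop;

   for (loop=0; loop<size; loop++) {
      *vidmem++ = 0;     /* character byte */
      *vidmem++ = 0xF;   /* attribute byte: white on black */
   }
}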
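
If you want to try the loop without booting anything, the same logic can be pointed at an ordinary array instead of 0xB8000. This is just a sketch for experimenting in user space; text_clear and text_puts are names invented here, and text_puts is an extra to show writing characters with the same two-bytes-per-cell layout.

```c
#include <assert.h>
#include <stddef.h>

#define COLS 80
#define ROWS 25

/* Same loop as clrscr, but the buffer is a parameter so it can be tested. */
void text_clear(volatile unsigned char *vidmem)
{
    assert(vidmem != NULL);
    for (long cell = 0; cell < COLS * ROWS; cell++) {
        *vidmem++ = 0;      /* character byte */
        *vidmem++ = 0xF;    /* attribute byte: white on black */
    }
}

/* Write a string starting at the top-left cell, leaving attributes as is. */
void text_puts(volatile unsigned char *vidmem, const char *s)
{
    for (long cell = 0; s[cell] != '\0' && cell < COLS * ROWS; cell++)
        vidmem[cell * 2] = (unsigned char)s[cell];
}
```

In the kernel you would pass (volatile unsigned char *)0xB8000; in a test you pass a 4000-byte array and inspect it afterwards.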

Let us write our mini kernel now. This will clear the screen when the control is handed over to our 'kernel' from the boot loader. In main.c

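/* Defined in video.c */
void clrscr(void);

/* Entry point: the boot loader jumps here. */
void main()
{
   clrscr();
   for(;;);   /* nothing to return to, so spin forever */
}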

To compile our 'kernel', you might use gcc to compile it to a pure bin format.

gcc -ffreestanding -c main.c -o main.o
gcc -c video.c -o video.o
ld -e _main -Ttext 0x1000 -o kernel.o main.o video.o
ld -i -e _main -Ttext 0x1000 -o kernel.o main.o video.o
objcopy -R .note -R .comment -S -O binary kernel.o kernel.bin

Note the ld parameters above: -Ttext 0x1000 sets the address the kernel's code is linked to run at, so the boot loader must load the kernel at 0x1000. Now you need to create a boot loader. From your boot loader, you pass control to your kernel with something like

jmp 08h:01000h

You normally write your boot loader in assembly. Even before that, you may want to read up on how a PC boots.

It's better to start with a tinier operating system to explore. See this Roll Your Own OS tutorial:

http://www.acm.uiuc.edu/sigops/roll_your_own/

amazedsaint
+28  A: 

Excellent questions, all. The answer is: little to none of the standard C library is available in the "dialect" of C used to write an operating system. In the Linux kernel, for example, the standard memory allocation functions malloc, calloc, free etc. are replaced with special kernel-internal memory allocation functions kmalloc and kfree, with special restrictions on their use. The operating system must provide its own "heap": in the Linux kernel, physical memory pages that have been allocated for kernel use must be non-pageable and often physically contiguous. See this Linux Journal article on kmalloc and kfree. Similarly, the operating system kernel maintains its own special call stack, the use of which, as far as I remember, requires special support from the GCC compiler.
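
To give a feel for what "no standard library" means in practice, a kernel has to carry its own versions of even trivial routines. Below is a minimal sketch with hypothetical k_ prefixed names; these are not the Linux kernel's actual implementations, and the assert.h include is only there so the code can be checked in user space.

```c
#include <assert.h>   /* hosted-only; a kernel would use its own checks */
#include <stddef.h>   /* size_t: one of the few headers available freestanding */

/* Hypothetical kernel-side replacement for strlen. */
size_t k_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical kernel-side replacement for memset. */
void *k_memset(void *dst, int value, size_t count)
{
    assert(dst != NULL || count == 0);
    unsigned char *p = dst;
    while (count-- > 0)
        *p++ = (unsigned char)value;
    return dst;
}
```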

Also, how much of an operating system would actually be written in C? All of it?

As far as I'm aware, operating systems are overwhelmingly written in C. Some architecture-specific features are coded in assembler, but usually as little as possible, to improve portability and maintainability: the Linux kernel has some assembler but tries to keep it to a minimum.

What about architecture dependent code? What about the higher levels of abstraction--does that ever get written in higher level languages, like C++?

Usually the kernel will be written in pure C, but sometimes the higher-level frameworks and APIs are written in a higher-level language. For example, the Cocoa framework/API on MacOS is written in Objective-C, and the BeOS higher-level APIs were written in C++. Much of Microsoft's .NET framework was written in C#, with the "Common Language Runtime" written in a mix of C++ and assembler. The Qt widget set most often used on Linux is written in C++. Of course, this introduces philosophical questions about what counts as "the operating system."

The Linux kernel is definitely worth looking at for this, although, it must be said, it is huge and intimidating for anyone to read from scratch.

Another very informative answer.
Gordon Mackie JoanMiro
A: 

You should read Linux Device Drivers, 3rd edition. It explains the internals of the Linux kernel pretty well.

fa.
A: 

malloc and the other memory management functions aren't keywords in C. They are functions from the standard libraries shipped with the OS. I don't know the name of the standard that defines them (it is unlikely to be POSIX; I haven't found any mention there), but it exists: you can use malloc in C applications on most platforms.

If you want to know how the Linux kernel works, I recommend this book: http://oreilly.com/catalog/9780596005658/ . I think it's a good explanation with some C code included :).

cetron