I was reading some general material about operating systems and hit on a question: how does a developer debug an operating system while developing it, i.e. debug the OS itself? What tools are available to the OS developer for this?

+1  A: 

You can use a VM, e.g. debug ring0 code with Bochs/GDB or debug the NetBSD kernel with QEMU,

or a serial line with something like KDB.

Draemon
QEMU/KVM support that too, and are quite a bit more modern and speedy than Bochs :-)
ephemient
thanks - I've added this to my answer
Draemon
+3  A: 

In a bootstrap scenario (OS from scratch), you'd probably have to introduce remote debugging capabilities (memory dumping, logging, etc.) in the OS kernel early on, and use a separate machine. Or you could use a virtual machine/hypervisor.

Windows CE has a component called KITL (Kernel Independent Transport Layer). I guess the name speaks for itself.

Seva Alekseyev
+1  A: 

printf logging, attaching to a process, serious unit tests, etc.

Chris Lively
+12  A: 

Debugging a kernel is hard, because you probably can't rely on the crashing machine to communicate what's going on. Furthermore, the buggy code is probably in scary places like interrupt handlers.

There are four primary methods of debugging an operating system of which I'm aware:

  1. Sanity checks, together with output to the screen.

    Kernel "Oops" messages and panics on Linux are a great example of this. The Linux folks wrote a function that prints out whatever it can find out (including a stack trace) and then, in the case of a full panic, stops everything.

    Even warnings are useful. Linux has guards set up for situations where you might accidentally go to sleep in an interrupt handler. The mutex_lock function, for instance, will check (in might_sleep) whether you're in an unsafe context and print a stack trace if you are.

  2. Debuggers

    Traditionally, the machine under debugging reports what it is doing over a serial line to a second, stable test machine. With the advent of virtual machines, you can now wire a VM's virtual serial line to another program on the same physical machine, which is super convenient. Naturally, however, this requires that your operating system publish what it is doing and wait for a debugger connection. KGDB (Linux) and WinDbg (Windows) are examples of debuggers that work this way. VMware supports this setup explicitly.

    More recently, VM developers have figured out how to debug a kernel without either a serial line or kernel extensions. VMware has implemented this in its recent products.

    The problem with debugging in an operating system is (in my mind) related to the Uncertainty principle. Interrupts (where most of your hard errors are sure to be) are asynchronous, frequent and nondeterministic. If your bug relates to the overlapping of two interrupts in a particular way, you will not expose it with a debugger; the bug probably won't even happen. That said, it might, and then a debugger might be useful.

  3. Deterministic Replay

    When you get a bug that only seems to appear in production, you wish you could record what happened and replay it, like a security camera. Thanks to a professor I knew at Illinois, you can now do this in a VMware virtual machine. VMware and related folks describe it all better than I can, and they provide what looks like good documentation.

    Deterministic replay is brand new on the scene, so thus far I'm unaware of any particularly idiomatic uses. They say it should be particularly useful for security bugs, too.

  4. Moving everything to User Space.

    In the end, things are still more brittle in the kernel, so there's a tremendous development advantage to following the Nucleus (or Microkernel) design, where you shave the kernel-mode components to their bare minimum. For everything else, you can use the myriad of user-space dev tools out there, and you'll be much happier. FUSE, a user-space filesystem extension, is the canonical example of this.

    I like this last idea, because it's like you wrote the program to be writeable. Cyclic, no?
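
To make method 1 concrete, here is a minimal user-space sketch of a might_sleep-style sanity check. It is not the Linux implementation: `irq_depth`, `check_might_sleep`, `MIGHT_SLEEP` and `toy_mutex` are hypothetical names made up for illustration, and the "interrupt context" is simulated with a plain counter. A real kernel would print a stack trace where this prints one line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's notion of "interrupt context". */
static int irq_depth = 0;      /* > 0 while a (simulated) handler is running */
static int warnings_seen = 0;  /* number of unsafe calls caught so far */

static void irq_enter_sim(void) { irq_depth++; }
static void irq_exit_sim(void)  { assert(irq_depth > 0); irq_depth--; }

/* Sanity check: report the call site if we could sleep inside a handler. */
static void check_might_sleep(const char *file, int line)
{
    if (irq_depth > 0) {
        warnings_seen++;
        fprintf(stderr, "BUG: sleeping call in interrupt context at %s:%d\n",
                file, line);
        /* A real kernel would also dump a stack trace here. */
    }
}
#define MIGHT_SLEEP() check_might_sleep(__FILE__, __LINE__)

struct toy_mutex { int locked; };

/* Returns 0 on success, -1 if already held (a real mutex would block). */
static int toy_mutex_lock(struct toy_mutex *m)
{
    MIGHT_SLEEP();             /* the guard runs before anything can block */
    if (m->locked)
        return -1;
    m->locked = 1;
    return 0;
}

static void toy_mutex_unlock(struct toy_mutex *m) { m->locked = 0; }
```

The point of the design is that the check lives inside the locking primitive, so every caller gets it for free, and that it warns rather than halts, so the machine stays alive long enough to report the problem.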

Andres Jaan Tack
Great answer, but in 2 you missed that VMs can also provide hooks into a debugger without using the emulated serial line (or even kernel support).
Draemon
Totally. Good point.
Andres Jaan Tack
QEMU (and thus KVM) provides a GDB stub, which was available many years earlier than VMware's (at least publicly) -- just run with `-s`. And sometimes you can move the whole kernel into userspace: Linux has UML, DragonFly BSD has vkernel, both of which allow for easier debugging.
ephemient
Nicely written down, Andres. I would have added in-memory logging to that list. For debugging multi-threaded kernel state machines, I have used per-processor logs (not string logging, but data logs) to debug problems. I have also seen string logging done in circular kernel buffers (without being printed out or stored in a file) for debugging. In a filesystem that I work on, we log the block numbers of blocks which were freed from different code paths, and by whom, to debug filesystem corruption due to software bugs. Cheers.
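
A per-processor circular data log of the kind described here might look roughly like the sketch below. Everything in it is an assumption for illustration (the `log_entry` fields, `LOG_SLOTS`, `NUM_CPUS`, `log_event`, `log_snapshot`); it is not taken from any real filesystem or kernel. It also ignores what a real kernel must handle, such as disabling preemption or interrupts while writing to a CPU's log.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_SLOTS 8u   /* entries kept per CPU; must be a power of two */
#define NUM_CPUS  2u   /* hypothetical CPU count */
static_assert((LOG_SLOTS & (LOG_SLOTS - 1u)) == 0,
              "LOG_SLOTS must be a power of two");

/* Fixed-size binary record: cheap to write, decoded only after a crash. */
struct log_entry {
    uint32_t event;    /* hypothetical event code, e.g. "block freed" */
    uint32_t caller;   /* hypothetical ID of the code path that logged it */
    uint64_t arg;      /* payload, e.g. a block number */
};

struct cpu_log {
    uint64_t next;                      /* total entries ever written */
    struct log_entry slot[LOG_SLOTS];   /* ring storage, oldest overwritten */
};

static struct cpu_log logs[NUM_CPUS];

/* Append one record to the given CPU's ring, overwriting the oldest. */
static void log_event(unsigned cpu, uint32_t event, uint32_t caller,
                      uint64_t arg)
{
    struct cpu_log *l = &logs[cpu % NUM_CPUS];
    struct log_entry *e = &l->slot[l->next & (LOG_SLOTS - 1u)];

    e->event = event;
    e->caller = caller;
    e->arg = arg;
    l->next++;
}

/* Copy up to max of the newest surviving entries, oldest first. */
static unsigned log_snapshot(unsigned cpu, struct log_entry *out, unsigned max)
{
    const struct cpu_log *l = &logs[cpu % NUM_CPUS];
    uint64_t count = l->next < LOG_SLOTS ? l->next : LOG_SLOTS;
    uint64_t first, i;

    if (count > max)
        count = max;
    first = l->next - count;
    for (i = 0; i < count; i++)
        out[i] = l->slot[(first + i) & (LOG_SLOTS - 1u)];
    return (unsigned)count;
}
```

Because each CPU writes only to its own ring, the hot path needs no lock, and the fixed record size keeps logging cheap enough to leave enabled while chasing timing-sensitive bugs.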
Sudhanshu
A: 

Remote debugging with kernel debuggers, which can also be done via virtualization.

Max Caceres
A: 

Debugging an operating system is not for the faint of heart. Because it is the kernel itself that is being debugged, your options are quite limited. Copious printf statements are one trick. Beyond that, it really depends on which part of the 'operating system' is being debugged; we could be talking about

  • Filesystem
  • Drivers
  • Memory management
  • Raw Disk input/output
  • Screen input/output
  • Kernel

Again, it is a widely varying exercise, because all of the above interact with one another. Even more complicated: supposing you were to debug the kernel, how would you do it if the runtime environment is not properly set up? (By that, I mean the kernel's responsibility for loading binary executables.)

Some kernels (not all of them) incorporate a simple debug monitor. If I recall rightly, in the book 'Developing Your Own 32-Bit Operating System' by Richard A. Burgess (Sams Publishing), the author incorporated a debug monitor which displays various states of the CPU, registers and so on.

Again, take into account the fact that binary executables require a certain loading mechanism. A gdb equivalent, for example, is itself a binary that has to be loaded, so if the environment for loading binaries is not set up, your options are quite limited.

Using copious printf statements to send errors, logs, etc. to a separate terminal or to a file is the best line of debugging. It does sound like a nightmare, but it would be worth the effort.
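
As a rough illustration of printf-style logging to a separate terminal, here is a small sketch in which the formatted text is pushed one byte at a time to a pluggable "sink". All names (`klog`, `klog_set_sink`, `capture_sink`) are hypothetical. In a kernel the sink would be something like a routine that writes a byte to a serial port; here it just captures output in memory so the code runs in user space. It also assumes `vsnprintf` is available, which a freestanding kernel would have to supply itself.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* A sink receives one output byte; in a kernel this might write to a UART. */
typedef void (*log_sink_fn)(char c);

static log_sink_fn log_sink;   /* NULL until a sink is installed */

static void klog_set_sink(log_sink_fn fn) { log_sink = fn; }

/* Format a message and push it to the sink. Returns bytes emitted. */
static int klog(const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    int n, i;

    assert(fmt != NULL);
    if (log_sink == NULL)
        return 0;              /* logging before the sink exists is dropped */

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0)
        return 0;
    if ((size_t)n >= sizeof buf)
        n = (int)sizeof buf - 1;   /* message was truncated to fit buf */

    for (i = 0; i < n; i++)
        log_sink(buf[i]);
    return n;
}

/* User-space stand-in sink: keep everything in a NUL-terminated buffer. */
static char captured[512];
static size_t captured_len;

static void capture_sink(char c)
{
    if (captured_len + 1 < sizeof captured) {
        captured[captured_len++] = c;
        captured[captured_len] = '\0';
    }
}
```

Keeping the sink behind a function pointer means the same `klog` calls can go to the screen early in boot and to a serial line or file later, without touching the call sites.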

Hope this helps, Best regards, Tom.

tommieb75