A Linux machine freezes a few hours after booting and running software (including custom drivers). I'm looking for a method to debug this kind of problem. There has been significant progress in Linux kernel debugging techniques recently, hasn't there?

I'd appreciate it if you could share your experience on the topic.

+2  A: 

SystemTap seems to be to Linux what DTrace is to Solaris, though I find it rather hostile to use. Still, you may want to give it a try. NB: compile the kernel with debug info and spend some time with the kernel instrumentation hooks.

This is why so many are still using printk() after empirically narrowing a bug down to a specific module.

I'm not recommending it, just pointing out that it exists. I may not be smart enough to appreciate some underlying beauty .. I just write drivers for odd devices.

Tim Post
+1 for the reference to SystemTap. Looks promising. I am one of these printk guys.
dmeister
A: 

There are many and varied techniques depending on the sort of problems you want to debug. In your case the first question is "is the system really frozen?". You can enable the magic sysrq key and examine the system state at freeze and go from there.
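As a rough illustration of the SysRq suggestion, here is a minimal Python sketch that enables the magic SysRq key ahead of time and can request a dump while the machine still responds. The two procfs paths are the standard Linux ones; the helper names `enable_sysrq` and `trigger_sysrq` are made up for this example, and writing to the real paths requires root.

```python
from pathlib import Path

# Standard procfs locations on Linux. They are parameters so the helpers
# can also be pointed at ordinary files when experimenting.
SYSRQ_ENABLE = Path("/proc/sys/kernel/sysrq")
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def enable_sysrq(path=SYSRQ_ENABLE):
    """Enable all SysRq functions (the value 1 means 'everything')."""
    Path(path).write_text("1\n")


def trigger_sysrq(key, path=SYSRQ_TRIGGER):
    """Send one SysRq command character, e.g. 't' (task list) or
    'w' (tasks in uninterruptible sleep). Output goes to the kernel log."""
    if len(key) != 1:
        raise ValueError("a SysRq command is a single character")
    Path(path).write_text(key)
```

Run `enable_sysrq()` as root right after boot. Once the machine appears frozen, a script will probably not run any more, so use the keyboard combination (Alt+SysRq+`w` or Alt+SysRq+`t`) and read the result on the console or a serial console. If the keyboard combination still produces output, the kernel is alive and the "freeze" is more likely a deadlock or a hung task than a hard lockup.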

Probably the most directly powerful method is to enable the kernel debugger and connect to it via a serial cable.
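One way to set this up, assuming the in-kernel debugger is kgdb and it is attached over a serial console, is to boot the target with `kgdboc=ttyS0,115200` (optionally with `kgdbwait`) and then run `gdb vmlinux` followed by `target remote /dev/ttyS0` on the second machine. The serial device and baud rate are assumptions for illustration. The small Python sketch below only checks whether a kernel command line carries those options; the function name `kgdb_boot_options` is hypothetical.

```python
def kgdb_boot_options(cmdline):
    """Return (kgdboc value or None, kgdbwait present?) for a kernel
    command line given as a string."""
    kgdboc = None
    wait = False
    for token in cmdline.split():
        if token.startswith("kgdboc="):
            # Keep everything after the first '=', e.g. "ttyS0,115200"
            kgdboc = token.split("=", 1)[1]
        elif token == "kgdbwait":
            wait = True
    return kgdboc, wait
```

On the target, `kgdb_boot_options(open("/proc/cmdline").read())` shows whether the running kernel was actually booted with the debugger attached to a serial port. A first element of `None` means the `kgdboc` option was not passed at boot.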

stsquad
+1  A: 

If you can reproduce the problem inside a VM, there is indeed a fairly new (AFAIK) technique which might be useful: debugging the virtual machine from the host machine it runs on.

See for example this: http://stackoverflow.com/questions/2129344/debugging-linux-kernel-in-vmware-with-windows-host/2255666#2255666

VMware Workstation 7 also enables a powerful technique that lets you record system execution deterministically and then replay it as desired, even backwards. So as soon as the system crashes you can go backwards and see what was happening then (and even try changing something and see if it still crashes). IIRC I read somewhere you can't do this and debug the kernel using VMware/gdb at the same time.

Obviously, you need a VMM for this. I don't know which VMMs other than VMware's VMM family support this, and I don't know whether any free VMware versions do. Likely not; one can't really expect a commercial company to give away everything for free. The trial version is 30 days.

If your custom drivers are for hardware inside the machine, then I suppose this probably won't work.

Paul
Note this doesn't require enabling any sort of debug support inside the VM; the VM itself is undisturbed and theoretically completely unaware it's being debugged. Instead, Workstation has a gdb stub built directly into it that controls the virtualized CPU.
Paul