Sometimes when I write a program in Linux and it crashes due to a bug of some sort, it becomes an uninterruptible process and keeps running until I restart my computer (even if I log out). My questions are:

  • What causes a process to become uninterruptible?
  • How do I stop that from happening?
  • This is probably a dumb question, but is there any way to interrupt it without restarting my computer?
A: 

Could you please describe what an "uninterruptible process" is? Does it survive "kill -9" and happily chug along? If that is the case, then it's stuck in some syscall, which is stuck in some driver, and you are stuck with this process until you reboot (and sometimes it's better to reboot soon) or unload the relevant driver (which is unlikely to happen). You could try "strace" to find out where your process is stuck and avoid it in the future.

But if you are talking about a "zombie" process (which is designated as "zombie" in ps output), then this is a harmless record in the process list waiting for its parent to collect its return code, and it can be safely ignored.
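
To tell the two cases apart, you can look at the state letter the kernel reports for the process. Here is a minimal sketch, assuming Linux with /proc mounted; the helper names `state_from_status` and `describe` are made up for illustration, and the table only covers the states discussed here:

```python
import os

# A few of the state letters documented in proc(5).
MEANING = {
    "D": "uninterruptible sleep (usually waiting for I/O)",
    "Z": "zombie: already dead, waiting for its parent to collect the exit status",
    "S": "interruptible sleep",
    "R": "running or runnable",
}


def state_from_status(status_text):
    """Return the one-letter state from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)".
            return line.split()[1]
    return None


def describe(pid):
    """Read /proc/<pid>/status and describe the process state."""
    with open(f"/proc/{pid}/status") as f:
        state = state_from_status(f.read())
    return state, MEANING.get(state, "other state")


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    print(describe(os.getpid()))
```

A process that shows "D" for a long time is the stuck-in-a-syscall case; one that shows "Z" is the harmless zombie.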

ADEpt
+13  A: 

An uninterruptible process is a process which happens to be in a system call (kernel function) that cannot be interrupted by a signal.

To understand what that means, you need to understand the concept of an interruptible system call. The classic example is read(). This is a system call that can take a long time (seconds) since it can potentially involve spinning up a hard drive, or moving heads. During most of this time, the process will be sleeping, blocking on the hardware.

If the process receives an asynchronous Unix signal (say, SIGTERM) while it is sleeping in the system call, the following happens:

  • The system call exits prematurely, and is set up to return -EINTR to userspace.
  • The signal handler is executed.
  • If the process is still running, it gets the return value from the system call, and if it is written correctly it will make the same call again.
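
The "make the same call again" step is usually a small retry loop around the call. Here is a rough sketch of that pattern, with the caveat that the error code for an interrupted call is EINTR and that Python 3.5+ already retries most system calls on its own (PEP 475), so this mainly illustrates what a C program has to do by hand. `retry_on_eintr` and `make_flaky_read` are hypothetical names, and the "flaky" function only simulates an interrupted call:

```python
import errno


def retry_on_eintr(call, *args):
    """Call `call(*args)`, repeating it for as long as it fails with EINTR."""
    while True:
        try:
            return call(*args)
        except InterruptedError:
            # InterruptedError is the OSError subclass for errno EINTR:
            # a signal handler ran while the call was blocked, so try again.
            continue


def make_flaky_read(failures):
    """Hypothetical stand-in for a blocking call interrupted `failures` times."""
    state = {"left": failures, "attempts": 0}

    def flaky_read():
        state["attempts"] += 1
        if state["left"] > 0:
            state["left"] -= 1
            raise InterruptedError(errno.EINTR, "Interrupted system call")
        return b"data"

    return flaky_read, state


if __name__ == "__main__":
    read, state = make_flaky_read(2)
    print(retry_on_eintr(read), "after", state["attempts"], "attempts")
```

The loop retries only on an interruption; any other error still propagates to the caller.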

The crux of the issue is that (for some reason I do not really understand), the execution needs to get out of the system call for the userspace signal handler to run.

On the other hand, some system calls are not allowed to be interrupted in this way. If such a system call stalls for some reason, the process can remain in this unkillable state indefinitely.

LWN ran a nice article that touched this topic in July.

To answer the original question:

  • How to prevent this from happening: figure out which driver is causing you trouble, and either stop using, or become a kernel hacker and fix it.

  • How to kill an uninterruptible process without rebooting: somehow make the system call terminate. Frequently the most effective way to do this without hitting the power switch is to pull the power cord. You can also become a kernel hacker and make the driver use TASK_KILLABLE, as explained in the LWN article.

ddaa
+1  A: 
CesarB
+1  A: 

Uninterruptible processes are USUALLY waiting for I/O following a page fault.

Consider this:

  • The thread tries to access a page which is not in core (either an executable which is demand-loaded, a page of anonymous memory which has been swapped out, or a mmap()'d file which is demand loaded, which are much the same thing)
  • The kernel is now (trying to) load it in
  • The process can't continue until the page is available.

The process/task cannot be interrupted in this state, because it can't handle any signals; if it did, another page fault would happen and it would be back where it was.

When I say "process", I really mean "task", which under Linux (2.6) roughly translates to "thread", which may or may not have an individual "thread group" entry in /proc.
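
Because the state belongs to each task rather than to the process as a whole, it can help to look at every thread separately. Here is a small sketch, assuming Linux with /proc mounted and the per-thread files under /proc/<pid>/task/<tid>/stat; the function names are invented for this example:

```python
import glob


def parse_stat(stat_line):
    """Return (tid, comm, state) from one /proc/<pid>/task/<tid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so split around the first "(" and the last ")".
    left, _, rest = stat_line.partition("(")
    comm, _, after = rest.rpartition(")")
    return int(left), comm, after.split()[0]


def tasks_in_state(pid, wanted="D"):
    """List (tid, comm) for every task of `pid` whose state is `wanted`."""
    found = []
    for path in glob.glob(f"/proc/{pid}/task/*/stat"):
        try:
            with open(path) as f:
                tid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the task exited while we were scanning
        if state == wanted:
            found.append((tid, comm))
    return found
```

Calling `tasks_in_state(pid)` for a stuck program shows which of its threads are actually sitting in the "D" state.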

In some cases, it may be waiting for a long time. A typical example of this would be where the executable or mmap'd file is on a network filesystem where the server has failed. If the I/O eventually succeeds, the task will continue. If it eventually fails, the task will generally get a SIGBUS or something.

MarkR