I am reading the following article by Robert Love

http://www.linuxjournal.com/article/6916

that says

"...Let's discuss the fact that work queues run in process context. This is in contrast to the other bottom-half mechanisms, which all run in interrupt context. Code running in interrupt context is unable to sleep, or block, because interrupt context does not have a backing process with which to reschedule. Therefore, because interrupt handlers are not associated with a process, there is nothing for the scheduler to put to sleep and, more importantly, nothing for the scheduler to wake up..."

I don't get it. AFAIK, the scheduler in the kernel is O(1); it is implemented with a bitmap. So what stops the scheduler from putting the interrupt context to sleep, picking the next schedulable process, and passing control to it?

+1  A: 

Because the thread switching infrastructure is unusable at that point. When servicing an interrupt, only stuff of higher priority can execute; see the APIC documentation on interrupt priority. If you did allow another thread to execute (which your question implies would be easy to do), you wouldn't be able to let it do anything: if it caused a page fault, you'd have to use services in the kernel that are unusable while the interrupt is being serviced (see below for why).

Typically, your only goal in an interrupt routine is to get the device to stop interrupting and queue something at a lower interrupt level (in Unix this is typically a non-interrupt level, but for Windows, it's dispatch, APC or passive level) to do the heavy lifting where you have access to more features of the kernel/OS. See - Implementing a handler.
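The split described above can be sketched as a toy user-space model. This is not kernel code: `Device`, `top_half`, `bottom_half` and `pending_work` are hypothetical names, and the deque merely stands in for whatever deferral mechanism the OS provides (a work queue on Linux, a DPC on Windows).

```python
from collections import deque

# Hypothetical names throughout; this models the idea, not a real kernel API.
pending_work = deque()   # stands in for a work queue / DPC queue
log = []

class Device:
    def __init__(self):
        self.irq_asserted = False
        self.data = None

    def raise_irq(self, data):
        self.data = data
        self.irq_asserted = True

def top_half(dev):
    """Interrupt-level part: quiet the device, queue the rest, return at once."""
    payload = dev.data
    dev.irq_asserted = False        # acknowledge so the device stops interrupting
    pending_work.append(payload)    # defer the heavy lifting
    log.append("top_half: acked and queued")

def bottom_half():
    """Runs later at a lower level, where blocking services are available."""
    results = []
    while pending_work:
        payload = pending_work.popleft()
        results.append(payload.upper())   # placeholder for slow processing
        log.append("bottom_half: processed")
    return results

dev = Device()
dev.raise_irq("packet")
top_half(dev)
processed = bottom_half()
```

The point of the sketch is that `top_half` never waits for anything: it only touches the device and the queue, so it has no reason to sleep.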

EDIT: It's a property of how O/S's have to work, not something inherent in Linux. An interrupt routine can execute at any point so the state of what you interrupted is inconsistent. If you interrupted the thread scheduling code, its state is inconsistent so you can't be sure you can "sleep" and switch threads. Even if you protect the thread switching code from being interrupted, thread switching is a very high level feature of the O/S and if you protected everything it relies on, an interrupt becomes more of a suggestion than the imperative implied by its name.

EDIT2: Removed the use of the word "shutdown" as it implies to the readers some action is taken by the O/S to do this. Added links to more authoritative references to justify my response.

Tony Lee
What do you mean by thread switching infrastructure is shutdown? Is this just theoretical knowledge, or can you give me a reference to actual code in the kernel that does that, to support your claim?
Methos
It's a theoretical view of what an interrupt is - logically if you sleep, that only benefits you if you can switch threads, but you can't. And if you did, the whole system would lock up anyway because you can't get very far w/o hitting some service that can't function. If you can interrupt the paging system at any point, why would you think it could be reentered and still function? It would be nearly impossible to code an O/S if that was required.
Tony Lee
By shutdown, I mean can't function - the execution quantum can never expire and a new thread will never be scheduled as long as you're servicing the interrupt.
Tony Lee
+1 - Excellent points. You say "It's a property of how O/S's have to work." I would go further and say "It's the way the hardware works." The operating system has to live with it.
Keith Smith
"Even if you protect the thread...implied by its name." I don't agree. You are saying that whenever there is an interrupt, the OS has to drop everything else and attend to it first, and that masking interrupts is a bad idea because then an interrupt is no longer an interrupt but a suggestion. AFAIK an interrupt is always a suggestion/indication that someone needs attention. It's up to the OS when to handle it. It's called an interrupt because it can happen at any time, that is, in between processor clock cycles.
Methos
An alternative to interrupts would be polling, which happens only when the processor chooses to poll, that is, on clear-cut clock cycle boundaries. (Polling also requires additional, explicit cycles from the processor.)
Methos
I'm not following your comments completely. It's an interrupt because the processor will stop what it's doing, even roll back the current instruction, and switch execution context. It's the switching part that makes it an interrupt. Yes, you can mask, but that makes for a poor O/S when it has to do real-time processing (e.g., it ends up skipping audio). The article I reference for handling interrupts makes it clear that using CLI is deprecated (and doesn't work so well with multiple processors, since the other processor is still running).
Tony Lee
What you said makes sense. Sorry that my earlier comment is confusing.
Methos
+6  A: 

So what stops the scheduler from putting interrupt context to sleep and taking next schedulable process and passing it the control?

The problem is that the interrupt context is not a process, and therefore cannot be put to sleep.

When an interrupt occurs, the processor saves the registers onto the stack and jumps to the start of the interrupt service routine. This means that when the interrupt handler is running, it is running in the context of the process that was executing when the interrupt occurred. The interrupt is executing on that process's stack, and when the interrupt handler completes, that process will resume executing.

If you tried to sleep or block inside an interrupt handler, you would wind up not only stopping the interrupt handler, but also the process it interrupted. This could be dangerous, as the interrupt handler has no way of knowing what the interrupted process was doing, or even if it is safe for that process to be suspended.

A simple scenario where things could go wrong would be a deadlock between the interrupt handler and the process it interrupts.

  1. Process1 enters kernel mode.
  2. Process1 acquires LockA.
  3. Interrupt occurs.
  4. ISR starts executing using Process1's stack.
  5. ISR tries to acquire LockA.
  6. ISR calls sleep to wait for LockA to be released.

At this point, you have a deadlock. Process1 can't resume execution until the ISR is done with its stack. But the ISR is blocked waiting for Process1 to release LockA.
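The six steps above can be replayed in a small model. This is only a sketch under simplifying assumptions: one CPU, one stack on which only the topmost context may run, and a hypothetical `Lock` class and `can_make_progress` helper that exist nowhere in a real kernel.

```python
class Lock:
    def __init__(self, name):
        self.name = name
        self.owner = None

    def try_acquire(self, who):
        if self.owner is None:
            self.owner = who
            return True
        return False

def can_make_progress(stack, waiting_on):
    """Only the context on top of the stack may run.

    If the top context waits for a lock whose owner sits beneath it on the
    same stack, the owner can never run to release it.
    """
    top = stack[-1]
    lock = waiting_on.get(top)
    if lock is None:
        return True
    return lock.owner not in stack[:-1]

lock_a = Lock("LockA")
stack = ["Process1"]                      # step 1: Process1 enters kernel mode
got_p1 = lock_a.try_acquire("Process1")   # step 2: Process1 acquires LockA
stack.append("ISR")                       # steps 3-4: ISR runs on Process1's stack
got_isr = lock_a.try_acquire("ISR")       # step 5: ISR tries LockA and fails
waiting_on = {"ISR": lock_a}              # step 6: ISR "sleeps" waiting for LockA
deadlocked = not can_make_progress(stack, waiting_on)
```

Here `deadlocked` ends up true: the ISR cannot continue without LockA, and Process1 cannot release it because the ISR is still sitting on top of its stack.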

Keith Smith
Also, interrupts usually require very fast servicing, or you can easily get into all sorts of trouble.
nos
OK, there are two arguments in your claim: 1. "If you tried to sleep or block .... or even if it is safe for that process to be suspended." I totally don't buy this argument. First of all, the kernel does not care what a user-space process is doing or whether it's safe to suspend it. Furthermore, with a preemptive kernel, it's even possible to block a kernel thread and start another. 2. "In the worst case, blocking from an interrupt handler could cause deadlocks" This is a locking issue. What if my ISR releases all locks before calling sleep?
Methos
@Methos - Re 1. The problem is when you interrupt a process in kernel mode, not one that is in user mode. If you interrupt a kernel thread and let the handler block, it wouldn't be the same as a normal thread preemption, because you would be simultaneously preempting two unrelated contexts, the kernel thread, and the ISR. If there are dependencies between them, you're dead. Hence my example about the kernel thread holding a resource that the ISR needs. The kernel thread won't be able to continue executing until the ISR completes. But the ISR is waiting for the kernel thread. Deadlock.
Keith Smith
@Methos - Re 2. In my example, it's a kernel-mode process that holds the lock. I'll edit my answer to provide a clearer explanation. Note that the ISR can't release locks before calling sleep because the ISR can't acquire locks in the first place. If you try to acquire a lock, you might block, which is just as bad as directly calling sleep.
Keith Smith
Keith, I still don't agree with you. Locks have got nothing to do with this issue (although the example that you gave shows a classic deadlock situation). I have commented in arsane's answer about what I think.
Methos
A: 

So what stops the scheduler from putting interrupt context to sleep and taking next schedulable process and passing it the control?

Scheduling happens on timer interrupts. The basic rule is that only one interrupt can be open at a time, so if you go to sleep in the "got data from device X" interrupt, the timer interrupt cannot run to schedule it out.

Interrupts also happen many times and overlap. If you put the "got data" interrupt to sleep, and then get more data, what happens? It's confusing (and fragile) enough that the catch-all rule is: no sleeping in interrupts. You will do it wrong.

Andres Jaan Tack
I don't agree that the "basic rule is only one interrupt". Interrupts can be nested. Please refer to Bovet and Cesati, Chapter 4.3, "Nested Execution of Exception and Interrupt Handlers".
Methos
You have a good point, but note the next paragraph (they can overlap, what you call nested). It's a "basic" rule because if you do anything otherwise, you'd better know what's going on.
Andres Jaan Tack
+1  A: 

I think it's a design idea.

Sure, you can design a system in which you can sleep in an interrupt handler, but that gains nothing and only makes the system complicated and hard to comprehend (there are many, many situations you would have to take into account). So from a design point of view, declaring that interrupt handlers cannot sleep is very clear and easy to implement.


From Robert Love (a kernel hacker): http://mail.nl.linux.org/kernelnewbies/2003-07/msg00026.html

You cannot sleep in an interrupt handler because interrupts do not have a backing process context, and thus there is nothing to reschedule back into. In other words, interrupt handlers are not associated with a task, so there is nothing to "put to sleep" and (more importantly) "nothing to wake up". They must run atomically.

This is not unlike other operating systems. In most operating systems, interrupts are not threaded. Bottom halves often are, however.

The reason the page fault handler can sleep is that it is invoked only by code that is running in process context. Because the kernel's own memory is not pagable, only user-space memory accesses can result in a page fault. Thus, only a few certain places (such as calls to copy_{to,from}_user()) can cause a page fault within the kernel. Those places must all be made by code that can sleep (i.e., process context, no locks, et cetera).

arsane
I kind of came to a similar conclusion, but I am not sure how to back this claim. I just wanted to find out whether there is any "mathematical" reason not to do so.
Methos
I don't know how you would prove, in the "mathematical" sense, that it's impossible to build a system that allows an ISR to sleep. But I've programmed inside a number of OSs, and none of them allowed this. In practice, the closest I've ever seen to allowing an interrupt handler to do things like sleep is to have an explicit process that does the work of handling interrupts. But the systems I've seen that do this (e.g., Solaris) still have a minimal ISR that isn't allowed to do things like sleep. All it does is wake up the interrupt thread and let it do the real work.
Keith Smith
@Keith, there seems to be no authoritative answer to this question, though I think it is possible to design a system in which an ISR can sleep. I have attached Robert Love's answer to this question, but in my view it's a design idea.
arsane
@arsane - What do you mean when you say, "It's a design idea?"
Keith Smith
@Keith, in my opinion, forbidding an ISR from sleeping is a system design strategy that keeps the design simple and clear.
arsane
@arsane I accept your answer that it's a design idea, and thanks for pointing out that mailing list thread (it was actually nice to see someone had exactly the same query). However, there is a missing piece of information that neither your explanation nor Robert's mentions. Let me fill it in: ISRs cannot sleep because they run in interrupt context and there is no backing process context. There is nothing to reschedule back to because only *process contexts* are schedulable in Linux, and *this* is a design choice.
Methos
Just to mention to anyone who refers to this in future, a PF is an exception.
Methos
By calling it a design choice you miss why an ISR has no backing process context. It does you no good to give it one. When an O/S takes an interrupt, the state of the O/S is undefined because it can occur at nearly any point. Undefined state means while in an ISR the O/S is unusable to do things like schedule threads. Resources shared by an ISR and the O/S are protected by spin locks so the state of those resources will be consistent while in an ISR. See linuxjournal.com/article/5833 about spinlocks. If you used normal locks, ones that didn't disable interrupts, you'd deadlock instead of crash
Tony Lee
It's true that we need to use spinlocks, and more than that, spin_lock_irqsave. If ISR1 acquires spinlock 1 and is then interrupted by ISR2, which tries to acquire spinlock 1, the processor will deadlock (I actually tried something similar. Surprisingly, the kernel detects a soft lockup after 11s). I do not understand what you mean by "When an O/S takes ... is unusable to do things like schedule threads".
Methos
A: 

High-level interrupt handlers mask the operations of all lower-priority interrupts, including those of the system timer interrupt. Consequently, the interrupt handler must avoid involving itself in an activity that might cause it to sleep. If the handler sleeps, then the system may hang because the timer is masked and incapable of scheduling the sleeping thread. Does this make sense?
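A toy model of that masking, assuming a single CPU and made-up priority numbers (`TIMER_LEVEL`, `DEVICE_LEVEL` and the `Cpu` class are hypothetical, not taken from any real interrupt controller):

```python
TIMER_LEVEL = 2      # hypothetical priority numbers; larger means more urgent
DEVICE_LEVEL = 5

class Cpu:
    def __init__(self):
        self.current_level = 0      # 0 = ordinary thread code, nothing masked
        self.pending = []           # interrupts raised but not yet deliverable
        self.delivered = []

    def raise_interrupt(self, name, level):
        # An interrupt is delivered only if it outranks what is running now.
        if level > self.current_level:
            self.delivered.append(name)
        else:
            self.pending.append(name)

    def enter_handler(self, level):
        self.current_level = level

    def exit_handler(self):
        self.current_level = 0
        # Masked interrupts are finally delivered once the handler returns.
        self.delivered.extend(self.pending)
        self.pending.clear()

cpu = Cpu()
cpu.enter_handler(DEVICE_LEVEL)             # high-level handler starts
cpu.raise_interrupt("timer", TIMER_LEVEL)   # timer tick arrives while masked
ticks_seen_while_in_handler = list(cpu.delivered)
cpu.exit_handler()                          # handler returns instead of sleeping
ticks_seen_after_return = list(cpu.delivered)
```

While the handler is active the timer tick stays pending, so nothing could ever wake a handler that went to sleep at that level. The tick is only seen after the handler returns.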

A: 

Even if you could put an ISR to sleep, you wouldn't want to do it. You want your ISRs to be as fast as possible to reduce the risk of missing subsequent interrupts.

Ryan Fox
A: 

If a higher-level interrupt routine gets to the point where the next thing it must do has to happen after a period of time, then it needs to put a request into the timer queue, asking that another interrupt routine be run (at lower priority level) some time later.

When that interrupt routine runs, it would then raise priority level back to the level of the original interrupt routine, and continue execution. This has the same effect as a sleep.
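As a rough sketch of that pattern, the handler can be split into two parts joined by a timer queue. Everything here is an assumption made for illustration: `TimerQueue` is a hypothetical class driven by a simulated tick counter, and raising the priority level again in the second part is left out.

```python
import heapq

class TimerQueue:
    """Hypothetical timer queue keyed by a simulated tick count."""
    def __init__(self):
        self.now = 0
        self._heap = []
        self._seq = 0     # tie-breaker so callbacks are never compared

    def call_later(self, delay, callback):
        heapq.heappush(self._heap, (self.now + delay, self._seq, callback))
        self._seq += 1

    def advance(self, ticks):
        # Move simulated time forward and run every callback that is now due.
        self.now += ticks
        while self._heap and self._heap[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._heap)
            callback()

events = []
timers = TimerQueue()

def handler_part_one():
    events.append(("start", timers.now))
    # Instead of sleeping for 10 ticks, ask to be continued later and return.
    timers.call_later(10, handler_part_two)

def handler_part_two():
    events.append(("resume", timers.now))

handler_part_one()
timers.advance(5)    # nothing is due yet
timers.advance(5)    # the continuation fires at tick 10
```

The first part returns immediately, so the CPU is free in between, yet the work still resumes ten ticks later as if it had slept.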

John Saunders
A: 

The Linux kernel has two ways to allocate the interrupt stack. One is to use the kernel stack of the interrupted process; the other is a dedicated interrupt stack per CPU. If the interrupt context is saved on the dedicated per-CPU interrupt stack, then the interrupt context is indeed not associated with any process at all. In that case the "current" macro may not yield a valid pointer to the currently running process, since on some architectures "current" is computed from the stack pointer, and the stack pointer in interrupt context may point into the dedicated interrupt stack rather than the kernel stack of some process.
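The stack-pointer arithmetic can be illustrated with plain integers. This is a sketch assuming the scheme used by some older 32-bit kernels, where per-thread data sits at the base of a THREAD_SIZE-aligned kernel stack; the 8 KiB size and all addresses below are made up for the example.

```python
THREAD_SIZE = 8192   # assumed size of a per-process kernel stack (two 4 KiB pages)

def thread_info_base(stack_pointer):
    """Round the stack pointer down to the start of its THREAD_SIZE-aligned block."""
    return stack_pointer & ~(THREAD_SIZE - 1)

# Hypothetical addresses chosen only for illustration.
process_stack_base = 0xC1230000   # a process's kernel stack
irq_stack_base = 0xC0500000       # a separate per-CPU interrupt stack

sp_in_process = process_stack_base + 0x1F40
sp_in_irq = irq_stack_base + 0x0A00

found_in_process = thread_info_base(sp_in_process)
found_in_irq = thread_info_base(sp_in_irq)
```

With the stack pointer inside the process's kernel stack, the mask finds that process's block. With the stack pointer inside the interrupt stack, the same mask lands on the interrupt stack's base, which belongs to no process.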

Shijun Zhou