In Linux, what happens to the state of a process when it needs to read blocks from a disk? Is it blocked? If so, how is another process chosen to execute?

A: 

Assuming your process is a single thread, and that you're using blocking I/O, your process will block waiting for the I/O to complete. The kernel will pick another process to run in the meantime based on niceness, priority, last run time, etc. If there are no other runnable processes, the kernel won't run any; instead, it'll tell the hardware the machine is idle (which will result in lower power consumption).

Processes that are waiting for I/O to complete typically show up in state D in, e.g., ps and top.
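If you want to check the state programmatically rather than through ps or top, a minimal sketch (assuming a Linux /proc filesystem) is to read the state field from /proc/<pid>/stat. The helper names parse_state and state_of are made up for illustration.

```python
import os

def parse_state(stat_text):
    """Return the one-letter state (R, S, D, Z, ...) from /proc/<pid>/stat text."""
    # The second field (the command name) is wrapped in parentheses and may
    # itself contain spaces or ')', so split only after the LAST ')'.
    after_comm = stat_text[stat_text.rfind(")") + 1:]
    return after_comm.split()[0]

def state_of(pid):
    """Read and parse the state of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())

# Only touch /proc where it exists, so the sketch is harmless elsewhere.
if os.path.exists("/proc/self/stat"):
    print(state_of(os.getpid()))
```

A process inspecting itself is on the CPU at that moment, so it will normally report R; to see D you would have to sample another process while it is waiting on the disk.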

derobert
+7  A: 

While waiting for read() or write() on a file descriptor to return, the process will be put in a special kind of sleep, known as "D" or "Disk Sleep". This is special because the process cannot be killed or interrupted while in such a state. A process waiting for a return from ioctl() would also be put to sleep in this manner.

An exception to this is when a file (such as a terminal or other character device) is opened in O_NONBLOCK mode, which is typically passed when it's assumed that a device (such as a modem) will need time to initialize. However, you indicated block devices in your question. Also, I have never tried an ioctl() that is likely to block on an fd opened in non-blocking mode (at least not knowingly).

How another process is chosen depends entirely on the scheduler you are using, as well as what other processes might have done to modify their weights within that scheduler.

Some user space programs under certain circumstances have been known to remain in this state forever, until rebooted. These are typically grouped in with other "zombies", but the term would not be correct as they are not technically defunct.

Tim Post
If a process gets stuck in D state, it's either a hardware issue or a kernel bug, not a userspace bug.
caf
@caf FUSE was actually fresh in mind (in particular, ioctls on a fusemount) when I said that. I'll edit the question to take out 'buggy' though.
Tim Post
A: 

Yes, tasks waiting for IO are blocked, and other tasks get executed. Selecting the next task is done by the Linux scheduler.

Martin v. Löwis
A: 

Generally the process will block. If the read operation is on a file descriptor marked as non-blocking, or if the process is using asynchronous I/O, it won't block. Also, if the process has other threads that aren't blocked, they can continue running.
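As a rough illustration of the non-blocking case, here is a small sketch using a pipe. Note the assumption: it uses a pipe rather than a disk file, because on Linux O_NONBLOCK has no practical effect on regular files. The helper name try_read is hypothetical.

```python
import os

def try_read(fd, nbytes=4096):
    """Return available bytes, or None if the read would have blocked."""
    try:
        return os.read(fd, nbytes)
    except BlockingIOError:
        # EAGAIN/EWOULDBLOCK: no data yet, and the fd is non-blocking,
        # so the kernel returns immediately instead of sleeping the caller.
        return None

r, w = os.pipe()
os.set_blocking(r, False)   # mark the read end as non-blocking

first = try_read(r)         # nothing has been written yet
os.write(w, b"hello")
second = try_read(r)        # data is now waiting in the pipe

os.close(r)
os.close(w)
```

With a blocking descriptor the first read would simply sleep until data arrived; here the caller gets control back and can do other work.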

The decision as to which process runs next is up to the scheduler in the kernel.

Benno
+4  A: 

A process performing I/O is put into the D state (uninterruptible sleep), which frees the CPU until a hardware interrupt signals that the I/O has completed and the process can be made runnable again. See man ps for the other process states.

Depending on your kernel, there is a process scheduler which keeps track of a runqueue of processes ready to execute. This along with a scheduling algorithm tells the kernel which process to assign to which CPU. Also there are kernel processes and user processes to consider. And each process is allocated a time-slice, which is a chunk of CPU time it is allowed to use. Once the process uses all of its time-slice it is marked as expired and given lower priority in the scheduling algorithm.
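To make the time-slice idea concrete, here is a toy model only, not the kernel's actual algorithm: tasks run from an "active" queue, move to an "expired" queue once they use up their slice, and the two queues are swapped when the active one empties. The function name, the task names, and the unit of "work" are all assumptions made for illustration.

```python
from collections import deque

def run_toy_scheduler(tasks, timeslice=2):
    """Simulate slice expiry. tasks maps name -> units of CPU work needed.

    Returns a trace of (name, units_run) in the order tasks were run.
    """
    active = deque(tasks.items())
    expired = deque()
    trace = []
    while active or expired:
        if not active:
            # Every runnable task has used its slice: swap the queues.
            active, expired = expired, active
        name, remaining = active.popleft()
        ran = min(timeslice, remaining)
        trace.append((name, ran))
        remaining -= ran
        if remaining > 0:
            # Slice used up but work remains: wait for the next round.
            expired.append((name, remaining))
    return trace

print(run_toy_scheduler({"a": 3, "b": 1}))
# [('a', 2), ('b', 1), ('a', 1)]
```

Real schedulers also weigh priority, niceness, and which CPU a task last ran on; none of that is modelled here.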

In the case of the 2.6 kernel, there is an O(1) scheduler (used until CFS replaced it in 2.6.23), so no matter how many processes you have running, it picks the next one in constant time. It is more complicated than that, though: since 2.6 introduced preemption and CPU load balancing, it's not an easy algorithm. In any case, it's efficient, and CPUs will not remain idle while you wait for I/O. Hope that helps!

Hayato
+1  A: 

Yes, the task gets blocked in the read() system call. Another task which is ready runs, or if no other tasks are ready, the idle task (for that CPU) runs.

A normal, blocking disk read causes the task to enter the "D" state (as others have noted). Such tasks contribute to the load average, even though they're not consuming the CPU.
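If you want to watch the load average while tasks sit in D state, one option on Linux is /proc/loadavg. The sketch below just parses its five fields; the function name parse_loadavg and the sample line are made up for illustration.

```python
import os

def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    # Fourth field is "currently runnable / total" scheduling entities.
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "runnable": runnable,
        "total": total,
        "last_pid": int(fields[4]),
    }

# Read the live values only where the file exists (Linux).
if os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
```

A machine with an idle CPU but several tasks stuck waiting on a slow disk can therefore still show a high load average.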

Some other types of IO, especially ttys and network, do not behave quite the same - the process ends up in "S" state and can be interrupted and doesn't count against the load average.

MarkR