What is the difference between a thread and a fiber? I've heard of fibers from ruby and I've read heard they're available in other languages, could somebody explain to me in simple terms what is the difference between a thread and a fiber.
In Win32, a fiber is a sort of user-managed thread. A fiber has its own stack and its own instruction pointer etc., but fibers are not scheduled by the OS: you have to call SwitchToFiber explicitly. Threads, by contrast, are pre-emptively scheduled by the operation system. So roughly speaking a fiber is a thread that is managed at the application/runtime level rather than being a true OS thread.
The consequences are that fibers are cheaper and that the application has more control over scheduling. This can be important if the app creates a lot of concurrent tasks, and/or wants to closely optimise when they run. For example, a database server might choose to use fibers rather than threads.
(There may be other usages for the same term; as noted, this is the Win32 definition.)
Threads are scheduled by the OS (pre-emptive). A thread may be stopped or resumed at any time by the OS, but fibers more or less manage themselves (co-operative) and yield to each other. That is, the programmer controls when fibers do their processing and when that processing switches to another fiber.
Threads generally rely on the kernel to interrupt the thread so it or another thread can run (which is better known as Pre-emptive multitasking) whereas fibers use co-operative multitasking where it is the fiber itself that give up the its running time so that other fibres can run.
Some useful links explaining it better than I probably did are:
Threads use pre-emptive scheduling, whereas fibers use cooperative scheduling.
With a thread, the control flow could get interrupted at any time, and another thread can take over. With multiple processors, you can have multiple threads all running at the same time (simultaneous multithreading, or SMT). As a result, you have to be very careful about concurrent data access, and protect your data with mutexes, semaphores, condition variables, and so on. It is often very tricky to get right.
With a fiber, control only switches when you tell it to, typically with a function call named something like yield()
. This makes concurrent data access easier, since you don't have to worry about atomicity of data structures or mutexes. As long as you don't yield, there's no danger of being preempted and having another fiber trying to read or modify the data you're working with. As a result, though, if your fiber gets into an infinite loop, no other fiber can run, since you're not yielding.
You can also mix threads and fibers, which gives rise to the problems faced by both. Not recommended, but it can sometimes be the right thing to do if done carefully.
Threads were originally created as lightweight processes. In a similar fashion, fibers are a lightweight thread, relying (simplistically) on the fibers themselves to schedule each other, by yielding control.
I guess the next step will be strands where you have to send them a signal every time you want them to execute an instruction (not unlike my 5yo son :-). In the old days (and even now on some embedded platforms), all threads were fibers, there was no pre-emption and you had to write your threads to behave nicely.
In the most simple terms, threads are generally considered to be preemptive (although this may not always be true, depending on the operating system) while fibers are considered to be light-weight, cooperative threads. Both are separate execution paths for your application.
With threads: the current execution path may be interrupted or preempted at any time (note: this statement is a generalization and may not always hold true depending on OS/threading package/etc.). This means that for threads, data integrity is a big issue because one thread may be stopped in the middle of updating a chunk of data, leaving the integrity of the data in a bad or incomplete state. This also means that the operating system can take advantage of multiple CPUs and CPU cores by running more than one thread at the same time and leaving it up to the developer to guard data access.
With fibers: the current execution path is only interrupted when the fiber yields execution (same note as above). This means that fibers always start and stop in well-defined places, so data integrity is much less of an issue. Also, because fibers are often managed in the user space, expensive context switches and CPU state changes need not be made, making changing from one fiber to the next extremely efficient. On the other hand, since no two fibers can run at exactly the same time, just using fibers alone will not take advantage of multiple CPUs or multiple CPU cores.
Note that in addition to Threads and Fibers, Windows 7 introduces User-Mode Scheduling:
User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. An application can switch between UMS threads in user mode without involving the system scheduler and regain control of the processor if a UMS thread blocks in the kernel. UMS threads differ from fibers in that each UMS thread has its own thread context instead of sharing the thread context of a single thread. The ability to switch between threads in user mode makes UMS more efficient than thread pools for managing large numbers of short-duration work items that require few system calls.
More information about threads, fibers and UMS is available by watching Dave Probert: Inside Windows 7 - User Mode Scheduler (UMS).