views:

198

answers:

3

The title says it all. Imagine I have two (three, four, whatever) tasks that have to run in parallel. Now, the easy way to do this would be to create separate threads and forget about it. But on a plain old single-core CPU that would mean a lot of context switching - and we all know that context switching is big, bad, slow, and generally simply Evil. It should be avoided, right?

On that note, if I'm writing the software from ground up anyway, I could go the extra mile and implement my own task-switching. Split each task in parts, save the state inbetween, and then switch among them within a single thread. Or, if I detect that there are multiple CPU cores, I could just give each task to a separate thread and all would be well.

The second solution does have the advantage of adapting to the number of available CPU cores, but will the manual task-switch really be faster than the one in the OS core? Especially if I'm trying to make the whole thing generic with a TaskManager and an ITask, etc?

Clarification: I'm a Windows developer so I'm primarily interested in the answer for this OS, but it would be most interesting to find out about other OSes as well. When you write your answer, please state for which OS it is.

More clarification: OK, so this isn't in the context of a particular application. It's really a general question, the result on my musings about scalability. If I want my application to scale and effectively utilize future CPUs (and even different CPUs of today) I must make it multithreaded. But how many threads? If I make a constant number of threads, then the program will perform suboptimally on all CPUs which do not have the same number of cores.

Ideally the number of threads would be determined at runtime, but few are the tasks that can truly be split into arbitrary number of parts at runtime. Many tasks however can be split in a pretty large constant number of threads at design time. So, for instance, if my program could spawn 32 threads, it would already utilize all cores of up to 32-core CPUs, which is pretty far in the future yet (I think). But on a simple single-core or dual-core CPU it would mean a LOT of context switching, which would slow things down.

Thus my idea about manual task switching. This way one could make 32 "virtual" threads which would be mapped to as many real threads as is optimal, and the "context switching" would be done manually. The question just is - would the overhead of my manual "context switching" be less than that of OS context switching?

Naturally, all this applies to processes which are CPU-bound, like games. For your run-of-the-mill CRUD application this has little value. Such an application is best made with one thread (at most two).

+1  A: 

Single-core Windows machines are going to become extinct in the next few years, so I generally write new code with the assumption that multi-core is the common case. I'd say go with OS thread management, which will automatically take care of whatever concurrency the hardware provides, now and in the future.

I don't know what your application does, but unless you have multiple compute-bound tasks, I doubt that context switches are a significant bottleneck in most applications. If your tasks block on I/O, then you are not going to get much benefit from trying to out-do the OS.

Kristopher Johnson
Naturally this applies only to the CPU-bound processes. I didn't mean anything else. But even when you have a CPU-bound process and you are aiming for a multi-core CPU, how do you write your process so that it is guaranteed to optimally utilize all the cores that are available, whether there are 2 or 32?
Vilx-
+1  A: 

The only advantage of manual switch that I can see is that you have better control of where and when the switch happens. The ideal place is of course after a unit of work has been completed so that you can trash it all together. This saves you a cache miss.

I advise not to spend your effort on this.

bitc
A: 

I don't see how a manual task switch could be faster since the OS kernel is still switching other processes, including yours in out of the running state too. Seems like a premature optimization and a potentially huge waste of effort.

If the system isn't doing anything else, chances are you won't have a huge number of context switches anyway. The thread will use its timeslice, the kernel scheduler will see that nothing else needs to run and switch right back to your thread. Also the OS will make a best effort to keep from moving threads between CPUs so you benefit there with caching.

If you are really CPU bound, detect the number of CPUs and start that many threads. You should see nearly 100% CPU utilization. If not, you aren't completely CPU bound and maybe the answer is to start N + X threads. For very IO bound processes, you would be starting a (large) multiple of the CPU count (i.e. high traffic webservers run 1000+ threads).

Finally, for reference, both Windows and Linux schedulers wake up every millisecond to check if another process needs to run. So, even on an idle system you will see 1000+ context switches per second. On heavily loaded systems, I have seen over 10,000 per second per CPU without any significant issues.

AngerClown
In other words, even if I made like 32 CPU-bound threads on a single-core system, the slowdown compared to a single-threaded solution would be negligible?
Vilx-
I don't think the overhead would be negligible, I just meant that a CPU can handle a large number of context switches and still get work done. You would need to run some tests to determine the overhead and if you are ok with it. It dependd on how long this process runs - 1ms of overhead for a task that takes 10 seconds, probably doesn't matter; maybe 1 second of overhead would. It's pretty easy to get the number of CPUs on a system; you already have an algorithm that's easy to beak apart, so _any_ overhead may not be acceptable and you should just run single threaded on a single core system.
AngerClown