views: 78
answers: 3

I'm using the work-pile pattern, so the threads are always running, waiting on a semaphore for incoming function pointers + data in a queue. That's what the Apple marketing guys now call Grand Central Dispatch and promote as the best thing since sliced bread.

I just wonder how to find out whether it is useful to split a short task into two even shorter ones. Is there a rule by which I could judge whether it is worth queuing a new object?

+1  A: 

Two possible answers:

  • It depends.
  • Benchmark it.

I prefer the second one.

Anyway, if two tasks always run one after the other (i.e., sequentially), I suppose there is no gain in splitting them.

mouviciel
Benchmarking is not easy, and you can only do it after you have implemented it. I would like to have a checklist for when I should start looking into a split. I don't even know how much a thread activation costs on average: does it take nanoseconds, microseconds? If I queue multiple steps of an algorithm, then each hand-off is just a semaphore counter decrement and a mutex, say 100 cycles plus cache misses?
Lothar
I would add that an essential factor is also the number of CPUs/cores and how tasks are assigned to them. Though the theory is that you can abstract from the number of available cores, I'm pretty sure that only benchmarking can precisely answer your question.
mouviciel
@mouviciel: The limitation of benchmarking is that it only answers the specific question asked, and doesn't necessarily offer much insight into how to tune for best performance under slightly different circumstances. This is why we'd prefer to have some theoretical insight which can then be used for auto-tuning.
Steven Sudit
+1  A: 

The limit on multitasking is how many cores you have and how much of the algorithm is concurrent. Various types of overhead, including locking, can reduce the amount of concurrency, lowering or even reversing the benefit of multitasking. That's why it works best when there are independent, long-running tasks. Having said that, so long as the overhead doesn't swallow the performance gains, it pays to divide even a short task up among cores.

Steven Sudit
The idea behind the work pile pattern is that you can abstract from the number of available cores.
Lothar
Right, for CPU-bound activities, you would likely set up n+1 threads to read from the input queue, where n is the number of cores. For I/O-bound ones, you would likely want more.
Steven Sudit
OK, I did more research, and it sounds like the OS takes responsibility for the number of threads in the pool. I guess that's handy, but I'm not sure how it decides to tune things. The n+1 heuristic fails badly when I/O blocks.
Steven Sudit
Right, but since the OS knows when I/O blocks, it can just assign a new thread to the thread pool. It balances n+1 dynamically, with all the wisdom that only a supervisor can have. Maybe Apple is right and this is the best thing since sliced bread.
Lothar
I admit it sounds interesting, though I'm not sure that it requires OS intervention, as such. I'm also not sure that adding a thread whenever another blocks is quite correct. Consider the case of a pipeline full of I/O-bound functions, where such an algorithm would quickly lead to dozens of threads waking up at around the same time and finding themselves thrashing. I need to give this more thought.
Steven Sudit
+1  A: 

The short answer is that you need to think about resources + workload + benchmarking.

Here are some of the ways things could break down:

  1. Do you have idle threads? Is the workload chunky enough that a thread takes so long to complete that another thread is hanging out waiting for re-assignment (i.e., more threads than work)?
  2. Do you have enough work? Is the overall task completed so quickly that it's not worth thinking about additional threads? Remember that increasing multithreading does increase overhead by some (sometimes) small but measurable amount.
  3. Do you have available resources? Do you have more threads to give? Do you have CPU cycles that are sitting idle?

So, in short, I'd say that you need to think before you type. If you already have code that works at all, that's like money in the bank. Is it worth investing more of your time to increase the productivity of that code or would the return on investment be too low (or negative!)?

Bob Cross