API Design for Task/Parallelization Library

I've just completed a significant revision of my task pool/parallelization library for the D programming language. I'm interested in having the API critiqued, especially by people who are not regular users of D but know a decent amount about use cases for such a library. I'd like to avoid the groupthink that would come from asking only the relatively small community of hardcore D users.

1. Do you think that the API is engineered to the right level, i.e. not ridiculously over- or underengineered?
2. Do you think the documentation is clear enough that someone who is not already a D guru could figure out how to use it?
3. Do you think there are any major missing features that should be added or useless features that should be removed?
4. Do you think this is overall a good design?

Disclaimer: I have been toying around with D 2.0 for maybe 10 days, so I tick the "not regularly using D" box. I consider this an opportunity to learn something about D.

Regarding 1 and 2: The API and documentation are easy to read and understand. One nit: writeln("Sum = ", myFuture.spinWait()); in the example should probably be writeln("Sum = ", myTask.spinWait());.

Regarding 3: a parallel prefix would be nice. And I don't know enough about D, but I guess mutexes are defined somewhere else.
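To make the "parallel prefix" request concrete, here is a minimal sketch of a block-wise inclusive prefix sum. It is written in Python with concurrent.futures purely as an illustration; the function name parallel_prefix_sum and its num_blocks parameter are made up for this example and are not part of the D library under discussion.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def parallel_prefix_sum(values, pool, num_blocks=4):
    """Inclusive prefix sum computed block-wise on the given pool."""
    values = list(values)
    if not values:
        return []
    # Split the input into contiguous blocks of roughly equal size.
    size = -(-len(values) // num_blocks)  # ceiling division
    blocks = [values[i:i + size] for i in range(0, len(values), size)]
    # Pass 1 (parallel): scan each block independently.
    scanned = list(pool.map(lambda b: list(accumulate(b)), blocks))
    # Sequential step: each block's offset is the total of all earlier blocks.
    offsets = [0]
    for block in scanned[:-1]:
        offsets.append(offsets[-1] + block[-1])
    # Pass 2 (parallel): add each block's offset to its elements.
    shifted = pool.map(lambda pair: [x + pair[0] for x in pair[1]],
                       zip(offsets, scanned))
    return [x for block in shifted for x in block]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(parallel_prefix_sum(range(1, 11), pool))
```

The two parallel passes are separated by a short sequential step over one value per block, which is why this pattern needs its own primitive rather than falling out of map or reduce alone.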

Regarding 4: Your design seems to indicate that you have a worker pool, start up a couple of threads, and the threads then steal tasks from this pool. Now, I have debugged my share of bottlenecks (mostly of my own making). Besides NUMA and "judiciously" serializing things with the help of mutexes, pools can also be very "successful" at serializing your program and introducing overhead. I understand that the API does not prevent a good implementation. It just makes me wonder: why are map, reduce, and parallel_for not free functions? Does D offer advantages if these are methods?
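The two designs being contrasted can be sketched side by side. This is a Python analogy under stated assumptions, not the D API: SketchPool, default_pool and parallel_map are hypothetical names invented for the comparison.

```python
from concurrent.futures import ThreadPoolExecutor


class SketchPool:
    """Hypothetical pool whose algorithms are methods (design A)."""

    def __init__(self, workers):
        self._executor = ThreadPoolExecutor(max_workers=workers)

    def map(self, fn, items):
        # The receiver decides which threads run the work.
        return list(self._executor.map(fn, items))

    def shutdown(self):
        self._executor.shutdown()


_default_pool = None


def default_pool():
    """Lazily create one shared pool (not thread-safe; fine for a sketch)."""
    global _default_pool
    if _default_pool is None:
        _default_pool = SketchPool(workers=4)
    return _default_pool


def parallel_map(fn, items, pool=None):
    """Design B: a free function; the pool is an optional argument."""
    target = pool if pool is not None else default_pool()
    return target.map(fn, items)
```

With design A the call reads some_pool.map(fn, items); with design B it reads parallel_map(fn, items) and only callers who care pass pool=some_pool. Both let the caller choose a pool, so the difference is mostly about what the default looks like.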

Edit: I have played around with your library, and it is nice. It also scales well (relative to hand-coded threading) for cases with mostly calculations and low memory usage. I will just reiterate the two suggestions I have already made:

I would consider separating algorithms (data parallelism) and task groups (task parallelism). This would bring it closer to more common C++ libraries (TBB, OpenMP, MS PPL and TPL). Also from an implementation perspective: you might want to schedule data parallelism without task groups in the future (e.g. GPU-bound work) or use additional information (e.g. memory layout).
This already implies that the scheduler could be made independent from the TaskPool. Furthermore, I would also consider making the scheduler a singleton. To quote Intel's TBB FAQ on why the scheduler is a singleton:

[...] some libraries control program-wide resources, such as memory and processors. For example, garbage collectors control memory allocation across a program. Analogously, TBB controls scheduling of tasks across a program. To do their job effectively, each of these must be a singleton; [...] Allowing k instances of the TBB scheduler in a single program would cause there to be k times as many software threads as hardware threads. The program would operate inefficiently, because the machine would be oversubscribed by a factor of k, causing more context switching, cache contention, and memory consumption.
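The "factor of k" in the quote is easy to reproduce with any pool implementation. The sketch below uses Python's ThreadPoolExecutor as a stand-in (it is not TBB and not the D library); count_worker_threads is a hypothetical helper that forces every worker in every pool to start and then counts the threads that now exist.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_worker_threads(num_pools, workers_per_pool):
    """Start every worker in every pool and report how many threads that took."""
    total = num_pools * workers_per_pool
    # The gate opens only when all workers and the caller have arrived,
    # which forces each pool to start all of its threads.
    gate = threading.Barrier(total + 1)
    pools = [ThreadPoolExecutor(max_workers=workers_per_pool)
             for _ in range(num_pools)]
    before = threading.active_count()
    try:
        for pool in pools:
            for _ in range(workers_per_pool):
                pool.submit(gate.wait, 10)  # 10 s timeout as a safety net
        gate.wait(10)
        # Workers stay alive (idle) until their pool is shut down.
        return threading.active_count() - before
    finally:
        for pool in pools:
            pool.shutdown()
```

If workers_per_pool is set to the number of hardware threads, the result is num_pools times that number, which is the oversubscription the FAQ warns about.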

reduce, map and parallel foreach are methods because they use the pool rather than starting new threads. Calling these on an instance of TaskPool is how you specify what pool you want the job submitted to.

dsimcha 2010-02-20 19:21:59

@dsimcha: interesting. The standard C++ libraries (OpenMP, TBB) separate algorithms, tasks (and task groups), and the task scheduler. Your comment seems to indicate that the TaskPool is also the scheduler? Is the scheduler not a singleton? I guess this is the code: http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d. Will take a look.

stephan 2010-02-20 22:31:13

The scheduler is not a singleton, and TaskPool is also the scheduler. I've actually had practical use cases for having more than one of them. One example is when all of the threads in one of my pools were blocking waiting for an object to be constructed, so I fired up a second pool to parallelize the construction of that object.

dsimcha 2010-02-20 23:22:09
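The use case in the comment above can be sketched as follows. This is a Python illustration under assumptions, not the author's code: LazyTable and lookup_all are hypothetical names, and the shared object is simply a list of squares. Every worker of the main pool may block on the lock while the object is built, so the parallel build runs on a second pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyTable:
    """Hypothetical shared object that is built once, on first use."""

    def __init__(self, build_pool):
        self._build_pool = build_pool
        self._lock = threading.Lock()
        self._data = None

    def get(self):
        with self._lock:  # other callers block here until the build is done
            if self._data is None:
                # Parallel construction runs on the separate build pool.
                squares = self._build_pool.map(lambda x: x * x, range(8))
                self._data = list(squares)
            return self._data


def lookup_all(main_workers=2, build_workers=2):
    """Run 8 lookups on the main pool; the table is built on the build pool."""
    with ThreadPoolExecutor(max_workers=main_workers) as main_pool:
        with ThreadPoolExecutor(max_workers=build_workers) as build_pool:
            table = LazyTable(build_pool)
            return list(main_pool.map(lambda i: table.get()[i], range(8)))
```

With this simple executor, submitting the build to main_pool instead would hang: one worker would hold the lock waiting for build tasks while the remaining workers wait for the lock. A pool whose waiting threads execute queued tasks themselves may not have this problem, so whether a second pool is needed depends on the implementation.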

@dsimcha: I have added some comments to the main body. Regarding your use case: why not simply start a thread manually for these (hopefully) rare cases? BTW, D2.0 looks really good, thanks.

stephan 2010-02-21 22:20:01
