Java has a parallel task framework similar to Intel's Threading Building Blocks - it's called the Fork-Join framework. It can be used with the current Java SE 6 and will be included in the upcoming Java SE 7.
There are resources available for getting started with the framework, beyond the javadoc class documentation. The jsr166 page mentions that
"There is also a wiki containing additional documentation, notes, advice, examples, and so on for these classes."
The fork-join examples, such as matrix multiplication, are a good place to start.
I used the fork-join framework in solving some of Intel's 2009 threading challenges. The framework is lightweight and low-overhead - mine was the only Java entry for the Knight's Tour problem and it outperformed other entries in the competition. The Java sources and writeup are available from the challenge site for download.
EDIT:
I have no idea how a class or pieces
of code may look after pushing it onto
a pool [...]
You can create your own task by subclassing one of the ForkJoinTask subclasses, such as RecursiveTask. Here's how to compute the Fibonacci sequence in parallel. (Taken from the RecursiveTask
javadocs - comments are mine.)
// declare a new task, that itself spawns subtasks.
// The task returns an Integer result.
class Fibonacci extends RecursiveTask<Integer> {
final int n; // the n'th number in the fibonacci sequence to compute
Fibonacci(int n) { this.n = n; } // constructor
protected Integer compute() { // this method does the main work of the task
if (n <= 1) // 1 or 0, base case to end recursion
return n;
Fibonacci f1 = new Fibonacci(n - 1); // create a new task to compute n-1
f1.fork(); // schedule to run asynchronously
Fibonacci f2 = new Fibonacci(n - 2); // create a new task to compute n-2
return f2.invoke() + f1.join(); // wait for both tasks to compute.
// f2 is run as part of this task, f1 runs asynchronously.
// (you could create two separate tasks and wait for them both, but running
// f2 as part of this task is a little more efficient.)
}
}
You then run this task and get the result:
// default parallelism is number of cores
ForkJoinPool pool = new ForkJoinPool();
Fibonacci f = new Fibonacci(100);
int result = pool.invoke(f);
This is a trivial example to keep things simple. In practice, performance would not be so good, since the work done by each task is trivial compared to the overhead of the task framework. As a rule of thumb, a task should perform some significant computation - enough to make the framework overhead insignificant, yet not so much that one core ends up running one large task at the end of the problem. Splitting large tasks into smaller ones keeps more cores busy and avoids leaving one core doing lots of work while the others sit idle - but don't make tasks so small that they do no real work.
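As a rough sketch of how a sequential cutoff might look (the SumTask class, the array-summing problem, and the threshold value are all made up for illustration; the package names assume the final Java SE 7 API, while the jsr166y preview jar uses the jsr166y package instead):

import java.util.concurrent.RecursiveTask;

// Sums a range of an array, splitting into subtasks until the range
// is small enough that doing the work directly beats forking.
class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 10000; // tune this by measuring
    final long[] data;
    final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    protected Long compute() {
        if (hi - lo <= THRESHOLD) {           // small enough: compute directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left  = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                           // left half runs asynchronously
        return right.compute() + left.join();  // right half runs here, then combine
    }
}

The threshold is the knob that controls granularity: raise it and you get fewer, larger tasks (less overhead, less parallelism); lower it and you get more, smaller tasks.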
[...] or how weird code may look when
you need to make a copy of everything
and how much of everything is pushed
onto a pool.
Only the tasks themselves are pushed into the pool. Ideally you don't want to copy anything: to avoid interference and the need for locking, which would slow down your program, your tasks should work with independent data. Read-only data can be shared amongst all tasks and doesn't need to be copied. If threads need to co-operate in building some large data structure, it's best that they build the pieces separately and then combine them at the end. The combining can be done as a separate task, or each task can add its piece of the puzzle to the overall solution. This often requires some form of locking, but that's not a significant performance issue if the work of the task is much greater than the work of updating the solution. My Knight's Tour solution takes this approach to update a common repository of tours on the board.
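Here is an illustrative sketch of the "build the pieces separately, combine at the end" approach (the even-number filtering problem and class name are invented for the example): each task collects results into its own private list, and parent tasks merge the two sub-lists when they join, so no locking is needed at all.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveTask;

class CollectEvens extends RecursiveTask<List<Integer>> {
    static final int THRESHOLD = 1000;
    final int[] data;
    final int lo, hi;

    CollectEvens(int[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    protected List<Integer> compute() {
        if (hi - lo <= THRESHOLD) {
            List<Integer> found = new ArrayList<Integer>(); // private to this task
            for (int i = lo; i < hi; i++)
                if (data[i] % 2 == 0) found.add(data[i]);
            return found;
        }
        int mid = (lo + hi) >>> 1;
        CollectEvens left  = new CollectEvens(data, lo, mid);
        CollectEvens right = new CollectEvens(data, mid, hi);
        left.fork();
        List<Integer> result = right.compute(); // right half in this thread
        result.addAll(left.join());             // merge in the left half's piece
        return result;
    }
}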
Working with tasks and concurrency is quite a paradigm shift from regular single-threaded programming. There are often several designs possible to solve a given problem, but only some of these will be suitable for a threaded solution. It can take a few attempts to get the feel for how to recast familiar problems in a multi-threaded way. The best way to learn is to look at the examples, and then try it for yourself. Always profile, and measure the effects of varying the number of threads. You can explicitly set the number of threads (cores) to use via the pool's constructor. When the work is broken up evenly, you can expect near-linear speedup as the number of threads increases.
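A simple way to measure that, reusing the hypothetical SumTask and data array from the earlier sketch, is to time the same job with different parallelism levels passed to the ForkJoinPool constructor:

// data is a long[] to be summed, as in the SumTask sketch above
for (int threads = 1; threads <= Runtime.getRuntime().availableProcessors(); threads++) {
    ForkJoinPool pool = new ForkJoinPool(threads); // explicit parallelism level
    long start = System.nanoTime();
    pool.invoke(new SumTask(data, 0, data.length));
    long elapsed = System.nanoTime() - start;
    System.out.println(threads + " thread(s): " + (elapsed / 1000000) + " ms");
    pool.shutdown();
}

Plotting the times against the thread count quickly shows where the speedup stops being linear, which usually points to either task granularity or shared-data contention.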