Hi,
As a starting point I'd suggest OpenMP. With this you can very simply do three basic types of parallelisation: loops, sections, and tasks.
Parallel loops allow you to split loop iterations over multiple threads. So using two threads the first thread would perform the first half of the iterations and the second thread would perform the second half.
#pragma omp parallel for
for (int i=0; i<N; i++) {...}
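For example, here's a minimal self-contained sketch (the array, its size, and the reduction clause are my own illustration, not anything specific to your code):

#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    /* Iterations are divided among the threads of the team;
       reduction(+:sum) gives each thread a private partial sum
       and combines them at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = 0.5 * i;
        sum += a[i];
    }

    printf("sum = %f\n", sum);
    return 0;
}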
Sections allow you to statically partition the work over multiple threads. This is useful when there are clearly separate pieces of work that can be performed in parallel. However, it's not a very flexible approach.
#pragma omp parallel sections
{
#pragma omp section
{...}
#pragma omp section
{...}
}
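A small self-contained sketch (work_a and work_b are just placeholder functions I've made up):

#include <stdio.h>

static void work_a(void) { printf("doing work A\n"); }
static void work_b(void) { printf("doing work B\n"); }

int main(void) {
    /* Each section is executed once by some thread in the team;
       with two or more threads the two sections can run concurrently. */
    #pragma omp parallel sections
    {
        #pragma omp section
        work_a();

        #pragma omp section
        work_b();
    }
    return 0;
}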
Tasks are the most flexible approach - they are created dynamically and executed asynchronously, either by the thread that created them or by another thread.
#pragma omp task
{...}
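Note that tasks have to be created inside a parallel region; a common pattern is to have a single thread create the tasks while the whole team executes them. A minimal sketch (the process function and the loop count are my own illustration, and it assumes you compile with OpenMP enabled):

#include <stdio.h>
#include <omp.h>

static void process(int i) {
    printf("item %d handled by thread %d\n", i, omp_get_thread_num());
}

int main(void) {
    #pragma omp parallel   /* create the team of threads */
    #pragma omp single     /* one thread creates the tasks... */
    for (int i = 0; i < 8; i++) {
        #pragma omp task firstprivate(i)
        process(i);        /* ...but any thread in the team may execute them */
    }
    return 0;
}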
OpenMP has several things going for it.
Directive based, which means that the compiler does the work of creating and synchronising the threads.
Incremental parallelism, meaning that you can focus on just the region of code that you need to parallelise.
One source base for serial and parallel code. The OpenMP directives are only recognised by the compiler when a compiler flag is given, so you can use the same source base to generate serial and parallel code. If the parallel code produces a wrong answer, you can build a serial version from the same source and use it to verify the computation, which lets you isolate parallelisation errors from errors in the algorithm.
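For instance, with GCC (I'm assuming GCC here; other compilers use different flags) the same file builds both ways:

gcc -o myprog_serial myprog.c
gcc -fopenmp -o myprog_parallel myprog.c

Without -fopenmp the pragmas have no effect on code generation, so you simply get the serial program.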
You can find the entire OpenMP spec at http://www.openmp.org/
Regards,
Darryl.