I am currently working on a C++ sparse matrix/math/iterative solver library for a simulation tool I manage. I would have preferred to use an existing package; however, after extensive investigation, none were found to be appropriate for our simulator (we looked at FLENS, IT++, PETSc, Eigen, and several others). The good news is that my solvers and sparse matrix structures are now very efficient and robust. The bad news is that I am now looking into parallelization using OpenMP, and the learning curve is a bit steep.
The domain we solve can be broken into sub-domains, which come together in a block-diagonal format. So our storage scheme ends up looking like an array of smaller square matrices (blocks[]), each with a format appropriate to its sub-domain (e.g. Compressed Row Storage (CRS), Compressed Diagonal Storage (CDS), dense, etc.), plus a background matrix (currently CRS) that accounts for the connectivity between sub-domains.
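For concreteness, here is a stripped-down sketch of the storage; the class and member names here are simplified stand-ins for my actual code, not the real API:

#include <memory>
#include <vector>

// Common interface implemented by the CRS, CDS, and dense block formats.
struct SparseMatrix {
    virtual ~SparseMatrix() = default;
    virtual int numRows() const = 0;
    // y += A * x; callers pass pointers to the relevant range of each vector
    virtual void multiplyAdd(const double* x, double* y) const = 0;
};

struct BlockDiagonalSystem {
    std::vector<std::unique_ptr<SparseMatrix>> blocks;  // one square block per sub-domain
    std::unique_ptr<SparseMatrix> background;           // CRS coupling between sub-domains
};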
The "hot spot" in most (all?) iterative solvers is the Matrix Vector multiplication operation, and this is true of my library. Thus, I've been focusing my efforts on optimizing my MxV routines. For the block diagonal structure, the pseudo code for M*x=b would be as follows:
b = background_matrix * x;
start_index = 1;
for (i = 1 : number_of_blocks) {
    // each block owns a contiguous, non-overlapping range of rows
    end_index = start_index + blocks[i].numRows() - 1;
    b.range(start_index, end_index) += blocks[i] * x.range(start_index, end_index);
    start_index = end_index + 1;
}
where background_matrix is the background (CRS) matrix, blocks is the array of sub-domain matrices, and .range returns the portion of the vector from a starting index to an ending index, inclusive.
Obviously the loop can be (and has been) parallelized, as each iteration is independent of the others (the ranges are non-overlapping); a sketch of the parallel version follows below. Since we have 10-15 blocks in a typical system, 4+ threads actually make a significant difference.
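Here is roughly what that looks like in OpenMP, using the simplified types from the sketch above (again, names are illustrative). The block offsets are precomputed so each iteration is fully self-contained, and schedule(dynamic) compensates for blocks of uneven size:

#include <omp.h>
#include <vector>

// b = background_matrix * x plus the per-block products, block loop in parallel.
void blockMxV(const BlockDiagonalSystem& A, const double* x, double* b)
{
    // Precompute each block's starting offset so iterations don't depend
    // on one another (the serial version carried start_index across iterations).
    const int nblocks = static_cast<int>(A.blocks.size());
    std::vector<int> offset(nblocks + 1, 0);
    for (int i = 0; i < nblocks; ++i)
        offset[i + 1] = offset[i] + A.blocks[i]->numRows();

    A.background->multiplyAdd(x, b);  // assumes b was zeroed by the caller

    // The blocks' row ranges are disjoint, so no two iterations touch
    // the same entries of b.
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < nblocks; ++i)
        A.blocks[i]->multiplyAdd(x + offset[i], b + offset[i]);
}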
The other place where parallelization has been seen to be a good option is in the MxV operation for each sub-domain storage scheme (the background_matrix product and the per-block products in the code above). There is plenty out there on parallelizing CRS, CDS, and dense matrix MxV operations. Typically a nice boost is seen with 2 threads, with greatly diminishing returns as more threads are added.
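For example, my CRS multiply is essentially the textbook row-parallel loop (simplified here to raw arrays; the real routine works on my matrix class):

// y = A * x for a CRS matrix: row_ptr has numRows+1 entries,
// col_idx and values hold the nonzeros row by row.
void crsMxV(int numRows, const int* row_ptr, const int* col_idx,
            const double* values, const double* x, double* y)
{
    #pragma omp parallel for schedule(static)
    for (int row = 0; row < numRows; ++row) {
        double sum = 0.0;
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
            sum += values[k] * x[col_idx[k]];
        y[row] = sum;  // each row is written by exactly one thread
    }
}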
I am envisioning a scheme where 4 threads would be used in the block loop for the above code, and each of those threads would use 2 threads for the sub-domain solves. However, I am not sure how one would manage the pool of threads using OpenMP: is it possible to limit the number of threads in an OpenMP for loop? Is this multi-level parallelism something that makes sense in practice? Any other thoughts on what I've proposed here would be appreciated (and thanks for reading all the way to the end!)
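To make the question concrete, this is roughly the shape I'm imagining, based on my reading of the omp_set_nested / num_threads documentation; it is an untested sketch, and I haven't verified that it behaves the way I hope:

#include <omp.h>
#include <vector>

// Untested sketch of the 4x2 scheme (offset computed as in blockMxV above).
void nestedBlockMxV(const BlockDiagonalSystem& A, const std::vector<int>& offset,
                    const double* x, double* b)
{
    omp_set_nested(1);  // nested parallel regions are disabled by default

    // Outer team: 4 threads spread over the blocks.
    #pragma omp parallel for num_threads(4) schedule(dynamic)
    for (int i = 0; i < static_cast<int>(A.blocks.size()); ++i)
        A.blocks[i]->multiplyAdd(x + offset[i], b + offset[i]);
    // Each block's multiplyAdd would then open its own 2-thread team,
    // e.g. the CRS row loop above with num_threads(2) on its pragma.
}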