ansaurus

Question

Concurrency and optimization using OpenMP

Answer 1

+1 A:

Why have you wrapped your reads and writes to _points[j] in critical sections? I'm not much of a C++ programmer, but it doesn't look to me as if you need those sections at all. As you've written it (unnamed critical sections), all of the sections share a single lock, so each thread has to wait while any other thread is inside either one. This could easily make the program slower.

High Performance Mark 2010-10-01 17:52:16

It segfaults without it. I don't think I can alter the map from two different threads concurrently.

Vitor Py 2010-10-01 18:06:22

Answer 2

A:

It seems possible that the lookup and write to _points in critical sections is dragging down the performance when you use OpenMP. Single-threaded, this will not result in any contention.

Sharing seed data like this seems counterproductive in a parallel programming context. Can you restructure to avoid these contention points?

Steve Townsend 2010-10-01 17:53:55

There is another way to do it? I'm not used to parallel programming.

Vitor Py 2010-10-01 18:07:03

@Vitor - I cannot see why it would fault if you just remove the critical sections. Something in the code we cannot see maybe? `_points` is not being added to or removed from while the loop executes, right? Do the function calls have side effects outside the current entry? Where is the exception?

Steve Townsend 2010-10-01 18:41:02

Oli Charlesworth 2010-10-01 18:50:58

@Steve Townsend From gdb: [New Thread 0x7ffff0ccd710 (LWP 13329)]*** glibc detected *** /home/vitorpy/openmp/test: double free or corruption (fasttop): 0x00000000027f8be0 ***. bt shows it occurs in the unordered_map operator[]. Function calls have no side effects.

Vitor Py 2010-10-01 19:14:05

@Oli Charlesworth I'll take a look at it. Thank you.

Vitor Py 2010-10-01 19:14:34

@Vitor - unexpected side-effect of unordered_map::operator[]? It may be shuffling stuff in the underlying struct here. Can you use a `vector<Point3D>`, since you are just looping from 0..size-1 ?
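To illustrate the side effect suspected above: `unordered_map::operator[]` inserts a default-constructed value when the key is missing, which can trigger a rehash, and the standard does not treat it as a read-only operation. The sketch below only demonstrates that behaviour. The helper names are made up for illustration, and whether this is what actually causes the crash in the question is not established in this thread.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: how many elements the map gained from one operator[] call.
std::size_t grown_by_subscript(std::unordered_map<int, int>& m, int key) {
    const std::size_t before = m.size();
    (void)m[key];  // inserts a value-initialized int if `key` is absent
    return m.size() - before;
}

// Hypothetical helper: the same measurement for find(), which never inserts.
std::size_t grown_by_find(const std::unordered_map<int, int>& m, int key) {
    const std::size_t before = m.size();
    (void)m.find(key);  // pure lookup, works on a const map
    return m.size() - before;
}
```

If every key is known to exist before the loop starts, lookups through `find()` or `at()` do not modify the container, whereas `operator[]` always has to be treated as a potential write.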

Steve Townsend 2010-10-01 19:17:09

@Steve Townsend I'll try it.

Vitor Py 2010-10-01 19:33:34

@Steve Townsend It still segfaults at the _points[j] = next; line.

Vitor Py 2010-10-01 19:37:47

@Vitor - I don't get it. I don't have OpenMP at home so cannot even try if you had full code.

Steve Townsend 2010-10-01 19:51:10

Answer 3

A:

You need to show the rest of the code. From a comment to another answer, it seems you are using a map. That is really a bad idea, especially if you are mapping 0..n numbers to values: why don't you use an array?

If you really need to use containers, consider using the ones from Intel's Threading Building Blocks library.
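A minimal sketch of the array suggestion, assuming the IDs are sequential 0..n-1. `Point3D` here is a stand-in struct and `advance` is a hypothetical per-point update, since the question's real code is not shown. Each iteration reads and writes only its own element, so no critical section is involved.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the question's point type (assumed layout).
struct Point3D {
    double x, y, z;
};

// Hypothetical per-point update; it touches only the value it is given.
Point3D advance(Point3D p) {
    return Point3D{p.x + 1.0, p.y * 2.0, p.z - 1.0};
}

// Iteration j reads and writes points[j] only, so threads never share an element.
void advance_all(std::vector<Point3D>& points) {
    const long n = static_cast<long>(points.size());
    #pragma omp parallel for
    for (long j = 0; j < n; ++j) {
        points[j] = advance(points[j]);
    }
}
```

The vector must not be resized while the loop runs. Without OpenMP enabled the pragma is ignored and the loop simply runs serially with the same result.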

florin 2010-10-01 19:48:54

@florin You're right. The map is mostly a leftover from an older design where IDs were not always sequential. I've made it into an array. It's still slow, but I think I'm probably bumping into the false sharing issue or something.

Vitor Py 2010-10-01 19:57:03

@Vitor Py: please try to create a small compilable code sample that shows the problem. You have simplified the code so much that you have abstracted the problem away. One thing to try would be to make "hasContraint" static inline, to avoid paying the penalty of a function call.

florin 2010-10-01 21:22:12

Answer 4

A:

I agree that it would be best to see some working code.

The ultimate issue here is that there are criticals within a parallel region, and criticals are (a) enormously expensive in and of themselves, and (b) by definition, kill parallelism. The assignment to current certainly doesn't need to be inside a critical, as it is private; I wouldn't have thought the _points[j] assignment would need one, either, but I don't know what the map stuff does, so there you go.

But you have a loop in which you have a huge amount of overhead, which grows linearly in the number of threads (the two critical regions) in order to do a tiny amount of actual work (walk along a linked list, it looks like). That's never going to be a good trade...
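As a rough sketch of the point about private variables, assuming the work really is a list walk per index: `Node` and `sum_chains` below are hypothetical, not the question's code. `current` and `sum` are declared inside the loop body, so each thread has its own copies, and the only shared write goes to a distinct element per iteration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node type; the question's real data structure is not shown.
struct Node {
    int value;
    const Node* next;
};

// Sums each chain independently. Nothing here needs a critical section:
// `sum` and `current` are per-iteration locals, and totals[j] is written once.
std::vector<long> sum_chains(const std::vector<const Node*>& heads) {
    std::vector<long> totals(heads.size(), 0);
    const long n = static_cast<long>(heads.size());
    #pragma omp parallel for
    for (long j = 0; j < n; ++j) {
        long sum = 0;
        for (const Node* current = heads[j]; current != nullptr; current = current->next) {
            sum += current->value;
        }
        totals[j] = sum;
    }
    return totals;
}
```

This only holds while the lists are not modified during the loop; if iterations did write to shared nodes, some form of synchronization would be needed again.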

Jonathan Dursi 2010-10-01 21:56:24
