tags:

views:

68

answers:

4

I'm learning OpenMP. To do so, I'm trying to make an existing code parallel. But I seems to get an worse time when using OpenMP than when I don't.

My inner loop:

    #pragma omp parallel for
    for(unsigned long j = 0; j < c_numberOfElements; ++j)
    {
        //int th_id = omp_get_thread_num();
        //printf("thread %d, j = %d\n", th_id, (int)j);

        Point3D current;
        #pragma omp critical
        {
            current = _points[j];
        }

        Point3D next = getNext(current);

        if (!hasConstraint(next))
        {
            continue;
        }

        #pragma omp critical
        {
            _points[j] = next;
        }
    }

_points is a pointMap_t, defined as:

typedef boost::unordered_map<unsigned long, Point3D> pointMap_t;

Without OpenMP my running time is 44.904s. With OpenMP enabled, on a computer with two cores, it is 64.224s. What I am doing wrong?

+1  A: 

Why have you wrapped your reads and writes to _points[j] in critical sections ? I'm not much of a C++ programmer, but it doesn't look to me as if you need those sections at all. As you've written it (uunamed critical sections) each thread is going to wait while the other goes through each of the sections. This could easily make the program slower.

High Performance Mark
It segfaults without it. I don't think I can alter the map from two different threads concurrently.
Vitor Py
A: 

It seems possible that the lookup and write to _points in critical sections is dragging down the performance when you use OpenMP. Single-threaded, this will not result in any contention.

Sharing seed data like this seems counterproductive in a parallel programming context. Can you restructure to avoid these contention points?

Steve Townsend
There is another way to do it? I'm not used to parallel programming.
Vitor Py
@Vitor - I cannot see why it would fault if you just remove the critical sections. Something in the code we cannot see maybe? `_points` is not being added to or removed from while the loop executes, right? Do the function calls have side effects outside the current entry? Where is the exception?
Steve Townsend
Oli Charlesworth
@Steve Townsend From gdb: [New Thread 0x7ffff0ccd710 (LWP 13329)]*** glibc detected *** /home/vitorpy/openmp/test: double free or corruption (fasttop): 0x00000000027f8be0 ***. bt shows it's occurs on the unordered_map operator[]. Function calls have no side effects.
Vitor Py
@Oli Charlesworth I'll take a look at it. Thank you.
Vitor Py
@Vitor - unexpected side-effect of unordered_map::operator[]? It may be shuffling stuff in the underlying struct here. Can you use a `vector<Point3D>`, since you are just looping from 0..size-1 ?
Steve Townsend
@Steve Townsend I'll try it.
Vitor Py
@Steve Townsend It still segfaults at the _points[j] = next; line.
Vitor Py
@Vitor - I don't get it. I don't have OpenMP at home so cannot even try if you had full code.
Steve Townsend
A: 

You need to show the rest of the code. From a comment to another answer, it seems you are using a map. That is really a bad idea, especially if you are mapping 0..n numbers to values: why don't you use an array?

If you really need to use containers, consider using the ones from the Intel's Thread Building Blocks library.

florin
@florin You're right. The map is mostly a leftover from an older design where ID were not always sequential. I've made it into an array. It's still slow but I think I'm probably bumping into the false sharing issue or something.
Vitor Py
@Vitto Py: please try to create a small compilable code sample that shows the problem. You have simplified the code so much that you have abstracted the problem away. One thing to try would be to make "hasContraint" as static inline, to avoid paying the penalty of a function call.
florin
A: 

I agree that it would be best to see some working code.

The ultimate issue here is that there are criticals within a parallel region, and criticals are (a) enormously expensive in and of themselves, and (b) by definition, kill parallelism. The assignment to current certainl doesn't need to be inside a critical, as it is private; I wouldn't have thought the _points[j] assignment would be, either, but I don't know what the map stuff does, so there you go.

But you have a loop in which you have a huge amount of overhead, which grows linearly in the number of threads (the two critical regions) in order to do a tiny amount of actual work (walk along a linked list, it looks like). That's never going to be a good trade...

Jonathan Dursi