ansaurus

Question

Synchronisation construct inside pragma for

Answer 1

+1 A:

If I interpret your intentions correctly, you want to use iCount to tell your program when (every 10^6 operations) to update a UI? And iCount is global: all the threads share its value and you want to keep it consistent?

I would search for a way to replace this global counter with counters private to each thread and have the threads send a message to update the UI independently of each other. If you insist on using a global counter, you are going to have to, somehow, synchronise across threads, which will be a performance hit. Yes, you could write your program that way but I don't recommend it.

If you don't like the idea of all the threads sending messages to the UI, perhaps just one thread could do that: if one thread is 1/4 of the way through its share of the work, so are the other threads (approximately).
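A minimal sketch of the private-counter idea, assuming the work can be expressed as a plain loop. The names doWork, runWithPrivateCounters and reportEvery are hypothetical and do not come from the question, and printf stands in for whatever call actually updates the UI. Each thread counts its own iterations without any synchronisation, and only thread 0 reports an estimate of overall progress.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the real per-iteration work.
static double doWork(long i) { return static_cast<double>(i % 7); }

// Runs nIterations units of work and returns how many were executed in total.
// checksum receives the sum of all doWork() results.
long runWithPrivateCounters(long nIterations, long reportEvery, double& checksum) {
    assert(reportEvery > 0);
    long total = 0;
    double sum = 0.0;
    #pragma omp parallel reduction(+:total, sum)
    {
        long localCount = 0;     // private: declared inside the parallel region
        bool isReporter = true;  // serial fallback: the only thread reports
        int nThreads = 1;
#ifdef _OPENMP
        isReporter = (omp_get_thread_num() == 0);
        nThreads = omp_get_num_threads();
#endif
        #pragma omp for
        for (long i = 0; i < nIterations; ++i) {
            sum += doWork(i);
            ++localCount;  // no critical section or atomic needed
            if (isReporter && localCount % reportEvery == 0) {
                // Estimate: assume the other threads are about as far along.
                std::printf("roughly %ld of %ld iterations done\n",
                            localCount * nThreads, nIterations);
            }
        }
        total += localCount;  // combined once per thread by the reduction
    }
    checksum = sum;
    return total;
}
```

The estimate is only as good as the load balance between threads, which is the approximation described above. Built without OpenMP support, the pragmas are ignored and the function runs serially with the same result.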

High Performance Mark 2010-05-25 14:19:45

Answer 2

A:

Thanks again Mark. I tried the approaches that you suggested. I added reduction(+:iCount), and I also tried wrapping iCount++ in a pragma critical; yes, it is a performance hit (and I could see no speedup). I also let a single thread handle iCount, but none of these approaches produced any speedup.

I expected that if I put a pragma for around the inner loop and declared iCount as a reduction variable, I would notice at least some speedup. My aim is the parallel execution of these statements for each Index1, Index2 pair:

        fDist = (*this)[iIndex1].distance((*this)[iIndex2]);
        m_oPDF.addPairDistance(fDist);

which could noticeably impact the program run time.

Sayan Ghosh 2010-05-25 20:43:16

What I suggest you do is forget about iCount for a while, parallelise your outermost loop and get some speedup. Once you've done that you can experiment with ways of implementing your counter and examining their effect on speedup. Right now I think you are trying to take giant steps when your level of experience with OpenMP suggests you should be taking small steps.
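A rough sketch of what parallelising the outermost loop could look like for a pair-distance computation. Everything here is an assumption made for illustration: the Point type, a histogram standing in for m_oPDF, and a triangular loop over pairs. If addPairDistance updates a shared object, calling it from several threads at once would be a data race, so this sketch gives each thread its own histogram and merges them once at the end.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical point type; the real class in the question is not shown.
struct Point {
    double x, y, z;
    double distance(const Point& o) const {
        const double dx = x - o.x, dy = y - o.y, dz = z - o.z;
        return std::sqrt(dx * dx + dy * dy + dz * dz);
    }
};

// Histogram of all pair distances; distances beyond the range go in the last bin.
std::vector<long> pairDistanceHistogram(const std::vector<Point>& pts,
                                        double binWidth, std::size_t nBins) {
    assert(binWidth > 0.0 && nBins > 0);
    std::vector<long> hist(nBins, 0);
    const long n = static_cast<long>(pts.size());
    #pragma omp parallel
    {
        std::vector<long> local(nBins, 0);  // one private histogram per thread
        // Dynamic schedule: row i has n-i-1 pairs, so the rows are uneven.
        #pragma omp for schedule(dynamic, 16) nowait
        for (long i = 0; i < n; ++i) {
            for (long j = i + 1; j < n; ++j) {
                const double d = pts[i].distance(pts[j]);
                std::size_t bin = static_cast<std::size_t>(d / binWidth);
                if (bin >= nBins) bin = nBins - 1;
                ++local[bin];
            }
        }
        // Merge once per thread instead of synchronising on every pair.
        #pragma omp critical
        {
            for (std::size_t b = 0; b < nBins; ++b) hist[b] += local[b];
        }
    }
    return hist;
}
```

The point of the design is that the only synchronisation happens once per thread, after the loop, rather than once per pair.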

High Performance Mark 2010-05-26 08:48:15

Answer 3

A:

Many thanks Mark. I removed iCount and made the outer loop parallel, but I am still digging through the code, since I observe no speedup compared to the serial version.

I would like to take this opportunity to get a basic fact clarified. In a nested loop environment like the above, which of these is generally better?

1. Making the inner loop parallel:

        #pragma omp parallel
        for(...i...)
        #pragma omp for
        for(...j...)

2. Making the outer loop parallel (just a #pragma omp parallel for before the outer loop)
3. Using collapse (OpenMP 3.0)
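For comparison, the three options can be sketched side by side on a toy problem. This is an illustration only: the functions fillInner, fillOuter and fillCollapsed are hypothetical, and filling a table with i * j stands in for the real loop body. All three produce the same result; they differ in how often threads synchronise and how the iterations are divided.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Option 1: one parallel region, worksharing on the inner loop.
// Every thread runs the outer loop; the implicit barrier at the end of
// each "omp for" is paid n times.
std::vector<long> fillInner(long n, long m) {
    assert(n >= 0 && m >= 0);
    std::vector<long> out(static_cast<std::size_t>(n * m));
    #pragma omp parallel
    for (long i = 0; i < n; ++i) {
        #pragma omp for
        for (long j = 0; j < m; ++j) out[i * m + j] = i * j;
    }
    return out;
}

// Option 2: worksharing on the outer loop; whole rows go to threads
// and there is a single barrier at the end.
std::vector<long> fillOuter(long n, long m) {
    assert(n >= 0 && m >= 0);
    std::vector<long> out(static_cast<std::size_t>(n * m));
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        for (long j = 0; j < m; ++j) out[i * m + j] = i * j;
    }
    return out;
}

// Option 3: collapse(2) merges both loops into one iteration space of
// n*m items, which helps when n alone is small compared to the thread count.
std::vector<long> fillCollapsed(long n, long m) {
    assert(n >= 0 && m >= 0);
    std::vector<long> out(static_cast<std::size_t>(n * m));
    #pragma omp parallel for collapse(2)
    for (long i = 0; i < n; ++i) {
        for (long j = 0; j < m; ++j) out[i * m + j] = i * j;
    }
    return out;
}
```

As a general tendency rather than a rule, the outer-loop version has the least synchronisation overhead, provided the outer loop has enough iterations to keep every thread busy. Note that collapse needs perfectly nested loops, and before OpenMP 5.0 the inner bounds must not depend on the outer index, so a triangular pair loop would not qualify as written.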

Thanks
Sayan

Sayan Ghosh 2010-05-31 17:31:41
