Hi,

I have a program block like:

    for (iIndex1=0; iIndex1 < iSize; iIndex1++)
    {
        for (iIndex2=iIndex1+1; iIndex2 < iSize; iIndex2++)
        {   
            iCount++;
            fDist =(*this)[iIndex1].distance( (*this)[iIndex2] );
            m_oPDF.addPairDistance( fDist );

            // Report progress every 10^6 pairs; the total number of pairs is n*(n-1)/2
            if ((bShowProgress) && (iCount % 1000000 == 0))
                xyz_exception::ui()->progress( iCount, size()*(size()-1)/2 );

        }
    }

I have tried parallelising both the inner and the outer loop, with iCount placed in a critical region. What would be the best approach to parallelise this? If I wrap iCount in omp single or omp atomic, the code gives an error, and I gathered that this is invalid inside omp for. I suspect I am adding a lot of extraneous constructs to parallelise this. I need some advice...

Thanks,

Sayan

+1  A: 

If I interpret your intentions correctly, you want to use iCount to tell your program when (every 10^6 operations) to update a UI? And iCount is global, so all the threads share its value and you want to keep it consistent?

I would search for a way to replace this global counter with counters private to each thread and have the threads send a message to update the UI independently of each other. If you insist on using a global counter, you are going to have to, somehow, synchronise across threads, which will be a performance hit. Yes, you could write your program that way but I don't recommend it.

If you don't like the idea of all the threads sending messages to the UI perhaps just one thread could do that; if one thread is 1/4 of the way through the program, so are the other threads (approximately).
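
As a rough illustration of the private-counter idea, here is a minimal sketch. The names countPairs, processPair and kBatch are hypothetical and not from the original code; processPair stands in for the distance and histogram work. Each thread counts locally and only touches the shared total once per batch, so synchronisation is rare.

```cpp
// Hypothetical stand-in for the real per-pair work (distance + histogram update).
static inline void processPair(long long, long long) {}

// Visits every (i, j) pair with i < j and returns how many pairs were processed.
// kBatch is an assumed flush interval, not a value taken from the original code.
long long countPairs(long long n, long long kBatch = 1000000)
{
    long long total = 0;                     // shared; only updated atomically
    #pragma omp parallel
    {
        long long local = 0;                 // private to each thread
        #pragma omp for schedule(dynamic, 16)
        for (long long i = 0; i < n; ++i) {
            for (long long j = i + 1; j < n; ++j) {
                processPair(i, j);
                if (++local == kBatch) {     // flush rarely to keep contention low
                    #pragma omp atomic
                    total += local;
                    local = 0;
                    // A progress update could be issued here. Whether the real
                    // ui()->progress() call is thread-safe is unknown.
                }
            }
        }
        #pragma omp atomic
        total += local;                      // flush whatever is left over
    }
    return total;
}
```

Compiled without OpenMP the pragmas are ignored and the function still returns n*(n-1)/2, which makes it easy to check the parallel build against the serial one.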

High Performance Mark
A: 

Thanks again Mark. I tried the approaches that you suggested. I used reduction(+:iCount) and also tried wrapping iCount++ in a critical pragma, and yes, it is a performance hit (I could see no speedup either). I have also let a single thread handle iCount, but none of these approaches gave any speedup.

I expected that if I put a pragma for around the inner loop and declared iCount as a reduction variable, I would notice at least some speedup. My aim is the parallel execution of these statements for each (iIndex1, iIndex2) pair:

        fDist =(*this)[iIndex1].distance( (*this)[iIndex2] );
        m_oPDF.addPairDistance( fDist );

which could noticeably reduce the program's run time.

Sayan Ghosh
What I suggest you do is forget about iCount for a while, parallelise your outermost loop and get some speedup. Once you've done that you can experiment with ways of implementing your counter and examining their effect on speedup. Right now I think you are trying to take giant steps when your level of experience with OpenMP suggests you should be taking small steps.
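
A minimal sketch of that first small step, under some loud assumptions: pairDistanceHistogram is a hypothetical function, the histogram is a stand-in for m_oPDF (whose internals are not shown in the question), and points are 1-D only to keep it short. If addPairDistance updates shared state, calling it from several threads would be a data race, so this sketch gives each thread its own histogram and merges once at the end.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for m_oPDF: counts pair distances in unit-width bins.
// Assumes numBins >= 1; distances past the last bin are clamped into it.
std::vector<long long> pairDistanceHistogram(const std::vector<double>& points,
                                             std::size_t numBins)
{
    std::vector<long long> hist(numBins, 0);
    const long long n = static_cast<long long>(points.size());

    #pragma omp parallel
    {
        std::vector<long long> localHist(numBins, 0);   // one per thread, never shared

        // Row i contains n-1-i pairs, so the rows are uneven. Dynamic scheduling
        // hands rows out on demand to balance the triangular workload.
        #pragma omp for schedule(dynamic, 8)
        for (long long i = 0; i < n; ++i) {
            for (long long j = i + 1; j < n; ++j) {
                const double d = std::fabs(points[i] - points[j]);
                std::size_t bin = static_cast<std::size_t>(d);
                if (bin >= numBins) bin = numBins - 1;
                ++localHist[bin];
            }
        }

        // Merge once per thread, so the critical section is entered rarely.
        #pragma omp critical
        for (std::size_t b = 0; b < numBins; ++b)
            hist[b] += localHist[b];
    }
    return hist;
}
```

The schedule clause matters here: with the default static split, the thread that gets the first rows does far more work than the one that gets the last rows, which can hide most of the speedup.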
High Performance Mark
A: 

Many thanks Mark. I removed iCount and made the outer loop parallel, but I am still digging through the code, since I observe no speedup compared to the serial version.

I would like to take this opportunity to clarify a basic point: in a nested loop like the one above, which approach is generally better?

  1. Making the inner loop parallel

    #pragma omp parallel
    for(...i...)
    #pragma omp for
    for(...j...)

  2. Making the outer loop parallel (just a #pragma omp parallel for before the outer loop)

  3. Using the collapse clause (OpenMP 3.0)
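
To make options 1 and 2 concrete, here is a small sketch of both on a toy problem. The functions sumInnerParallel and sumOuterParallel are hypothetical, and summing 1-D distances is only a placeholder for the real distance and histogram work. As far as I understand, option 3 does not apply directly here, because collapse in OpenMP 3.0 expects a rectangular loop nest and the inner bound j = i + 1 depends on the outer index.

```cpp
#include <cmath>
#include <vector>

// Option 1: one parallel region, work-sharing on the inner loop.
// Every thread runs the full i loop and meets an implicit barrier per i.
double sumInnerParallel(const std::vector<double>& p)
{
    const long long n = static_cast<long long>(p.size());
    double sum = 0.0;
    #pragma omp parallel reduction(+:sum)
    for (long long i = 0; i < n; ++i) {
        #pragma omp for
        for (long long j = i + 1; j < n; ++j)
            sum += std::fabs(p[i] - p[j]);
    }
    return sum;
}

// Option 2: work-sharing on the outer loop, one barrier in total.
// Dynamic scheduling compensates for rows getting shorter as i grows.
double sumOuterParallel(const std::vector<double>& p)
{
    const long long n = static_cast<long long>(p.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(dynamic, 8)
    for (long long i = 0; i < n; ++i)
        for (long long j = i + 1; j < n; ++j)
            sum += std::fabs(p[i] - p[j]);
    return sum;
}
```

Both return the same sum, so timing the two on the real data size is a cheap way to see which one suits this loop nest.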

Thanks
Sayan

Sayan Ghosh