Hi,
I have the following piece of code, which I want to parallelize so that each thread runs the nested loop over its own chunk of the data. I am making a mistake somewhere, because not all threads seem to run the loop as I expected. It would be great if somebody could help me identify that mistake.
The code calculates a histogram.
#pragma omp parallel default(shared) private(iIndex2, iIndex1, fDist) shared(iSize, dense) reduction(+:iCount)
{
    chunk = (unsigned int)(iSize / omp_get_num_threads());
    threadID = omp_get_thread_num();
    svtout << "Number of threads available " << omp_get_num_threads() << endl;
    svtout << "The threadID is " << threadID << endl;
    // I want each thread to execute the loop over its own chunk
    for (iIndex1=0; iIndex1 < chunk; iIndex1++)
    {
        for (iIndex2=iIndex1+1; iIndex2 < chunk; iIndex2++)
        {
            iCount++;
            fDist = (*this)[iIndex1 + threadID*chunk].distance( (*this)[iIndex2 + threadID*chunk] );
            idx = (int)(fDist/fWidth);
            if ((int)fDist % (int)fWidth >= 0)
            {
                #pragma omp atomic
                dense[idx] += 1;
            }
        }
    }
}
The iCount variable keeps track of the number of iterations, and I noticed a marked difference between the serial and the parallel versions. My guess is that not all threads are running, which would explain why the histogram values from the parallel program are much lower than the actual values (the dense array stores the histogram values).
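To make the difference in iCount concrete, here is a minimal sketch that only counts loop iterations. It assumes the serial version runs the same double loop over all iSize points. The function names serialPairs and chunkedPairs are hypothetical, and the threads are simulated with a plain loop, so this is not the real program, only its loop bounds.

```cpp
#include <cstdint>

// Pairs (i, j) with i < j visited by a serial double loop over n points
// (assumed shape of the serial version).
std::uint64_t serialPairs(std::uint64_t n) {
    std::uint64_t count = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        for (std::uint64_t j = i + 1; j < n; ++j)
            ++count;
    return count;
}

// Pairs visited when each of numThreads "threads" runs the double loop
// only inside its own chunk of n / numThreads points, as in the code above.
// Threads are simulated with an ordinary loop, so no OpenMP is needed here.
std::uint64_t chunkedPairs(std::uint64_t n, std::uint64_t numThreads) {
    const std::uint64_t chunk = n / numThreads;  // integer division, remainder dropped
    std::uint64_t count = 0;
    for (std::uint64_t t = 0; t < numThreads; ++t)
        for (std::uint64_t i = 0; i < chunk; ++i)
            for (std::uint64_t j = i + 1; j < chunk; ++j)
                ++count;  // both indices stay inside chunk t
    return count;
}
```

With the made-up values of 1000 points and 4 threads, serialPairs(1000) gives 499500 and chunkedPairs(1000, 4) gives 124500. With these loop bounds, pairs whose two points lie in different chunks are never visited, which may account for at least part of the gap, whether or not every thread actually runs.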
Thanks,
Sayan