Hello,
I am runig a program on a server at my university that has 4 Dual-Core AMD Opteron(tm) Processor 2210 HE and the O.S. is Linux version 2.6.27.25-78.2.56.fc9.x86_64. My program implements Conways Game of Life and it runs using pthreads and openmp. I timed the parrallel part of the program using the getimeofday() function using 1-8 threads. But the timings don't seem right. I get the biggest time using 1 thread(as expected), then the time gets smaller. But the smallest time i get is when i use 4 threads.
Here is an example when i use an array 1000x1000.
Using 1 thread~9,62 sec, Using 2 Threads~4,73 sec, Using 3 ~ 3.64 sec, Using 4~2.99 sec, Using 5 ~4,19 sec, Using 6~3.84, Using 7~3.34, Using 8~3.12.
The above timings are when i use pthreads. When i use openmp the timing are smaller but follow the same pattern.
I expected that the time would decrease from 1-8 because of the 4 Dual core cpus? I thought that because there are 4 cpus with 2 cores each, 8 threads could run at the same time. Does it have to do with the operating system that the server runs?
Also i tested the same programs on another server that has 7 Dual-Core AMD Opteron(tm) Processor 8214 and runs Linux version 2.6.18-194.3.1.el5. There the timings i get are what i expected. The timings get smaller starting from 1(the biggest) to 8(smallest excecution time).
The program implements the Game of Life correct, both using pthreads and openmp, i just cant figure out why the timings are like the example i posted. So in conclusion, my questions are:
1) The number of threads that can run at the same time on a system depends by the cores of the cpus?it depends only by the cpus although each cpu has more than one cores? It depends by all the previous and the Operating System?
2) Does it have to do with the way i divide the 1000x1000 array to the number of threads? But if i did then the openmp code wouldn't give the same pattern of timings?
3)What is the reason i might get such timmings?
excuse my english i am from europe... thnx in advanse.
EDIT: This is the code i use with openmp:
#define Row 1000+2
#define Col 1000+2 int num; int (*temp)[Col]; int (*a1)[Col]; int (*a2)[Col];
int main() {
int i,j,l,sum;
int array1[Row][Col],array2[Row][Col];
struct timeval tim;
struct tm *tm;
double start,end;
int st,en;
for (i=0; i<Row; i++)
for (j=0; j<Col; j++)
{
array1[i][j]=0;
array2[i][j]=0;
}
array1[3][16]=1;
array1[4][16]=1;
array1[5][15]=1;
array1[6][15]=1;
array1[6][16]=1;
array1[7][16]=1;
array1[5][14]=1;
array1[4][15]=1;
a1=array1;
a2=array2;
printf ("\nGive number of threads:");
scanf("%d",&num);
gettimeofday(&tim,NULL);
start=tim.tv_sec+(tim.tv_usec/1000000.0); omp_set_num_threads(num);
#pragma omp parallel private(l,i,j,sum)
{
printf("Number of Threads:%d\n",omp_get_num_threads());
for (l=0; l<100; l++)
{
#pragma omp for
for (i=1; i<(Row-1); i++)
{
for (j=1; j<(Col-1); j++)
{
sum=a1[i-1][j-1]+a1[i-1][j]+a1[i-1][j+1]+a1[i][j-1]+a1[i][j+1]+a1[i+1][j-1]+a1[i+1][j]+a1[i+1][j+1];
if ((a1[i][j]==1) && (sum==2||sum==3))
a2[i][j]=1;
else if ((a1[i][j]==1) && (sum<2))
a2[i][j]=0;
else if ((a1[i][j]==1) && (sum>3))
a2[i][j]=0;
else if ((a1[i][j]==0 )&& (sum==3))
a2[i][j]=1;
else if (a1[i][j]==0)
a2[i][j]=0;
}//end of iteration J
}//end of iteration I
#pragma omp barrier
#pragma omp single
{
temp=a1;
a1=a2;
a2=temp;
}
#pragma omp barrier
}//end of iteration L
}//end of paraller region
gettimeofday(&tim,NULL);
end=tim.tv_sec+(tim.tv_usec/1000000.0);
printf("\nTime Elapsed:%.6lf\n",end-start);
printf("all ok\n");
return 0; }
TIMINGS with openmp code
a)System with 7 Dual Core Cpus Using 1 thread~7,72 sec, Using 2 threads~4,53 sec, Using 3 Threads~3,64 sec, Using 4 threads~ 2,24 sec, Using 5~2,02 sec, Using 6~ 1,78 sec, Using 7 ~1,59 sec,Using 8 ~ 1,44 sec
b)System with 4 Dual Core Cpus Using 1 thread~9,06 sec, Using 2 threads~4,86 sec, Using 3 Threads~3,49 sec, Using 4 threads~ 2,61 sec, Using 5~3,98 sec, Using 6~ 3,53 sec, Using 7 ~3,48 sec,Using 8 ~ 3,32 sec
Above are the timings i get.