ansaurus

Question

Problem with the timings of a program that uses 1-8 threads on a server that has 4 dual-core CPUs?

Answer 1

A:

One thing you have to remember is that you're doing this on a shared-memory architecture. The more loads/stores you try to do in parallel, the more likely you are to hit contention on memory access, which is a relatively slow operation. So, in my experience, typical applications don't benefit from more than 6 cores. (This is anecdotal; I could go into a lot of detail, but I don't feel like typing. Suffice it to say, take these numbers with a grain of salt.)

Try instead to minimize access to shared resources if possible, and see what that does to your performance. Otherwise, optimize for what you've got, and remember this:

Throwing more cores at a problem does not mean it will go quicker. As with taxation, there's a curve: past a certain point, adding cores becomes a detriment to getting the most performance out of your program. Find that "sweet spot" and use it.
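As a rough sketch of what minimizing access to shared resources can look like with pthreads: each thread below accumulates into a private local variable and writes to shared memory only once, and the per-thread result slots are padded so they do not share a cache line. The names `parallel_sum`, `worker` and `struct slot`, the 8-thread cap, and the 64-byte cache-line size are assumptions for illustration, not taken from the program in the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 8   /* assumed upper bound, matching the 1-8 threads in the question */
#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* Per-thread work description and result. The trailing pad keeps the
 * `sum` fields of neighbouring slots on different cache lines. */
struct slot {
    const double *data;
    size_t begin, end;
    double sum;
    char pad[CACHE_LINE];
};

static void *worker(void *arg)
{
    struct slot *s = arg;
    double local = 0.0;                 /* private accumulator, no sharing */
    for (size_t i = s->begin; i < s->end; i++)
        local += s->data[i];
    s->sum = local;                     /* single write to shared memory */
    return NULL;
}

/* Sum n doubles using nthreads threads (1..MAX_THREADS). */
double parallel_sum(const double *data, size_t n, int nthreads)
{
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    struct slot slots[MAX_THREADS];
    size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads;

    for (int t = 0; t < nthreads; t++) {
        size_t b = (size_t)t * chunk;
        size_t e = b + chunk;
        slots[t].data = data;
        slots[t].begin = b < n ? b : n;  /* clamp so surplus threads get empty ranges */
        slots[t].end = e < n ? e : n;
        slots[t].sum = 0.0;
        int rc = pthread_create(&tid[t], NULL, worker, &slots[t]);
        assert(rc == 0);
        (void)rc;
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);
        total += slots[t].sum;          /* combine results after the threads finish */
    }
    return total;
}
```

Timing `parallel_sum` for 1 to 8 threads, and comparing it against a variant where every thread adds into one shared variable, is one way to see how much of a slowdown comes from contention rather than from the computation itself.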

jer 2010-07-25 17:15:22

Hello, and thanks for the answer. Yes, I understand that there are data dependencies (solving them is beyond the goal of my program). But with the code as it is, the data dependencies, cache thrashing, and update policy are the same whether I test it on the system with 4 dual-core CPUs (so 8 cores) or on the system with 7 dual-core CPUs, and the cache size of each hasn't changed. Yet on the system with 7 dual-core CPUs I get the timings I expected, while on the system with 8 cores (where the maximum number of threads I use is still 8) the smallest execution time is with 4 threads.

stois21 2010-07-26 16:16:33

Answer 2

A:

You write:

"The above timings are when I use pthreads. When I use OpenMP the timings are smaller but follow the same pattern."

Congratulations, you have discovered the pattern which all parallel programs follow! If you plot execution time against number of processors, the curve eventually flattens out and starts to rise; you reach a point where adding more processors slows things down.
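To make the shape of that curve concrete, here is a toy model in C, offered only as a sketch: execution time is taken to be a serial part, plus a parallel part divided among the threads, plus an overhead that grows with the thread count. The linear overhead term and the function names `model_time` and `best_thread_count` are assumptions for illustration; real overheads depend on the program and the machine.

```c
#include <assert.h>

/* Toy model of execution time on n threads:
 *   serial part + parallel part / n + overhead * n
 * The linear overhead term is an assumption, not a measurement. */
double model_time(double t_serial, double t_parallel, double overhead, int n)
{
    assert(n >= 1);
    return t_serial + t_parallel / n + overhead * n;
}

/* Thread count in 1..max_threads that minimizes the modeled time. */
int best_thread_count(double t_serial, double t_parallel, double overhead,
                      int max_threads)
{
    assert(max_threads >= 1);
    int best = 1;
    for (int n = 2; n <= max_threads; n++) {
        if (model_time(t_serial, t_parallel, overhead, n) <
            model_time(t_serial, t_parallel, overhead, best))
            best = n;
    }
    return best;
}
```

With the made-up values `t_serial = 1`, `t_parallel = 16` and `overhead = 1`, the modeled time falls until 4 threads and rises after that, even though 8 threads are available. These numbers are illustrative only and are not fitted to the timings discussed here.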

The interesting question is how many processors you can profitably use, and the answer depends on many factors. @jer has pointed out some of the factors which affect the scalability of programs on shared-memory computers. Other factors, principally the ratio of communication to computation, ensure that the shape of the performance curve will be the same on distributed-memory computers too.

The other factor which is important when measuring the parallel scalability of your program is the problem size(s) you use. How does your performance curve change when you try a grid of 1414 x 1414 cells (roughly twice as many cells as 1000 x 1000)? I would expect that the curve will be below the curve for the problem on 1000 x 1000 cells and will flatten out later.

For further reading Google for Amdahl's Law and Gustafson's Law.
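For quick reference, the two laws are usually stated as follows, where N is the number of processors and p is the fraction of the work that can run in parallel (the symbols are the conventional ones, not taken from this thread):

```latex
% Amdahl's law: fixed problem size
S_{\text{Amdahl}}(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}

% Gustafson's law: problem size grows with N,
% p is the parallel fraction of the scaled run
S_{\text{Gustafson}}(N) = (1 - p) + pN
```

As a worked example with an assumed p = 0.9 and N = 8, Amdahl's law gives 1 / (0.1 + 0.1125), which is about 4.7, while Gustafson's law gives 0.1 + 7.2 = 7.3. Neither formula includes communication or synchronization overhead, so on its own neither predicts the point where the curve turns upward; that part comes from the overheads discussed above.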

High Performance Mark 2010-07-26 10:26:27

Hello, and thanks for the answer. So even though the one system has 8 cores and the maximum number of threads I use is 8, it's logical that the smallest execution time I get is with only 4 threads? And when I test my code on the system that has 14 cores, as you can see in my edited post, the timings are what someone would expect from a 14-core system when the maximum number of threads used is 8.

stois21 2010-07-26 16:27:18

Yes, none of what you report is a surprise. I'd be disappointed if I created a program which minimised execution time on only 4 out of 8 cores, but not surprised. Did you try running larger problems?

High Performance Mark 2010-07-26 16:30:43

The biggest array size I can try is 1000x1000; when I tried anything a lot bigger, it didn't compile. So, OK: for the system that has 8 cores, due to scheduling, memory transfers, and the update policy of the system architecture, the smallest time is for 4 threads. But then just by increasing the number of CPU cores I get the timings I expected? Isn't it a fact that just by adding hardware you don't get an improvement?

stois21 2010-07-26 17:41:03
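The thread does not say why sizes above 1000x1000 failed to build. One possible cause, and this is only a guess, is a very large statically sized or stack-allocated array. If that is the case, allocating the grid on the heap is a common workaround. The sketch below uses hypothetical helper names `grid_alloc` and `grid_at` and stores the grid as one contiguous row-major block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate a rows x cols grid of doubles as one contiguous, zero-filled
 * block on the heap. Returns NULL on overflow or allocation failure. */
double *grid_alloc(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    if (rows > SIZE_MAX / cols)         /* rows * cols would overflow */
        return NULL;
    return calloc(rows * cols, sizeof(double));
}

/* Pointer to cell (r, c) of a row-major grid with `cols` columns. */
static inline double *grid_at(double *g, size_t cols, size_t r, size_t c)
{
    return &g[r * cols + c];
}
```

A 1414 x 1414 grid would then be created with `double *g = grid_alloc(1414, 1414);`, accessed with `*grid_at(g, 1414, r, c)`, and released with `free(g)`. The grid size becomes a runtime value, which makes it easier to run the larger problems suggested above.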
