
Hi All,

I have a strange problem, though it may not be that strange to some of you.

I am writing an application that uses Boost threads, with Boost barriers to synchronize the threads. I have two machines to test the application on.
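The post does not include the code, so the following is only a minimal sketch of the structure described, assuming each thread does one unit of work per round and then waits at a barrier before the next round. It uses `std::thread` and a hand-rolled `SimpleBarrier`, a hypothetical stand-in for `boost::barrier` (whose `wait()` plays the same role); `run_rounds` and its per-thread counters are illustrative names, not the poster's code.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier, standing in for boost::barrier (assumption:
// the real code calls wait() once per round, as sketched here).
class SimpleBarrier {
public:
    explicit SimpleBarrier(std::size_t count)
        : threshold_(count), remaining_(count), generation_(0) {}

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (--remaining_ == 0) {
            // Last thread to arrive: reset for the next round and wake everyone.
            ++generation_;
            remaining_ = threshold_;
            cv_.notify_all();
        } else {
            // Sleep until the generation changes (also guards against spurious wakeups).
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t remaining_;
    std::size_t generation_;
};

// Hypothetical workload: each thread updates only its own slot every round,
// then all threads meet at the barrier before the next round starts.
// Returns the per-thread counters so the caller can check every round ran.
std::vector<long> run_rounds(std::size_t num_threads, int rounds) {
    std::vector<long> per_thread(num_threads, 0);
    SimpleBarrier barrier(num_threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (int r = 0; r < rounds; ++r) {
                per_thread[t] += 1;  // stand-in for one transaction's work
                barrier.wait();      // every thread waits for the slowest one
            }
        });
    }
    for (auto& w : workers) w.join();
    return per_thread;
}
```

For example, `run_rounds(4, 100)` should return four counters of 100 each. Because every round ends at the barrier, each round lasts as long as the slowest thread's share of the work.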

Machine 1 has a Core 2 Duo (T8300) CPU (Windows XP Professional, 4 GB RAM), where I get the following performance figures:

Number of threads: 1, TPS: 21

Number of threads: 2, TPS: 35 (66% improvement)

Further increasing the number of threads decreases the TPS, but that is understandable since the machine has only two cores.

Machine 2 has two quad-core Xeon X5355 CPUs (Windows Server 2003, 4 GB RAM), giving 8 cores in total.

Number of threads: 1, TPS: 21

Number of threads: 2, TPS: 27 (28% improvement)

Number of threads: 4, TPS: 25

Number of threads: 8, TPS: 24

As you can see, performance degrades beyond 2 threads, even though the machine has 8 cores. If the program had some bottleneck, it should have degraded at 2 threads as well.
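To make the comparison concrete, the TPS figures can be turned into speedup S(n) = TPS(n) / TPS(1) and parallel efficiency E(n) = S(n) / n, where 1.0 would mean perfect linear scaling. This is a small sketch; `Sample`, `speedups` and `efficiencies` are hypothetical helper names, and it assumes the first sample is the single-thread run.

```cpp
#include <cstddef>
#include <vector>

// One measurement: thread count and observed transactions per second.
struct Sample {
    int threads;
    double tps;
};

// Speedup relative to the first (single-thread) sample: S(n) = TPS(n) / TPS(1).
std::vector<double> speedups(const std::vector<Sample>& samples) {
    std::vector<double> out;
    const double base = samples.front().tps;
    for (const Sample& s : samples) out.push_back(s.tps / base);
    return out;
}

// Parallel efficiency E(n) = S(n) / n.
std::vector<double> efficiencies(const std::vector<Sample>& samples) {
    std::vector<double> out;
    const std::vector<double> s = speedups(samples);
    for (std::size_t i = 0; i < samples.size(); ++i)
        out.push_back(s[i] / samples[i].threads);
    return out;
}
```

Plugging in the Xeon figures (21, 27, 25 and 24 TPS at 1, 2, 4 and 8 threads) gives speedups of about 1.00, 1.29, 1.19 and 1.14, so at 8 threads the efficiency is only about 14%. The Core 2 Duo reaches about 1.67x on 2 threads, roughly 83% efficiency.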

Any ideas or explanations? Does the OS play some role in performance? It seems that the Core 2 Duo (2.4 GHz) scales better than the Xeon X5355 (2.66 GHz), even though the Xeon has the higher clock speed.

Thank you

-Zoolii

A: 

Adding more CPUs does not always equate to better performance; locking and contention can severely degrade it. Factors to consider are:

  • Is your algorithm suited to parallelisation?
  • Any inherently sequential portions of code?
  • Can you partition work into coarse-grained 'chunks'? Coarse-grained is usually better than fine-grained.
  • Can you alter your code to use less locking?
  • Synchronisation overheads can often be reduced by ensuring chunks of work are similarly sized.
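The point about inherently sequential code is usually quantified with Amdahl's law: if a fraction p of the work can run in parallel, the speedup on n processors is at most S(n) = 1 / ((1 - p) + p / n). A minimal C++ version, with an illustrative function name:

```cpp
// Amdahl's law: with a fraction p of the work parallelisable and the
// remaining (1 - p) strictly sequential, the best possible speedup on
// n processors is S(n) = 1 / ((1 - p) + p / n).
double amdahl_speedup(double parallel_fraction, int processors) {
    const double serial = 1.0 - parallel_fraction;
    return 1.0 / (serial + parallel_fraction / processors);
}
```

For instance, with p = 0.9 the bound on 8 cores is 1 / (0.1 + 0.1125), about 4.7. The bound never decreases as n grows, so on its own it cannot explain throughput falling from 2 to 8 threads; a drop like that points at overheads such as the locking and contention mentioned above.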
Mitch Wheat
Thank you for the answer. Each thread has its own similar-sized chunk of work, without any locks. There are actually two questions. 1. Why does performance degrade after two threads? Does the processor architecture or the OS have something to do with it? In other words, why did it scale up for 2 threads at all? If the program had an inherent bottleneck, it should not have scaled up for 2 threads either. 2. Why does the Core 2 Duo scale better than the Xeon X5355? Is the former a better architecture? The question is more about hardware. I will post this on Server Fault. Thanks
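One way to give each thread its own similar-sized, lock-free chunk is a static split of the work. This minimal sketch assumes the work is a range of item indices `[0, total)`; `partition_range` is a hypothetical helper, not the poster's code, and it requires `parts > 0`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Split [0, total) into `parts` contiguous half-open ranges whose sizes
// differ by at most one, so no thread gets noticeably more work than another.
std::vector<std::pair<std::size_t, std::size_t>>
partition_range(std::size_t total, std::size_t parts) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base = total / parts;
    const std::size_t extra = total % parts;  // the first `extra` chunks get one more item
    std::size_t begin = 0;
    for (std::size_t i = 0; i < parts; ++i) {
        const std::size_t size = base + (i < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    return ranges;
}
```

`partition_range(10, 3)` yields `[0, 4)`, `[4, 7)` and `[7, 10)`; thread `i` would then touch only `ranges[i]`, so the work items themselves need no locking.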
zoolii
A:

The clock speed and the operating system don't have as much to do with it as the way your code is written. Things to check might include:

  • Are you actually spinning up more than two threads at one time?
  • Do you have unnecessary synchronization artifacts in your code?
  • Are you synchronizing your code at the appropriate places?
  • What is your shareable resource, and how many of them are there? If each of your transactions relies on a single section of code, native library, file, database, or whatever, then it doesn't matter how many CPUs you've got.
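To check the first point, one option is to count how many workers are inside their work section at the same moment and record the peak. This is a minimal sketch: `peak_concurrency` is a hypothetical helper, and the `sleep_for` call is only a stand-in for the real per-transaction work, which you would put there instead.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Counts how many workers are inside their work section simultaneously
// and returns the highest count observed.
int peak_concurrency(int num_threads) {
    std::atomic<int> active{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            const int now = ++active;
            // Raise the peak with a CAS loop so concurrent updates are not lost;
            // on failure compare_exchange_weak reloads `prev` and we re-check.
            int prev = peak.load();
            while (now > prev && !peak.compare_exchange_weak(prev, now)) {}
            std::this_thread::sleep_for(std::chrono::milliseconds(50));  // stand-in for real work
            --active;
        });
    }
    for (auto& w : workers) w.join();
    return peak.load();
}
```

If the peak stays well below the number of threads you created, the threads are not overlapping the way the test assumes.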

One tool at your disposal when analyzing software bottlenecks is the simple thread dump. Taking a few dumps during a single run of your software should expose its bottlenecks. You may be able to use that output to reevaluate your code.

dbrown0708