Hi,

I have a program that starts up and creates an in-memory data model and then creates a (command-line-specified) number of threads to run several string checking algorithms against an input set and that data model. The work is divided amongst the threads along the input set of strings, and then each thread iterates the same in-memory data model instance (which is never updated again, so there are no synchronization issues).

I'm running this on a 64-bit Windows Server 2003 machine with two quad-core processors. According to Windows Task Manager, the cores aren't being maxed out (nor do they look particularly taxed) when I run with 10 threads. Is this normal behaviour?

It appears that 7 threads all complete a similar amount of work in a similar amount of time, so would you recommend running with 7 threads instead?

Should I run it with more threads? I assume this could be detrimental, though, as the JVM will do more context switching between the threads.

Alternatively, should I run it with fewer threads?

Alternatively, what would be the best tool to measure this? Would a profiling tool help me out here, and is any one of the several profilers better than the rest at detecting bottlenecks (assuming I have one here)?

Note, the server is also running SQL Server 2005 (this may or may not be relevant), but nothing much is happening on that database when I am running my program.

Note also, the threads are only doing string matching, they aren't doing any I/O or database work or anything else they may need to wait on.

Thanks in advance,

-James

+2  A: 

Without seeing the actual code, it's hard to give proper advice. But do make sure that the threads aren't locking on shared resources, since that would naturally prevent them all from working as efficiently as possible. Also, when you say they aren't doing any I/O, are they not reading an input or writing an output either? This could also be a bottleneck.

With regard to CPU-intensive threads, it is normally not beneficial to run more threads than you have actual cores. But in an uncontrolled environment like this, with other big apps running at the same time, you are probably better off simply testing your way to the optimal number of threads.
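A minimal sketch of what "testing your way" to a thread count could look like. Everything here is an assumption for illustration: `ThreadCountSweep`, `checkSlice` and `runWith` are hypothetical names, and the `contains("ab")` check is only a stand-in for the real string-checking algorithms. It splits the input list into one contiguous slice per thread, as described in the question, and times each thread count from 1 up to twice the number of available processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThreadCountSweep {

    // Hypothetical stand-in for the real string check: counts inputs containing "ab".
    static int checkSlice(List<String> inputs, int from, int to) {
        int hits = 0;
        for (int i = from; i < to; i++) {
            if (inputs.get(i).contains("ab")) {
                hits++;
            }
        }
        return hits;
    }

    // Splits the inputs into one contiguous slice per thread and sums the per-slice results.
    static int runWith(int threads, List<String> inputs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            int chunk = (inputs.size() + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(inputs.size(), t * chunk);
                final int to = Math.min(inputs.size(), from + chunk);
                parts.add(pool.submit(() -> checkSlice(inputs, from, to)));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get(); // blocks until that slice is finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Synthetic input set; replace with the real strings.
        List<String> inputs = new ArrayList<>();
        for (int i = 0; i < 500_000; i++) {
            inputs.add(Integer.toString(i, 36));
        }
        int cores = Runtime.getRuntime().availableProcessors();
        for (int threads = 1; threads <= cores * 2; threads++) {
            long start = System.nanoTime();
            int hits = runWith(threads, inputs);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " threads: " + millis + " ms, " + hits + " hits");
        }
    }
}
```

The first few iterations include JIT warm-up, so it is probably worth running the sweep more than once and comparing only the later runs. If the elapsed time stops improving well before the thread count reaches the core count, that hints at contention or a memory bottleneck rather than a lack of threads.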

kasperjj
@kasperjj - Fair enough, the code is a bit difficult to pull apart to show a succinct example. Each thread does eventually write its output to a file (one file per thread), but this is pretty minimal I/O. Is there a mechanism by which a thread can lock on a shared resource without any explicit synchronization that I have specified?
James B
I was specifically thinking of something like a shared result file, but obviously you have that covered :-) You might also want to ensure that you are not using any synchronized data structures such as Vector.
kasperjj
@kasperjj - thanks, I'm using arrays, Sets and ArrayLists where I can
James B
@kasperjj - it turns out you were right. I am using a library which (rather childishly) prints a warning message to System.out on construction of a commonly-used object, and this causes contention for the System.out PrintStream. I moved the message to a static block, and BUDDABING! My processors are near maxed-out, I have a 10-fold performance increase, world peace almost broke out and I was elected chairman of my own fanclub!!
James B
Ha ha.... awesome! good to hear :-)
kasperjj
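For anyone chasing the same kind of hidden contention, here is a rough sketch of one way to spot it from inside the JVM, using the standard `java.lang.management.ThreadMXBean` API to count threads in the `BLOCKED` state. `BlockedThreadReport` and `countBlocked` are hypothetical names, and the explicit `sharedLock` monitor is only a stand-in for a contended resource such as the one in this thread. Depending on the JDK version, threads contending for `System.out` may show up as `WAITING` rather than `BLOCKED`, so treat this as a starting point; a thread dump (for example from `jstack`) shows the stack of each stalled thread as well.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class BlockedThreadReport {

    // Counts live threads that are currently BLOCKED, i.e. waiting to enter a synchronized region.
    static int countBlocked() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        int blocked = 0;
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            // Entries are null for threads that died between the two calls.
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    public static void main(String[] args) throws Exception {
        final Object sharedLock = new Object(); // hypothetical contended resource
        Thread[] workers = new Thread[4];
        synchronized (sharedLock) {
            for (int i = 0; i < workers.length; i++) {
                workers[i] = new Thread(() -> {
                    synchronized (sharedLock) {
                        // the real per-thread work would go here
                    }
                });
                workers[i].start();
            }
            Thread.sleep(200); // give the workers time to reach the lock
            System.out.println("blocked threads: " + countBlocked());
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}
```

Sampling `countBlocked()` periodically while the real workload runs gives a crude picture: a count that stays near the number of worker threads suggests they are queuing on a lock instead of using the cores.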
+5  A: 

My guess would be that your app is bottlenecked on memory access, i.e. your CPU cores spend most of the time waiting for data to be read from main memory. I'm not sure how well profilers can diagnose this kind of problem (the profiler itself could influence the behaviour considerably). You could verify the guess by having your code repeat the operations it does many times on a very small data set.

If this guess is correct, the only thing you can do (other than getting a server with more memory bandwidth) is to try and increase the locality of your memory access to make better use of caches; but depending on the details of the application that may not be possible. Using more threads may in fact lead to worse performance because of cores sharing cache memory.
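To make the "repeat the operations on a very small data set" check concrete, here is a rough sketch under stated assumptions. `WorkingSetProbe` and `sumRandomReads` are hypothetical names, and the pseudo-random array reads are only a stand-in for the real string checks. It performs the same number of reads against a small array that should fit in cache and against a large one that should not, so a big difference in elapsed time points at memory access rather than computation.

```java
final class WorkingSetProbe {

    // Performs `reads` pseudo-random lookups in `data` and returns their sum,
    // so the JIT cannot discard the loop. data.length must be a power of two.
    static long sumRandomReads(int[] data, int reads) {
        long sum = 0;
        int index = 1;
        int mask = data.length - 1;
        for (int i = 0; i < reads; i++) {
            // Linear congruential step; the mask keeps the index inside the array.
            index = (index * 1103515245 + 12345) & mask;
            sum += data[index];
        }
        return sum;
    }

    static long timeMillis(int[] data, int reads) {
        long start = System.nanoTime();
        long sum = sumRandomReads(data, reads);
        long millis = (System.nanoTime() - start) / 1_000_000;
        if (sum == 42) {
            System.out.println("unlikely sum"); // only here to keep the result observable
        }
        return millis;
    }

    public static void main(String[] args) {
        int[] small = new int[1 << 12]; // 16 KB of ints, should sit in cache
        int[] large = new int[1 << 24]; // 64 MB of ints, should not
        int reads = 50_000_000;

        // Warm-up so both measurements run JIT-compiled code.
        sumRandomReads(small, reads);
        sumRandomReads(large, reads);

        System.out.println("small working set: " + timeMillis(small, reads) + " ms");
        System.out.println("large working set: " + timeMillis(large, reads) + " ms");
    }
}
```

The array sizes are guesses and would need adjusting to the cache sizes of the actual server. Running several copies of the large case in parallel threads and watching whether throughput scales is one way to see how much memory bandwidth is left to share.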

Michael Borgwardt
As ever Mr B, you have given me something to think about and digest for a bit, thanks!
James B
@Michael If this is the case, the only point at which threads may share memory is with the shared dataset. If I cloned the Strings when they are given in an array list to each thread (or even did `new String(inputString)` to ensure they are not the same reference in the constant pool), could that eliminate such a bottleneck?
James B
@James B: No, that would make the problem worse by having the threads push each other's copy of the dataset out of the cache. As long as there is no need for synchronization, a shared dataset is *good*.
Michael Borgwardt