Given a machine with 1 CPU and a lot of RAM. Besides other kinds of applications (web server, etc.), there are 2 other server applications running on that machine doing the exact same kind of processing, although one uses 10 threads and the other uses 1 thread. Assume the processing logic for each request is 100% CPU-bound and typically takes no longer than 2 seconds to finish. The question is: whose throughput, in terms of transactions processed per minute, might be better? Why?

Note that the above is not a real environment; I just made up the data to make the question clear. My current thinking is that there should be no difference, because the apps are 100% CPU-bound and therefore if the machine can handle 30 requests per minute for the 2nd app, it will also be able to handle 3 requests per minute for each of the 10 threads of the 1st app. But I'm glad to be proven wrong, given that there are other applications running on the machine and one application might not always be given 100% of the CPU time.

+1  A: 

The scheduling overhead could make the app with 10 threads slower than the one with 1 thread. You won't know for sure unless you create a test.

For some background on multithreading see http://en.wikipedia.org/wiki/Thread_(computer_science)
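
A minimal sketch of such a test, assuming Java (the request workload, thread counts, and one-minute measuring interval are all made up for illustration): run it once with an argument of 1 and once with 10, then compare the per-minute counts.

    import java.util.concurrent.atomic.AtomicLong;

    public class ThroughputTest {
        static final AtomicLong completed = new AtomicLong();

        // Stand-in for one 100% CPU-bound request.
        static void handleRequest() {
            long x = 0;
            for (int i = 0; i < 20_000_000; i++) x += i * 31;
            if (x == 42) System.out.println(x); // keep the JIT from eliding the loop
            completed.incrementAndGet();
        }

        public static void main(String[] args) throws InterruptedException {
            int threads = Integer.parseInt(args[0]); // 1 or 10
            for (int t = 0; t < threads; t++) {
                Thread worker = new Thread(() -> { while (true) handleRequest(); });
                worker.setDaemon(true); // let the JVM exit after the measurement
                worker.start();
            }
            Thread.sleep(60_000); // measure for one minute
            System.out.println(threads + " thread(s): " + completed.get() + " transactions/minute");
        }
    }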

lothar
+1  A: 

This might very well depend on the operating system scheduler. For example, back in single-thread days the scheduler knew only about processes, and had measures like "niceness" to figure out how much CPU time to allocate.

With multithreading, there is probably some mechanism by which one process that has 100 threads doesn't get 99% of the CPU time when another process has a single thread. On the other hand, if you have only two processes and one of them is multithreaded, I would suspect that the OS may give it more overall time. However, AFAIK nothing is really guaranteed.

Switching costs between threads in the same process may be cheaper than switching between processes (e.g., due to cache behavior).

Uri
Just tried this on my 2-core machine: run 1 Java app with 1 thread running a while(true); loop - 50% of total CPU time is occupied, the other half free; run another with 100 threads doing the same thing, and now the 2nd app occupies 99% of total CPU, leaving only 1% for the 1st...
Buu Nguyen
(cont.) I don't have a 1-core machine to test on right now, but the above experiment seems to be consistent with your opinion in the 2nd paragraph: the app with multiple threads may end up receiving more CPU time.
Buu Nguyen
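
(A minimal sketch of the experiment described in these comments, assuming Java as mentioned there: start one instance with an argument of 1 and another with 100, then compare per-process CPU usage in Task Manager or top.)

    public class BusyLoop {
        public static void main(String[] args) {
            int threads = Integer.parseInt(args[0]); // e.g. 1 for one app, 100 for the other
            for (int i = 0; i < threads; i++) {
                // Each thread spins forever, 100% CPU-bound.
                new Thread(() -> { while (true) { } }).start();
            }
        }
    }
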
A: 

One thing you must consider is wait time on the other end of the transaction. Having multiple threads allows you to wait for a response on one while preparing the next transaction on another. At least that's how I understand it. So I think a few threads will turn out better than one.

On the other hand, you must consider the overhead involved in dealing with multiple threads. The details of the application are an important part of the consideration here.
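
As a toy Java illustration of the wait-time point, assuming the wait on the other end can be simulated with Thread.sleep and using made-up timings: with 4 threads the waits overlap, so the total time drops well below the single-threaded sum.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class OverlapDemo {
        // One "transaction": wait on the other end, then do some local work.
        static void transaction() throws InterruptedException {
            Thread.sleep(100); // simulated wait for a response
            long x = 0;
            for (int i = 0; i < 5_000_000; i++) x += i; // local CPU work
            if (x == 42) System.out.println(x);
        }

        static long run(int threads) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long start = System.nanoTime();
            for (int i = 0; i < 8; i++) {
                pool.submit(() -> {
                    try { transaction(); } catch (InterruptedException ignored) { }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            return (System.nanoTime() - start) / 1_000_000;
        }

        public static void main(String[] args) throws InterruptedException {
            System.out.println("1 thread:  " + run(1) + " ms"); // ~8 x (100 ms + work)
            System.out.println("4 threads: " + run(4) + " ms"); // the waits overlap
        }
    }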

Joe Philllips
+3  A: 

There's always some overhead involved in task switching, so if the threads aren't blocking on anything, fewer threads is generally better. Also, if the threads aren't executing the same part of the code, you'll get some cache flushing each time you switch.

On the other hand, the difference might not be measurable.

Mark Ransom
Thanks for bringing context switching and cache effects to attention. Very good point.
Buu Nguyen
+1  A: 

Interesting question.

I wrote a sample program that does just this. It has a class that does some processor-intensive work, then returns. I specify the total number of threads I want to run and the total number of times I want the work to run. The program then divides the work equally between all the threads (if there's only one thread, it just gets all of it) and starts them all up.
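
In outline, the idea looks something like this (a rough Java sketch, not the actual code, which is linked at the bottom of this answer):

    import java.util.ArrayList;
    import java.util.List;

    public class ThreadWorkTest {
        // Stand-in for one processor-intensive work unit.
        static void workUnit() {
            double d = 0;
            for (int i = 1; i < 1_000_000; i++) d += Math.sqrt(i);
            if (d < 0) System.out.println(d); // keep the JIT from eliding the loop
        }

        public static void main(String[] args) throws InterruptedException {
            int threadCount = Integer.parseInt(args[0]); // total threads
            int totalUnits  = Integer.parseInt(args[1]); // total work units, e.g. 5000
            int perThread   = totalUnits / threadCount;  // divide the work equally

            List<Thread> threads = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threadCount; t++) {
                Thread worker = new Thread(() -> {
                    for (int i = 0; i < perThread; i++) workUnit();
                });
                threads.add(worker);
                worker.start();
            }
            for (Thread worker : threads) worker.join();
            System.out.printf("%d threads, %d units: %.4f sec%n",
                    threadCount, totalUnits, (System.nanoTime() - start) / 1e9);
        }
    }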

I ran this on a single-proc VM, since I couldn't find a real computer with only 1 processor in it anymore.

Run independently:

1 Thread    5000 Work Units - 50.4365sec
10 Threads  5000 Work Units - 49.7762sec

This seems to show that on a one-proc PC with lots of threads doing processor-intensive work, Windows is smart enough not to rapidly switch them back and forth, and they take about the same amount of time.

Run together (or as close as I could get to pushing enter at the same time):

1 Thread    5000 Work Units - 99.5112sec
10 Threads  5000 Work Units - 56.8777sec

This is the meat of the question. When you run 10 threads + 1 thread, they all seem to be scheduled equally. The 10 threads each took about 1/10th longer (because there was an 11th thread running), while the single thread took almost twice its original time (really, it got about 1/10th of its work done in the first 56 sec while sharing the CPU, then did the other 9/10ths alone in the next 43 sec... which is about right).

The result: Windows' scheduler is fair at the thread level, but not at the process level. If you create a lot of threads, you can leave the other processes that weren't smart enough to create lots of threads high and dry. Or just do it right and use a thread pool :-)

If you're interested in trying it for yourself, you can find my code: http://teeks99.com/ThreadWorkTest.zip

teeks99
I had the same observation in the comments to Uri, although on a dual-core. Thanks for the nice analysis.
Buu Nguyen