Let's say I have a 4-core CPU, and I want to run some process in the minimum amount of time. The process is ideally parallelizable, so I can run chunks of it on an infinite number of threads and each thread takes the same amount of time.

Since I have 4 cores, I don't expect any speedup from running more threads than cores, since a single core is only capable of running a single thread at a given moment. I don't know much about hardware, so this is only a guess.

Is there a benefit to running a parallelizable process on more threads than cores? In other words, will my process finish faster, slower, or in about the same amount of time if I run it using 4000 threads rather than 4 threads?

+1  A: 

The ideal is 1 thread per core, as long as none of the threads will block.

One case where this may not be true: there are other threads running on the core, in which case more threads may give your program a bigger slice of the execution time.

patros
It depends on whether you want the user's background processes to run like crap while your application is running. For that matter, you could just set a real-time priority for each thread and get the maximum amount of power. But users like multitasking.
Earlz
Well, we're dealing with a magical ideally parallelizable application. If I ever created such a thing I would feel entitled to hog the CPU as much as I want.
patros
+3  A: 

The actual performance will depend on how much voluntary yielding each thread does. For example, if the threads do NO I/O at all and use no system services (i.e. they're 100% CPU-bound), then 1 thread per core is optimal. If the threads do anything that requires waiting, then you'll have to experiment to determine the optimal number of threads. 4000 threads would incur significant scheduling overhead, so that's probably not optimal either.
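As a starting point for that experimentation, a commonly quoted rule of thumb (not something from this thread) is threads = cores x target utilization x (1 + wait time / compute time). Here is a minimal Python sketch of it. The function name and parameters are made up for illustration, and the wait and compute times are numbers you would have to measure for your own workload.

```python
import os

def suggested_thread_count(wait_time, compute_time, cores=None, target_utilization=1.0):
    """Rule-of-thumb starting point, not a guarantee.

    wait_time and compute_time are the average time a task spends blocked
    vs. actually computing (same units; assumed to be measured by you).
    """
    if compute_time <= 0:
        raise ValueError("compute_time must be positive")
    if cores is None:
        # os.cpu_count() reports logical cores and may return None.
        cores = os.cpu_count() or 1
    estimate = cores * target_utilization * (1 + wait_time / compute_time)
    return max(1, round(estimate))
```

With no waiting at all this collapses to 1 thread per core; a task that waits three times as long as it computes would suggest about 4 threads per core. Treat the result as the first value to try in a benchmark, not as the answer.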

Jim Garrison
+10  A: 

If your threads don't do I/O, synchronization, etc., and there's nothing else running, 1 thread per core will get you the best performance. However, that is very likely not the case. Adding more threads usually helps, but after some point they cause performance degradation.

Not long ago, I was doing performance testing on a machine with two quad-core CPUs running an ASP.NET application on Mono under a pretty decent load. We played with the minimum and maximum number of threads, and in the end we found that for that particular application in that particular configuration the best throughput was somewhere between 36 and 40 threads. Anything outside those boundaries performed worse. Lesson learned? If I were you, I would test with different numbers of threads until you find the right number for your application.

One thing for sure: 4k threads will take longer. That's a lot of context switches.

Gonzalo
I think Gonzalo's answer is good. I'd just add that you should experiment and measure. Your program will differ from his, or mine, or anyone else's and only measurements of your own program's behaviour will answer your questions properly. The performance of parallel (or concurrent) programs is not an area where good conclusions can be drawn from first principles alone.
High Performance Mark
+1, +answer: it surprises me that having many more threads than cores results in better performance, although it makes some sense if more threads means a larger share of CPU time compared to competing threads. It would be nice if my application could detect differences in performance and automagically tune itself to the optimal number of threads.
Juliet
It shouldn't surprise you in a real world scenario. Threads block waiting for IO resources like disk access, network, etc. And also waiting for non IO resources like other threads to finish using shared variables. What you really want to achieve is the minimum number of threads such that at least one thread per core can always be running.
patros
1 thread per core is not the optimum. It needs to be slightly more, preferably twice that, since this will allow another thread to run if a thread is temporarily blocked, even if only on memory. This is more important if you have systems (P4, i7, Sun Rock, etc.) that feature SMT/HT.
Marco van de Voort
Hence the "That is very likely not the case" in my answer. Finding the right number depends on the application and the architecture it runs on.
Gonzalo
+2  A: 

4000 threads at one time is pretty high.

The answer is yes and no. If you are doing a lot of blocking I/O in each thread, then yes, you could show significant speedups doing up to probably 3 or 4 threads per logical core.

If you are not doing much blocking, however, then the extra overhead of threading will just make it slower. So use a profiler and see where the bottlenecks are in each potentially parallel piece. If you are doing heavy computations, then more than 1 thread per CPU won't help. If you are doing a lot of memory transfer, it won't help either. If you are doing a lot of I/O, though, such as disk access or internet access, then yes, multiple threads will help up to a certain extent, or at the least make the application more responsive.

Earlz
A: 

Benchmark.

I'd ramp up the number of threads for the application, starting at 1 and going up to something like 100, run three to five trials for each thread count, and build a graph of operation speed vs. number of threads.

You should find that the four-thread case is optimal, with slight rises in runtime after that, but maybe not. It may be that your application is bandwidth limited, i.e., the dataset you're loading into memory is huge, you're getting lots of cache misses, etc., such that 2 threads are optimal.

You can't know until you test.
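The sweep described above could be sketched roughly like this in Python. Everything here is illustrative: `simulated_io_task` is a hypothetical stand-in that just sleeps to imitate blocking I/O, and you would swap in a real chunk of your own work. One caveat I'm adding: in standard CPython, CPU-bound threads don't run in parallel because of the GIL, so for pure computation you'd want to sweep worker processes instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_io_task(delay=0.02):
    """Hypothetical stand-in for blocking work; replace with a real chunk."""
    time.sleep(delay)

def time_run(num_threads, num_tasks=16, task=simulated_io_task):
    """Return wall-clock seconds to finish num_tasks on num_threads threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(task) for _ in range(num_tasks)]
        for future in futures:
            future.result()  # wait for completion and surface any exception
    return time.perf_counter() - start

def sweep(thread_counts, trials=3, **kwargs):
    """Map each thread count to its best (minimum) time over several trials."""
    return {
        n: min(time_run(n, **kwargs) for _ in range(trials))
        for n in thread_counts
    }

if __name__ == "__main__":
    for n, seconds in sweep([1, 2, 4, 8, 16]).items():
        print(f"{n:>3} threads: {seconds:.3f} s")
```

Taking the minimum over trials is one way to filter out noise from other processes; the mean or median would also be reasonable. Plot the resulting numbers and look for where the curve flattens or turns back up.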

mmr
A: 

Speaking from a computation- and memory-bound point of view (scientific computing), 4000 threads will make the application run really slowly. Part of the problem is the very high overhead of context switching and, most likely, very poor memory locality.

But it also depends on your architecture. From what I've heard, Niagara processors are supposed to be able to handle multiple threads on a single core using some kind of advanced pipelining technique. However, I have no experience with those processors.

aaa
