Classically, the number of reasonable threads depends on the number of execution units, the ratio of IO to computation, and the available memory.
Number of Execution Units (XU)
This is how many threads can be active at the same time. Depending on your computations, that might or might not include things like hyperthreads, which help most with mixed-instruction workloads.
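As a starting point, XU can be queried at run time. This is a minimal sketch using Python's os.cpu_count(), which reports logical CPUs and so usually includes hyperthreads. The function name and the threads_per_core parameter are hypothetical, and the default of 2 is an assumption rather than something detected from the hardware.

```python
import os

def execution_units(count_hyperthreads=True, threads_per_core=2):
    """Estimate XU from the logical CPU count.

    threads_per_core is an assumed value (not detected); it is only used
    when hyperthreads should not be counted as full execution units.
    """
    logical = os.cpu_count() or 1  # os.cpu_count() may return None
    if count_hyperthreads:
        return logical
    return max(1, logical // threads_per_core)
```

Measuring your actual workload with and without hyperthreads is still the more reliable way to decide which figure to use.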
Ratio of IO to Computation (%IO)
If the threads never wait for IO but always compute (%IO = 0), using more threads than XUs only increases the overhead from memory pressure and context switching. If the threads always wait for IO and never compute (%IO = 1), then using a variant of poll() or select() might be a good idea.
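For the pure-IO case, a single thread can wait on many descriptors at once. This is a minimal sketch using Python's selectors module, which wraps select(), poll() and related calls. The read_ready helper and its 4096-byte read size are illustrative assumptions, not a recommended design.

```python
import selectors
import socket

def read_ready(sockets, timeout=1.0):
    """Wait in one thread until any socket is readable.

    Returns a list of (socket, data) pairs for the sockets that had data.
    """
    sel = selectors.DefaultSelector()  # picks epoll/kqueue/poll/select
    try:
        for sock in sockets:
            sel.register(sock, selectors.EVENT_READ)
        results = []
        for key, _events in sel.select(timeout):
            # The selector reported this socket readable, so recv() will not block.
            results.append((key.fileobj, key.fileobj.recv(4096)))
        return results
    finally:
        sel.close()
```

Calling read_ready() in a loop lets one thread service all connections, so the thread count no longer has to grow with the number of pending IO operations.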
For all other situations, XU / (1 - %IO) gives an approximation of how many threads are needed to fully use the available XUs, since each thread keeps an XU busy for only the fraction 1 - %IO of its time.
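The estimate can be written down as a small function. This sketch treats %IO as the fraction of time a thread spends waiting for IO and uses the form XU / (1 - %IO), which matches both edge cases described above: XU threads at %IO = 0, and no finite answer at %IO = 1. The function name is hypothetical.

```python
import math

def threads_for_io_ratio(xu, io_ratio):
    """Approximate the thread count needed to keep xu execution units busy.

    io_ratio is %IO: the fraction of time a thread waits for IO (0 to 1).
    """
    if not 0.0 <= io_ratio <= 1.0:
        raise ValueError("io_ratio must be between 0 and 1")
    if io_ratio == 1.0:
        # Threads never compute, so no finite count saturates the XUs.
        return math.inf
    # Round first so floating-point noise does not bump the result up by one.
    return math.ceil(round(xu / (1.0 - io_ratio), 9))
```

For example, with 8 XUs and threads that wait for IO half of the time, threads_for_io_ratio(8, 0.5) gives 16. Treat the result as a starting point for measurement, not a guarantee.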
Available Memory (Mem)
This is more of an upper limit. Each thread uses a certain amount of system resources (MemUse). Mem / MemUse gives you an approximation of how many threads the system can support.
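The memory limit works as a cap on whatever thread count you would otherwise choose. This is a minimal sketch with hypothetical function names. The per-thread footprint has to be measured for your application, and the 8 MiB figure in the usage note is only an illustrative assumption.

```python
def max_threads_for_memory(mem_bytes, mem_use_bytes):
    """Upper limit on threads: Mem / MemUse, rounded down."""
    if mem_use_bytes <= 0:
        raise ValueError("mem_use_bytes must be positive")
    return mem_bytes // mem_use_bytes

def capped_thread_count(wanted_threads, mem_bytes, mem_use_bytes):
    """Apply the memory limit to a thread count chosen by other means."""
    return min(wanted_threads, max_threads_for_memory(mem_bytes, mem_use_bytes))
```

With 16 GiB available and an assumed 8 MiB per thread, max_threads_for_memory(16 * 1024**3, 8 * 1024**2) gives 2048, so a request for 5000 threads would be capped at 2048 while a request for 16 would pass through unchanged.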
Other Factors
The performance of the whole system can still be constrained by other factors, even if you can guess or (better) measure the numbers above. For example, another service running on the system might use some of the XUs and memory. Another problem is the generally available IO bandwidth (IOCap). If you need fewer computing resources per transferred byte than your XUs provide, you'll need to care less about using them completely and more about increasing IO throughput.
For more about this latter problem, see this Google Talk about the Roofline Model.
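The IO bandwidth argument can be sketched as a roofline-style bound: attainable throughput is the smaller of what the XUs can compute and what the IO path can feed them. Applying the roofline idea to IOCap is done here by analogy. The function names, units and example numbers are assumptions for illustration.

```python
def attainable_throughput(peak_compute, io_cap, ops_per_byte):
    """Roofline-style bound on useful work per second.

    peak_compute: operations per second the XUs can deliver
    io_cap:       bytes per second the IO path can transfer (IOCap)
    ops_per_byte: operations needed per transferred byte
    """
    return min(peak_compute, io_cap * ops_per_byte)

def is_io_bound(peak_compute, io_cap, ops_per_byte):
    """True if IO bandwidth, not the XUs, limits the throughput."""
    return io_cap * ops_per_byte < peak_compute
```

With an assumed peak of 1e9 operations per second and an IOCap of 1e8 bytes per second, a workload needing 2 operations per byte is IO bound and tops out at 2e8 operations per second. Adding threads will not help there; raising IO throughput will.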