I am working on a multi-threaded application.

This application started out as a single thread and was expanded to multiple threads in order to realize a gain in performance.

I have a main thread which divides up the work into smaller chunks and offloads it to worker threads which process the chunks. This portion is controlled using a semaphore so that only X worker threads run at any one time. The worker threads produce chunks of data, which are stored in a queue or ring buffer that is read by a single saving thread. This thread is responsible for saving the chunks of data to disk (sometimes across the local network).
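
As a rough illustration of this pipeline (a minimal sketch only; the class names, sizes, and the choice of Java are illustrative, not taken from the actual application), the structure might look like this:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.Semaphore;

    // Sketch of the described pipeline: a semaphore caps how many workers run at
    // once, workers push finished chunks into a bounded queue, and a single saver
    // thread drains the queue and writes to disk or the network.
    public class Pipeline {
        static final int MAX_WORKERS = 3;                    // "X" in the description above
        static final Semaphore workerSlots = new Semaphore(MAX_WORKERS);
        static final BlockingQueue<byte[]> results = new ArrayBlockingQueue<>(64);
        static final byte[] POISON = new byte[0];            // end-of-stream marker

        public static void main(String[] args) throws InterruptedException {
            Thread saver = new Thread(() -> {
                try {
                    for (byte[] chunk; (chunk = results.take()) != POISON; )
                        save(chunk);                         // write to disk / network
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            saver.start();

            for (int i = 0; i < 100; i++) {                  // main thread splits up the work
                final int chunkId = i;
                workerSlots.acquire();                       // at most MAX_WORKERS at once
                new Thread(() -> {
                    try {
                        results.put(process(chunkId));       // blocks if the saver falls behind
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        workerSlots.release();
                    }
                }).start();
            }

            workerSlots.acquire(MAX_WORKERS);                // wait for the last workers to finish
            results.put(POISON);                             // tell the saver to stop
            saver.join();
        }

        static byte[] process(int chunkId) { return new byte[1024]; } // stand-in for the CPU-bound work
        static void save(byte[] chunk)     { /* stand-in for the save step */ }
    }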

My development machine is a quad core with 8GB of RAM. Running the application on my machine with 3 worker threads and 1 saver thread results in a steady flow of data over the network, with the processors utilized at an average of 75%.

The second method of attacking this problem adds another set of threads between the worker threads and the saver thread: one task is taken out of the current worker threads and moved into its own thread, and each of these new threads gets its own queue. With this approach the application does not seem to gain any speed on my machine, as there seems to be too much contention for resources: RAM bus saturation and processor contention.

Through much experimentation with the number of threads and their priorities, I have found the ideal settings for my machine, for both the first and second methods of approaching this problem. The production machine, however, will have 8 cores and 64GB of RAM. That is a much different environment, and the application will have to be configured for it.
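
One common starting point for that kind of reconfiguration (just a sketch of an initial guess to tune from, not a recommendation of exact ratios) is to derive the default thread counts from the hardware instead of hard-coding them:

    // Sketch: derive a starting configuration from the machine rather than hard-coding it.
    // "workers = cores - 1, one saver" is only an initial guess to tune from.
    public class DefaultConfig {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();  // 4 on the dev box, 8 in production
            int workerThreads = Math.max(1, cores - 1);              // leave headroom for the saver and the OS
            int saverThreads = 1;
            System.out.printf("cores=%d workers=%d savers=%d%n", cores, workerThreads, saverThreads);
        }
    }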

My question is: at what point have you created too many threads? Is it always a matter of experimenting to determine the ideal settings for a given machine? Is there a method of determining or observing whether locking is taking too much away from the application?

(I'm not using a thread pool because it does not fit my needs: the threads are long-running and are managed by semaphores and other locking mechanisms.)

+8  A: 

You've created too many threads when the overall performance of your application degrades, or when other applications running on the same box are negatively affected to an unacceptable level.

The point is that there is no absolute answer.

One application I've been working on uses a thread pool of 1000 threads and, for what we're doing, that seems to be the right number. In one configuration we didn't limit it; it went up to 30,000+ threads and basically brought the machine to a grinding halt.

You basically have to performance-test it and have enough monitoring/instrumentation to determine your application's overall throughput, resource usage, and thread utilization, and to know how idle the threads were and how long work waited on the queues before being picked up. You then tune as necessary.
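
As one concrete form of that instrumentation (a sketch assuming a blocking-queue pipeline like the one described in the question; the class names are made up), you can timestamp each chunk when it is enqueued and have the consumer report how long it waited:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Sketch: wrap each chunk with the time it was enqueued so the consumer can
    // report how long work sat in the queue. Steadily growing wait times suggest
    // the consuming stage, not the producers, is the bottleneck.
    public class QueueWaitProbe {
        static class TimedChunk {
            final byte[] data;
            final long enqueuedNanos = System.nanoTime();
            TimedChunk(byte[] data) { this.data = data; }
        }

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<TimedChunk> queue = new ArrayBlockingQueue<>(64);

            queue.put(new TimedChunk(new byte[1024]));       // producer side

            TimedChunk t = queue.take();                     // consumer side
            long waitedMicros = (System.nanoTime() - t.enqueuedNanos) / 1_000;
            System.out.println("chunk waited " + waitedMicros + " us in the queue");
        }
    }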

One cautionary note: think very carefully before you add another layer of threads. As I'm sure you know, writing multithreaded code is hard. Try to keep it as simple as possible. Adding another layer is a risky step.

cletus
+1 the hill-climbing approach seems to be one of the most reliable for this problem
Rex M
+3  A: 

Nobody can give you a simple numerical answer because it's too dependent, not just on how many cores &c the machine has, but also on what other tasks (if any) said machine is supposed to be doing at the same time as your app, AND on what exactly your threads are doing too.

To give an example of the latter issue: I once had a pretty simple "crawler" where a certain number of threads were devoted to doing HTTP GETs of pages I had determined I needed -- each thread spent most of its time blocked in socket calls doing the HTTP GET, so to get pretty good performance I needed a large number of them (hundreds). Later I switched the underlying approach to use asynchronous network I/O instead of blocking sockets -- and suddenly each thread could easily have hundreds of URLs "in flight", so having hundreds of such threads active would have overwhelmed the system, probably resulting in more sockets open than the system could handle (it wasn't a very large or generously configured server!-), resulting in a crash, or at least a terrible slow-down due to excessive swapping and so on.

So, even for totally I/O-bound threads, the exact form of I/O they're using (blocking or async, for example) will have an enormous impact on what number of threads (or processes, or any other such units) is optimal for a certain overall software task. Threads doing more CPU-bound work must obviously be calibrated (for maximum performance) against the availability of cores, and of RAM in which the cores can work, but perhaps also of other resources (for example, if some of your threads are able to use available GPUs or other dedicated processing units to delegate some of their work).
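
To make the blocking-versus-async contrast concrete (a sketch in Java rather than whatever the original crawler used, based on the JDK 11 HttpClient; the URLs are made up), a small number of threads can keep many GETs in flight at once:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.stream.Collectors;

    // Sketch: with blocking sockets you need roughly one thread per in-flight GET;
    // with asynchronous I/O a small number of threads can keep many requests in
    // flight, so the optimal thread count for the same job drops dramatically.
    public class AsyncFetch {
        public static void main(String[] args) {
            HttpClient client = HttpClient.newHttpClient();  // backed by a small internal thread pool
            List<String> urls = List.of(                     // made-up stand-ins for the crawl list
                    "http://example.com/a", "http://example.com/b", "http://example.com/c");

            List<CompletableFuture<String>> inFlight = urls.stream()
                    .map(u -> client.sendAsync(
                                    HttpRequest.newBuilder(URI.create(u)).GET().build(),
                                    HttpResponse.BodyHandlers.ofString())
                            .thenApply(HttpResponse::body))
                    .collect(Collectors.toList());

            inFlight.forEach(f -> System.out.println(f.join().length() + " bytes"));
        }
    }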

In the end, you can make a reasonable ballpark estimate once you know all such parameters, but you could be off by a substantial factor -- so, benchmarking on a realistic workload with (say) half as many, and twice as many, threads as you guesstimate should be optimal, is an excellent way to spend some time and resources during the late stages of performance tuning at deployment. In general, performance behavior OFTEN proves surprising even to very experienced architects, developers and system administrators, so there is no really good replacement for the empirical data-driven approach of realistic benchmarks, careful measurement, and tuning accordingly. (Mind you, blind empiricism -- just trying to adapt to experimental observations without any sensible model to make sense of them -- is almost as bad as dogmatic and doctrinal approaches ignoring the data, but, that's another rant;-).
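
A crude version of that half/guess/double benchmark sweep (a sketch only; runPipeline is a hypothetical hook standing in for a run of the real application on a representative workload) could look like:

    // Sketch: sweep the worker-thread count around the initial guess (half of it,
    // the guess itself, double it) and measure wall-clock throughput for each.
    public class ThreadCountSweep {
        public static void main(String[] args) throws InterruptedException {
            int guess = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
            for (int workers : new int[] { Math.max(1, guess / 2), guess, guess * 2 }) {
                long start = System.nanoTime();
                long chunksDone = runPipeline(workers);      // hypothetical: run a representative workload
                double seconds = (System.nanoTime() - start) / 1e9;
                System.out.printf("workers=%d  throughput=%.1f chunks/s%n",
                                  workers, chunksDone / seconds);
            }
        }

        // Hypothetical hook: run the real pipeline with the given worker count and
        // return how many chunks it processed.
        static long runPipeline(int workers) throws InterruptedException {
            Thread.sleep(100);                               // placeholder for real work
            return 1_000;
        }
    }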

Alex Martelli