views:

546

answers:

4

I am working on a tutorial for my Java concurrency course. The objective is to use thread pools to compute prime numbers in parallel.

The design is based on the Sieve of Eratosthenes. It has an array of n bools, where n is the largest integer you are checking, and each element in the array represents one integer. True is prime, false is non prime, and the array is initially all true.

A thread pool is used with a fixed number of threads (we are supposed to experiment with the number of threads in the pool and observe the performance).

A thread is given a integer multiple to process. The thread then finds the first true element in the array that is not a multiple of thread's integer. The thread then creates a new thread on the thread pool which is given the found number.

After a new thread is formed, the existing thread then continues to set all multiples of it's integer in the array to false.

The main program thread starts the first thread with the integer '2', and then waits for all spawned threads to finish. It then spits out the prime numbers and the time taken to compute.

The issue I have is that the more threads there are in the thread pool, the slower it takes with 1 thread being the fastest. It should be getting faster not slower!

All the stuff on the internet about Java thread pools create n worker threads the main thread then wait for all threads to finish. The method I use is recursive as a worker can spawn more worker threads.

I would like to know what is going wrong, and if Java thread pools can be used recursively.

+3  A: 

How many processors are available on your system? If #threads > #processors, adding more threads is going to slow things down for a compute-bound task like this.

Remember no matter how many threads you start, they're still all sharing the same CPU(s). The more time the CPU spends switching between threads, the less time it can be doing actual work.

Also note that the cost of starting a thread is significant compared to the cost of checking a prime - you can probably do hundreds or thousands of multiplications in the time it takes to fire up 1 thread.

David Gelhar
My current computer has two cores, but the code was originally tested on a Intel Core i7 with 4 cores (8 virtual) and it had similar problems. ie 1 thread = 1s. 4 threads = 20s.
Leith
A: 

The key point of a thread pool is to keep a set of thread alive and re-use them to process tasks. Usually the pattern is to have a queue of tasks and randomly pick one thread from the pool to process it. If there is no free thread and the pool is full, just wait.

The problem you designed is not a good one to be solved by a thread pool, because you need threads to run in order. Correct me if I'm wrong here.

thread #1: set 2's multiple to false

thread #2: find 3, set 3's multiple to false

thread #3: find 5, set 5's multiple to false

thread #4: find 7, set 7's multiple to false

....

These threads need to be run in order and they're interleaving (how the runtime schedules them) matters.

For example, if thread #3 starts running before thread #1 sets "4" to false, it will find "4" and continue to reset 4's multiples. This ends up doing a lot of extra work, although the final result will be correct.

X. Ma
Thread 5 will never "find" 4 - its only purpose is to remove all multiples of 5. Once all of the worker threads have finished removing non-primes, the main thread will "find" the remaining numbers.
danben
If thread #3 needs to wait the threads that start before it to finish, we don't need multiple threads.The problem here is that thread #3 doesn't need to wait thread #1 to remove *all* of 2's multiples, but it needs to wait thread #1 to remove "enough" 2's multiples before it starts searching.
X. Ma
+2  A: 

Your solution may run slower as threads are added for some of following problems:

  • Thread creation overheads: creating a thread is expensive.

  • Processor contention: if there are more threads than there are processors to execute them, some of threads will be suspended waiting for a free processor. The result is that the average processing rate for each thread drops. Also, the OS then needs to time-slice the threads, and that takes away time that would otherwise be used for "real" work.

  • Virtual memory contention: each thread needs memory for its stack. If your machine doesn't have enough physical memory for the workload, each new thread stack increases virtual memory contention which results in paging which slows things down

  • Cache contention: each thread will (presumably) be scanning a different section of the array, resulting in memory cache misses. This slows down memory accesses.

  • Lock contention: if your threads are all reading and updating a shared array and using synchronized and one lock object to control access to the array, you could be suffering from lock contention. If a single lock object is used, each thread will spend most of its time waiting to acquire the lock. The net result is that the computation is effectively serialized, and the overall processing rate drops to the rate of a single processor / thread.

The first four problems are inherent to multi-threading, and there are no real solutions ... apart from not creating too many threads and reusing the ones that you have already created. However, there are a number of ways to attack the lock contention problem. For example,

  • Recode the application so that each thread scans for multiple integers, but in its own section of the array. This will eliminate lock contention on the arrays, though you will then need a way to tell each thread what to do, and that needs to be designed with contention in mind.
  • Create an array of locks for different regions of the array, and have the threads pick the lock to used based on the region of the array they are operating on. You would still get contention, but on average you should get less contention.
  • Design and implement a lockless solution. This would entail DEEP UNDERSTANDING of the Java memory model. And it would be very difficult to prove / demonstrate that a lockless solution does not contain subtle concurrency flaws.

Finally, recursive creation of threads is probably a mistake, since it will make it harder to implement thread reuse and the anti-lock-contention measures.

Stephen C
In short, You really have to test whether adding threads improves performance, don't assume adding complexity makes a better solution because very often, it doesn't.
Peter Lawrey
@Peter - I'd say that the point of the exercise is that you have to use the threads *the right way* to get the app's performance to scale up as you add processors.
Stephen C
A: 

Restructure your program to create a fixed ThreadPoolExecutor in advance. Make sure you call ThreadPoolExecutor#prestartAllCoreThreads(). Have your main method submit a task for the integer 2. Each task will submit another task. Since you are using a thread pool, you won't be creating and terminating a bunch of threads, but instead allowing the same threads to take on new tasks as they become available. This will reduce on overall execution overhead.

You should discover that in this case the optimum number of threads is equal to the number of processors (P) on the machine. It is often the case that the optimum number of threads is P+1. This is because P+1 minimizes overhead from context switching while also minimizing loss from idle/blocking time.

Tim Bender