views:

1536

answers:

9

When creating a FixedThreadPool Executor object in Java, you need to pass an argument specifying the number of threads the Executor can run concurrently. I'm building a service class whose responsibility is to process a large collection of phone numbers. For each phone number I need to call a web service (that's my bottleneck) and then save the response in a HashMap.

To make this bottleneck less harmful to the performance of my service, I've decided to create a Worker class which fetches unprocessed elements and processes them. The Worker class implements the Runnable interface, and I run Workers using an Executor.

The number of Workers that can run at the same time depends on the size of the Executor's FixedThreadPool. What is a safe size for a ThreadPool? What can happen when I create a FixedThreadPool with some large number as an argument?
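
For context, here is a minimal sketch of the setup described above. The `lookup` method and the phone numbers are placeholder assumptions standing in for the real web-service call; only the FixedThreadPool/Runnable structure is from the question.

```java
import java.util.*;
import java.util.concurrent.*;

public class PhoneProcessor {

    // Placeholder for the real web-service call (an assumption for this sketch).
    static String lookup(String number) {
        return "result-" + number;
    }

    // Submit one Runnable per phone number to a fixed-size pool and
    // collect responses in a thread-safe map.
    public static Map<String, String> processAll(List<String> numbers, int poolSize)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Map<String, String> results = new ConcurrentHashMap<>();
        for (String number : numbers) {
            pool.execute(() -> results.put(number, lookup(number)));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> out = processAll(List.of("111", "222", "333"), 2);
        System.out.println(out.size()); // 3
    }
}
```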

+1  A: 

I have read somewhere that the optimal number of threads is the number of cores * 25. It seems that .NET uses this as the default for its ThreadPool. However, if you have a large number of web service calls, you'd be better off using a single thread and checking a list of web service calls for responses. When a response has arrived, just process the entry and remove it from the list.
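
One way to read this suggestion in Java terms is a single loop polling a list of `Future`s, handling and removing each as it completes. This is a hedged sketch of that idea (the busy-wait loop is kept deliberately simple; a real version would sleep or use `CompletionService`):

```java
import java.util.*;
import java.util.concurrent.*;

public class PollingSketch {

    // Walk the list of outstanding futures repeatedly; collect each result
    // as it becomes available and remove it from the pending list.
    public static List<String> drain(List<Future<String>> pending) throws Exception {
        List<String> done = new ArrayList<>();
        while (!pending.isEmpty()) {
            Iterator<Future<String>> it = pending.iterator();
            while (it.hasNext()) {
                Future<String> f = it.next();
                if (f.isDone()) {
                    done.add(f.get());
                    it.remove();
                }
            }
        }
        return done;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Future<String>> pending = new ArrayList<>();
        for (String n : List.of("a", "b", "c")) {
            pending.add(pool.submit(() -> "ok-" + n)); // stands in for a web-service call
        }
        System.out.println(drain(pending).size()); // 3
        pool.shutdown();
    }
}
```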

Stilgar
That adds more responsibility than his program needs; what he needs is more throughput for his process.
Kamia
+4  A: 

If each worker thread needs to make a web service call, then the number of threads in your pool should be influenced strongly by how many simultaneous requests your web service can handle. Any more threads than that will do nothing more than overwhelm the web service.

skaffman
The HTTP behavior is like his FixedThreadPool; no need to worry about making calls and getting responses. The major problems there are the size of the data set he's processing, the memory available to the heap, and the speed of the machine doing the job.
Kamia
+2  A: 

If you have dev access to the web service, consider creating a batch function to check multiple phone numbers on one call.

In newer .NET there is a ThreadPool which can grow and shrink based on its own performance profile. Java's version, unfortunately, is either fixed in size or grows up to a limit based on the incoming work.

We once had similar concerns. Our solution was to allow the customer to adjust the pool size and tune the performance as they please.

There are several network and data properties to consider when sizing an I/O-operation pool: network bandwidth, message sizes, the processing time and style of the web service, and the number of local cores.

kd304
+1  A: 

If each computation is equivalent to a call to a web service, then you should consider how much load you are putting on that service, and how many concurrent connections the service will tolerate or its owners will allow. Most publicly accessible services would expect only one such connection from any single user at a time. If possible, contact the service's owners for their usage policies. The number of such connections will determine the number of threads you may use.

TokenMacGuy
A: 

Don't forget that each thread you create also consumes memory for its stack. So creating a pool of threads will increase the memory footprint of your process (note that some pools don't create the threads until they're actually required, so at startup you won't see any memory increase).

This stack size is configurable via -Xss (similar to -Xmx etc.). I believe the default is 512Kb per thread, but at the moment I can't find an authoritative reference to confirm that.
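
Besides the JVM-wide `-Xss` flag, a stack size can also be requested per thread through the `Thread` constructor. The JVM is free to round or ignore the hint, so this is only a sketch of the mechanism:

```java
public class StackSizeDemo {

    // Run a task on a thread with a requested 256 KB stack.
    // The stackSize argument is a hint; the JVM may adjust or ignore it.
    public static String runWithSmallStack() throws InterruptedException {
        final String[] name = new String[1];
        Thread t = new Thread(
            null,                                              // default thread group
            () -> name[0] = Thread.currentThread().getName(),  // the task
            "small-stack",                                     // thread name
            256 * 1024);                                       // requested stack size in bytes
        t.start();
        t.join();
        return name[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWithSmallStack()); // small-stack
    }
}
```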

Brian Agnew
Yup, on Linux, each thread gets its own stack with a default size of 512kB.
Kamia
I couldn't find an up-to-date reference detailing this for multiple platforms. If you can find one I'll change the answer appropriately.
Brian Agnew
A: 

I wonder if you'd be better off using NIO rather than threads, since your limiting factor will be web service server + network bottleneck, not client CPU.

Otherwise, at most you should not exceed the number of concurrent connections that your web service can support.

ykaganovich
A: 

If you are doing heavy computation, say for parallel array manipulations, then the rule of thumb is to match the number of threads to the number of processors.

A: 

Let's assume that the web service is infinitely scalable and that nobody is going to care that you are spamming it with requests. Let's also assume that the web service responses are in the 1 second range while the local processing time is 5 milliseconds.

Throughput is maximized when you have the same number of busy threads as processing cores.

Under these assumptions you are not going to be able to maximize throughput on a multi-core processor for any sane size of thread pool. To achieve maximum transactions per second you have to break the thread-per-connection model. Look at nonblocking I/O (NIO), mentioned previously, or a Java implementation of the Asynchronous Completion Token pattern (I/O completion ports on Windows).
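
The completion-token idea can be approximated in plain Java with `CompletableFuture`: instead of one blocked thread per in-flight request, a small pool drives many requests via callbacks. In this hedged sketch, `fakeCall` is an assumed stand-in for an asynchronous web-service client:

```java
import java.util.*;
import java.util.concurrent.*;

public class AsyncSketch {

    // Stand-in for an async web-service call (an assumption for this sketch).
    static CompletableFuture<String> fakeCall(String number, Executor pool) {
        return CompletableFuture.supplyAsync(() -> "result-" + number, pool);
    }

    // Launch all calls, attach a completion callback to each, and wait.
    // Note the pool has only 2 threads regardless of how many requests fly.
    public static Map<String, String> run(List<String> numbers) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Map<String, String> results = new ConcurrentHashMap<>();
        CompletableFuture<?>[] all = numbers.stream()
            .map(n -> fakeCall(n, pool).thenAccept(r -> results.put(n, r)))
            .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(all).join(); // wait for every completion
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("1", "2", "3", "4")).size()); // 4
    }
}
```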

Note that the stack memory reserved for every created thread is actually just reserved address space, not allocated or committed memory. As the stack tries to grow, guard-page exceptions are raised, which results in stack memory getting committed on demand. The consequence is that this only really matters for 32-bit memory managers. With 64-bit addressing you have a huge address space, even though only a small part of it is backed by physical memory. At least, this is how I understand Windows works; I'm not sure about the Unix world.

Hans Malherbe
+2  A: 

Something that could be considered is looking at

Runtime.getRuntime().availableProcessors()

which gives some direction on how many threads would make sense for the system.
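
For example, a pool size could be derived from the core count with a multiplier. The multiplier here is an assumption to tune for your workload (I/O-bound work tolerates more threads per core than CPU-bound work), not a fixed rule:

```java
public class PoolSizing {

    // Scale the pool with the number of available cores.
    public static int ioPoolSize(int threadsPerCore) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, cores * threadsPerCore);
    }

    public static void main(String[] args) {
        // e.g. a few threads per core for I/O-bound work
        System.out.println(ioPoolSize(4));
    }
}
```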

Lars Andren