views:

1038

answers:

4

I would like to implement a thread pool in Java, which can dynamically resize itself based on the computational and I/O behavior of the tasks submitted to it.

Practically, I want to achieve the same behavior as the new Thread Pool implementation in C# 4.0

Is there an implementation already or can I achieve this behavior by using mostly existing concurrency utilities (e.g. CachedThreadPool)?

The C# version does self-instrumentation to achieve optimal utilization. What self-instrumentation facilities are available in Java, and what performance implications do they have?

Is it feasible to do a cooperative approach, where the task signals its intent (e.g. entering I/O intensive operation, entering CPU intensive operation phase)?
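One cooperative mechanism along these lines does exist in Java 7's ForkJoinPool: a task can wrap a blocking phase in a ManagedBlocker, which lets the pool compensate with a spare thread while it is blocked. A sketch (the wrapper class and method names here are illustrative, not a standard API):

```java
import java.util.concurrent.ForkJoinPool;

public class IoBlocker implements ForkJoinPool.ManagedBlocker {
    private final Runnable io;       // the blocking I/O phase to run
    private volatile boolean done;

    public IoBlocker(Runnable io) { this.io = io; }

    // Called by the pool; performs the blocking work exactly once.
    @Override public boolean block() {
        if (!done) { io.run(); done = true; }
        return true; // no further calls to block() needed
    }

    @Override public boolean isReleasable() { return done; }

    // A task signals "entering an I/O-intensive phase" like this; the
    // surrounding ForkJoinPool may activate another worker meanwhile.
    public static void runBlocking(Runnable io) throws InterruptedException {
        ForkJoinPool.managedBlock(new IoBlocker(io));
    }
}
```

Outside a ForkJoinPool, managedBlock simply runs the block directly, so the wrapper is harmless in plain threads.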

Any suggestions are welcome.

Edit Based on comments:

The target scenarios could be:

  • Local file crawling and processing
  • Web crawling
  • Multi-webservice access and aggregation

The problem of the CachedThreadPool is that it starts new threads when all existing threads are blocked - you need to set explicit bounds on it, but that's it.

For example, I have 100 web services to access in a row. If I submit them all to a CachedThreadPool, it will start 100 threads to perform the operations, and the many simultaneous I/O requests and data transfers will surely trip over each other. For a static test case I could experiment and find the optimal pool size, but I want it to be determined and applied adaptively.
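For reference, a bounded variant of the CachedThreadPool can be built directly on ThreadPoolExecutor - a sketch with a fixed bound, not yet the adaptive behavior I'm after. Note the gotcha that the bound only works with a direct-handoff queue; with an unbounded LinkedBlockingQueue the pool never grows past its core size:

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedCachedPool {
    // Like Executors.newCachedThreadPool(), this hands tasks directly to
    // threads via a SynchronousQueue, but caps the thread count at `max`.
    // When all `max` threads are busy, CallerRunsPolicy makes the
    // submitting thread run the task itself, throttling submission.
    public static ThreadPoolExecutor create(int max) {
        return new ThreadPoolExecutor(
                0, max,
                60L, TimeUnit.SECONDS,            // idle threads die after 60s
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```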

A: 

I think you should monitor CPU utilization, in a platform-specific manner. Find out how many CPUs/cores you have, and monitor the load. When you find that the load is low, and you still have more work, create new threads - but not more than x times num-cpus (say, x=2).

If you really want to take I/O threads into account as well, try to find out what state each thread is in when your pool is exhausted, and subtract all waiting threads from the total number. One risk is that you exhaust memory by admitting too many tasks, though.
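A sketch of the "find out what state each thread is in" part using the standard ThreadMXBean, counting RUNNABLE threads JVM-wide (the class and method names are illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadStates {
    // Counts how many live threads in this JVM are actually RUNNABLE,
    // as opposed to BLOCKED/WAITING/TIMED_WAITING. A pool sizer could
    // compare this against Runtime.getRuntime().availableProcessors()
    // to decide whether to admit more work.
    public static int runnableCount() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int runnable = 0;
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null && info.getThreadState() == Thread.State.RUNNABLE) {
                runnable++;
            }
        }
        return runnable;
    }
}
```

Note this reports JVM thread states, not OS scheduler states; a thread blocked inside native I/O can still show as RUNNABLE.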

Martin v. Löwis
I guess this is one of my problems: how to instrument each of my threads (in a platform-independent way) to measure their busy and blocking ratios.
kd304
I didn't mean to suggest that you instrument threads. Instead, use OS API to find out whether a thread is waiting, from outside of the thread.
Martin v. Löwis
That is called instrumentation I think.
kd304
No no no. Instrumentation is when you put code into the thread itself, i.e. more code into the code that runs in the thread. I would abstain from doing so, and use interrogation instead of instrumentation.
Martin v. Löwis
Thanks for the terminology clarification. My problem with interrogation is that most standard APIs use synchronized, or you cannot inject a modified Lock to do the measurements.
kd304
I propose that you don't use any standard API, nor any injection. On Linux, read /proc/<pid>/task/<tid>/stat to find out whether a thread is running or not.
Martin v. Löwis
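For the record, a sketch of that /proc approach in Java (Linux-only; the parsing assumes the standard stat format, where the single-letter state field follows the parenthesised command name):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProcStat {
    // Linux-only: reads /proc/<pid>/task/<tid>/stat and returns the state
    // letter: 'R' running, 'S' sleeping, 'D' uninterruptible I/O wait, etc.
    public static char threadState(int pid, int tid) throws IOException {
        String stat = new String(Files.readAllBytes(
                Paths.get("/proc/" + pid + "/task/" + tid + "/stat")));
        return parseState(stat);
    }

    // The state is the first field after the parenthesised command name;
    // the comm field may itself contain spaces and parentheses, hence
    // scanning from the last ')'.
    static char parseState(String statLine) {
        int close = statLine.lastIndexOf(')');
        return statLine.charAt(close + 2);
    }
}
```

The tid values for a Java thread are not exposed by the standard API, so mapping Thread objects to native thread IDs is itself platform-specific.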
+1  A: 

The example given is

Result[] a = new Result[N];
for(int i=0;i<N;i++) {
    a[i] = compute(i);
}

In Java, the way to parallelize this across every free core, with the workload distributed dynamically so that it doesn't matter if one task takes longer than another, is:

// defined earlier
int procs = Runtime.getRuntime().availableProcessors();
ExecutorService service = Executors.newFixedThreadPool(procs);

// main loop.
@SuppressWarnings("unchecked")
Future<Result>[] f = new Future[N];
for (int i = 0; i < N; i++) {
    final int i2 = i;
    f[i] = service.submit(new Callable<Result>() {
        public Result call() {
            return compute(i2);
        }
    });
}
Result[] a = new Result[N];
for (int i = 0; i < N; i++)
    a[i] = f[i].get(); // throws InterruptedException, ExecutionException

This hasn't changed much in the last 5 years, so it's not as cool as it was when it was first available. What Java really lacks is closures. You can use Groovy instead if that is really a problem.

Additional: If you cared about performance, rather than using this as an example, you wouldn't calculate Fibonacci in parallel, because it's a good example of a function which is faster if you calculate it single-threaded.

One difference is that each thread pool only has one queue, so there is no need to steal work. This potentially means that you have more overhead per task. However, as long as your tasks typically take more than about 10 microseconds, it won't matter.

Peter Lawrey
Actually, your answer has nothing to do with my question. I know how to do it with fixed and cached pools, but I want something in between - something which takes the CPU utilization, I/O blocking and I/O transfer properties into account and 'schedules' things to achieve maximum CPU and maximum I/O utilization through proper interleaving for example.
kd304
You can do this by overriding ThreadPoolExecutor's beforeExecute and afterExecute, or by having another thread which monitors the queue length and the number of threads in the pool. However, after trying these approaches, I would just suggest you increase the number of available threads and let the OS scheduler work out the best way to allocate the tasks (as it knows about other applications, CPU usage, I/O usage, etc.)
Peter Lawrey
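A sketch of that beforeExecute/afterExecute idea - the average-task-time metric is just one possible thing to measure; a monitor thread could combine it with getQueue().size() to drive setCorePoolSize():

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// ThreadPoolExecutor already provides beforeExecute/afterExecute hooks;
// here they record total task wall time per completed task.
public class MonitoredPool extends ThreadPoolExecutor {
    private final ThreadLocal<Long> start = new ThreadLocal<Long>();
    final AtomicLong totalNanos = new AtomicLong();
    final AtomicLong completed = new AtomicLong();

    public MonitoredPool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS,
              new LinkedBlockingQueue<Runnable>());
    }

    @Override protected void beforeExecute(Thread t, Runnable r) {
        start.set(System.nanoTime());
    }

    @Override protected void afterExecute(Runnable r, Throwable t) {
        totalNanos.addAndGet(System.nanoTime() - start.get());
        completed.incrementAndGet();
    }

    public long averageTaskNanos() {
        long n = completed.get();
        return n == 0 ? 0 : totalNanos.get() / n;
    }
}
```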
In summary, what you suggest is what the OS scheduler is designed to do, and it does it well on Windows, Linux and Solaris. You would only try to write your own if you didn't trust the OS scheduler. So my answer is the answer to your question, i.e. don't worry about it, the OS works!
Peter Lawrey
I agree, the schedulers are good, especially Windows 7's, but I don't know whether the OS scheduler's objective function is comparable to my combined objective function: do not overwhelm the I/O with lots of simultaneous transfers, do not overwhelm the threads themselves with constant wake/block cycles, and do not slow down computation-intensive parts with more threads than cores. I guess I have to try out some of my current ideas and we will see. I guess also I will offer a bounty soon.
kd304
What I can suggest is that CPU-intensive tasks should have the same number of threads as cores (see above), disk-I/O-intensive tasks should have the same number of threads as you have drives (typically 1), and for network I/O you should allow connections until the response time gets too slow (this is subjective).
Peter Lawrey
+2  A: 

Consider creating a Map where the key is the bottleneck resource.

Each task submitted to the pool will declare the resource which is its bottleneck, e.g. "CPU", "Network", "C:\", etc.

You could start by allowing only one thread per resource, and then slowly ramp up until the work completion rate stops increasing. Things like CPU could have a floor of the core count.
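A sketch of such a resource-keyed map of pools - the sizing rule is illustrative, and the ramp-up logic is left out:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One pool per bottleneck resource, created lazily in a concurrent map.
public class ResourcePools {
    private final ConcurrentMap<String, ExecutorService> pools =
            new ConcurrentHashMap<String, ExecutorService>();

    public ExecutorService forResource(String resource) {
        ExecutorService pool = pools.get(resource);
        if (pool == null) {
            ExecutorService fresh = Executors.newFixedThreadPool(sizeFor(resource));
            pool = pools.putIfAbsent(resource, fresh);
            if (pool == null) pool = fresh;   // we won the race
            else fresh.shutdown();            // someone else created it first
        }
        return pool;
    }

    // CPU gets a floor of the core count; other resources start at one
    // thread, to be ramped up while the completion rate keeps improving.
    private int sizeFor(String resource) {
        return "CPU".equals(resource)
                ? Runtime.getRuntime().availableProcessors() : 1;
    }
}
```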

Charlie
If you're going to do something like this, make sure you use a synchronized implementation of Map.
James McMahon
A: 

Let me present an alternative approach. Having a single thread pool is a nice abstraction, but it's not very performant, especially when the jobs are very I/O-bound - then there's no good way to tune it: it's tempting to blow up the pool size to maximize I/O throughput, but you suffer from too many thread switches, etc.

Instead I'd suggest looking at the architecture of the Apache MINA framework for inspiration (http://mina.apache.org/). It's a high-performance network application framework - they describe it as a server framework, but I think their architecture works well for the inverse scenarios as well, like spidering and multi-server clients. (Actually, you might even be able to use it out of the box for your project.)

They use the Java NIO (non-blocking I/O) libraries for all I/O operations, and divide the work into two thread pools: a small and fast set of socket threads, and a larger, slower set of business-logic threads. So the layers look as follows:

  • On the network end, a large set of NIO channels, each with a message buffer
  • A small pool of socket threads, which go through the channel list round-robin. Their only job is to check the socket, and move any data out into the message buffer - and if the message is done, close it out and transfer to the job queue. These guys are fast, because they just push bits around, and skip any sockets that are blocked on IO.
  • A single job queue that serializes all messages
  • A large pool of processing threads, which pull messages off the queue, parse them, and do whatever processing is required.

This makes for very good performance - IO is separated out into its own layer, and you can tune the socket thread pool to maximize IO throughput, and separately tune the processing thread pool to control CPU/resource utilization.
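Stripped of the NIO details, the layering could be sketched roughly like this - the message type, handler interface, and pool sizes are placeholders:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Two-layer split: a small I/O pool produces complete messages into one
// job queue, and a larger processing pool consumes them. The real MINA
// version runs NIO selector loops in the I/O layer.
public class TwoLayerServer {
    final BlockingQueue<String> jobQueue = new LinkedBlockingQueue<String>();
    final ExecutorService ioPool = Executors.newFixedThreadPool(2);      // would run selector loops
    final ExecutorService workerPool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());

    // I/O layer: called when a channel has assembled a full message;
    // the socket thread just enqueues it and moves on.
    public void messageComplete(String message) {
        jobQueue.add(message);
    }

    // Processing layer: each worker loops, pulling messages off the queue.
    public void startWorkers(final MessageHandler handler) {
        for (int i = 0, n = Runtime.getRuntime().availableProcessors(); i < n; i++) {
            workerPool.execute(new Runnable() {
                public void run() {
                    try {
                        while (!Thread.currentThread().isInterrupted()) {
                            handler.handle(jobQueue.take());
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    public interface MessageHandler { void handle(String message); }
}
```

The point of the split is that the two pools can then be tuned independently: the I/O pool for throughput, the worker pool for CPU utilization.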

doihaveto
Thanks, I will have a look at it shortly.
kd304