views: 1621
answers: 3

I have a method that takes an array of queries, and I need to run them against different search engine Web APIs, such as Google's or Yahoo's. To optimize this process, I thought of creating a thread for each query and then joining them all at the end, since my application can only continue after I have the results of every query. My structure is something like this:

public abstract class Query extends Thread {
    private String query;

    public abstract Result[] querySearchEngine(String query);

    @Override
    public void run() {
        Result[] results = querySearchEngine(query);
        Querier.addResults(results);
    }
}

public class GoogleQuery extends Query {
    public Result[] querySearchEngine(String query) {
        // access the Google REST API
    }
}

public class Querier {
    /* Every class that extends Query fills this list */
    private static List<Result> aggregatedResults = new ArrayList<Result>();

    /* Must be thread-safe, since every query thread calls it concurrently */
    public static void addResults(Result[] results) { /* add to aggregatedResults */ }

    public static Result[] queryAll(Query[] queries) throws InterruptedException {
        /* start every thread, then wait for all of them to finish */
        for (Query query : queries) {
            query.start();
        }
        for (Query query : queries) {
            query.join();
        }
        return aggregatedResults.toArray(new Result[aggregatedResults.size()]);
    }
}

I don't really like the call to the static method inside the run() method of the Query class, and I'd like to improve this code.

Recently, I found that there's a "new" API in Java for doing concurrent jobs: namely, the Callable interface, which I believe is quite similar to Runnable but can return a result, plus FutureTask and ExecutorService. I was wondering whether this new API is the one that should be used, and whether it is more efficient than the classic approach.

After studying the "new" API, I came up with this new code (simplified version):

public abstract class Query implements Callable<Result[]> {
    private final String query; // gets set in the constructor

    public abstract Result[] querySearchEngine(String query);

    @Override
    public Result[] call() {
        return querySearchEngine(query);
    }
}

public class Querier {
    private ArrayList<Result> aggregatedResults;

    public Result[] queryAll(Query[] queries) {
        List<Future<Result[]>> futures = new ArrayList<Future<Result[]>>(queries.length);
        final ExecutorService service = Executors.newFixedThreadPool(queries.length);
        for (Query query : queries) {
            futures.add(service.submit(query));
        }
        for (Future<Result[]> future : futures) {
            aggregatedResults.add(future.get());  // get() is somewhat similar to join?
        }
        return aggregatedResults;
    }
}

This way, I don't need to access static methods, and I think the code ends up better. I'm new to this concurrency API, and I'd like to know whether there's anything that can be improved in the code above, and whether it's better than the first option (using the Thread class). There are some classes I didn't explore, such as FutureTask. I'd love to hear any advice.

Cheers.
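
(For reference, a minimal sketch of how FutureTask fits in, assuming the Callable-based Query and Result types above; the GoogleQuery constructor taking a query string is hypothetical, and exception handling is omitted. A FutureTask wraps a Callable so it can run on a plain Thread and still hand back a result; ExecutorService.submit() does essentially this wrapping for you.)

    FutureTask<Result[]> task = new FutureTask<Result[]>(new GoogleQuery("some query"));
    new Thread(task).start();       // FutureTask is itself a Runnable
    Result[] results = task.get();  // blocks until call() finishes, like join() plus a return value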

+3  A: 

As a further improvement, you could look into using a CompletionService. It decouples submitting tasks from retrieving their results, placing each future result on a queue from which you take results in the order they complete.
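
A minimal sketch of that idea, assuming the Callable-based Query and Result types from the question (pool size is a placeholder and exception handling is omitted):

    ExecutorService executor = Executors.newFixedThreadPool(queries.length);
    CompletionService<Result[]> completionService =
            new ExecutorCompletionService<Result[]>(executor);

    for (Query query : queries) {
        completionService.submit(query);
    }

    List<Result> aggregatedResults = new ArrayList<Result>();
    // take() returns futures in completion order; loop exactly as many times
    // as we submitted, so we never block waiting for a task that was never handed in.
    for (int i = 0; i < queries.length; i++) {
        Result[] results = completionService.take().get();
        aggregatedResults.addAll(Arrays.asList(results));
    }
    executor.shutdown();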

Tim
Since the application can only continue in this case after *every* task is completed, a CompletionService might not be appropriate here.
Avi
@Avi: I disagree; it's just not as nice as future.get().
kd304
@kd304: What method of CompletionService would you use to get all the results of a set of tasks?
Avi
Something like `excCmpSrv.take().get()`, where you have to be careful not to call take() if there aren't any submitted futures left (it'll wait for a new one that never comes). Using poll() or counting the number of submitted Callables is a way of working around this.
Tim
+5  A: 

Several problems with your code.

  1. You should probably be using the ExecutorService.invokeAll() method. The cost of creating new threads and a new thread pool can be significant (though maybe not compared to calling external search engines). invokeAll() can manage the threads for you.
  2. You probably don't want to mix arrays and generics.
  3. You are calling aggregatedResults.add() instead of addAll().
  4. You don't need to use member variables when they could be local to the queryAll() function call.

So, something like the following should work:

public abstract class Query implements Callable<List<Result>> {
    private final String query; // gets set in the constructor

    public abstract List<Result> querySearchEngine(String query);

    @Override
    public List<Result> call() {
        return querySearchEngine(query);
    }
}

public class Querier {
    private static final ExecutorService executor = Executors.newCachedThreadPool();

    public List<Result> queryAll(List<Query> queries)
            throws InterruptedException, ExecutionException {
        List<Future<List<Result>>> futures = executor.invokeAll(queries);
        List<Result> aggregatedResults = new ArrayList<Result>();
        for (Future<List<Result>> future : futures) {
            aggregatedResults.addAll(future.get()); // get() blocks for the result, much like join()
        }
        return aggregatedResults;
    }
}
Avi
Changing to a cached thread pool might not be the best option, as your application is I/O-bound; most search engines are really fast and will respond promptly.
kd304
@kd304: Indeed, the search engines I'm using are quite fast (Google and Yahoo, currently). However, I'm using lots of queries, hence the need for concurrency. What is your advice on this? From what I've read in the javadoc of newCachedThreadPool, it seems to fit my purposes. But then again, I'm quite new to this API.
JG
@Avi: Thank you very much for the suggestions!
JG
@JG: Hard to say, as there is no adaptive pool available in Java that would adjust its size based on the I/O-to-CPU ratio. A heuristic approach would be to measure the wait-for-response time, response delivery time, and response processing time, then use a fixed pool size to interleave them. On my 100 Mbit / 2-core machine, optimum performance is achieved with a pool of size 10.
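
A minimal illustration of that suggestion, assuming Avi's Querier above; the size of 10 is just the value measured on that particular setup and would need tuning:

    // Fixed-size pool instead of a cached one; 10 is a placeholder to be tuned
    // against your own measured wait/response/processing times.
    private static final ExecutorService executor = Executors.newFixedThreadPool(10);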
kd304
+3  A: 

Can I suggest you use Future.get() with a timeout?

Otherwise it only takes one unresponsive search engine to bring everything to a halt (and it doesn't even need to be a search engine problem if, say, you have a network issue at your end).
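
A minimal sketch of a timed get(), assuming the Future<List<Result>> loop from the answer above; the 10-second value is only a placeholder:

    for (Future<List<Result>> future : futures) {
        try {
            // Wait at most 10 seconds per query (placeholder value).
            aggregatedResults.addAll(future.get(10, TimeUnit.SECONDS));
        } catch (TimeoutException e) {
            // Give up on this search engine and stop the task if it is still running.
            future.cancel(true);
        }
    }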

Brian Agnew
Thanks. What is the typical timeout value used for this kind of operation?
JG
I think you need to ask yourself how long you'd be prepared to wait :-) Make it configurable and set it to (say) 10x the normal response time.
Brian Agnew
I think the right layer in the code for the timeout is not Future.get(); it is the network (HTTP?) call to the search engine itself. If the search engine times out, it's better to catch that there, rather than tie up a thread which is no longer needed.
Avi
That assumes (!) that you're talking HTTP. In the higher, more abstract areas of the code base I wouldn't necessarily make that assumption. However, I think you're right in that setting a timeout on the HTTP operations is always a good idea, and then throwing an appropriate exception. So I would set some timeout in *both* the Future.get() and the HTTP connection. Whether they're the same value is another matter.
Brian Agnew
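
A minimal sketch of the HTTP-level timeouts being discussed, assuming the search engine is called with a plain HttpURLConnection inside a hypothetical querySearchEngine() implementation (requestUrl and the 5- and 10-second values are placeholders):

    HttpURLConnection connection =
            (HttpURLConnection) new URL(requestUrl).openConnection();
    connection.setConnectTimeout(5000);  // fail fast if we cannot even connect
    connection.setReadTimeout(10000);    // fail if the engine stops sending data
    // ... read and parse the response, throwing an appropriate exception on timeout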