I have a method that takes an array of queries, and I need to run them against different search engine web APIs, such as Google's or Yahoo's. To speed this up, I thought of creating a thread for each query and then joining them all at the end, since my application can only continue once I have the results of every query. My structure is something like this:
public abstract class Query extends Thread {
    private String query; // gets set in the constructor

    // Each subclass talks to one search engine and returns its results.
    public abstract Result[] querySearchEngine(String query);

    @Override
    public void run() {
        Result[] results = querySearchEngine(query);
        Querier.addResults(results);
    }
}
public class GoogleQuery extends Query {
    @Override
    public Result[] querySearchEngine(String query) {
        // access the Google REST API and return its results
    }
}
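import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Querier {
    /* Every class that extends Query adds its results to this list */
    private static final List<Result> aggregatedResults = new ArrayList<Result>();

    // synchronized, because several Query threads call this concurrently
    public static synchronized void addResults(Result[] results) {
        aggregatedResults.addAll(Arrays.asList(results));
    }

    public static Result[] queryAll(Query[] queries) throws InterruptedException {
        /* start every thread so that the queries run concurrently */
        for (Query query : queries) {
            query.start();
        }
        /* wait until every thread has added its results */
        for (Query query : queries) {
            query.join();
        }
        return aggregatedResults.toArray(new Result[0]);
    }
}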
I don't really like the call to a static method inside the run() method of the Query class, and I'd like to improve this code.
Recently, I found that Java has a "new" API for running concurrent jobs: the Callable interface, which I believe is quite similar to Runnable but can return a result, together with FutureTask and ExecutorService. I was wondering whether this new API is the one that should be used, and whether it is more efficient than the classic Thread-based approach.
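To check my understanding of how these pieces fit together, here is a minimal sketch. The class name CallableSketch and the method lengthOf are made up for illustration and are not part of my real code; the string length just stands in for a slow web API call. As far as I can tell, submit() hands the Callable to a worker thread and returns a Future, and get() blocks until call() has returned.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; the names are illustrative only.
class CallableSketch {

    // Runs one Callable on a worker thread and waits for its result.
    static int lengthOf(final String text) {
        ExecutorService service = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> future = service.submit(new Callable<Integer>() {
                @Override
                public Integer call() {
                    return text.length(); // stands in for a slow web API call
                }
            });
            return future.get(); // blocks until call() has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            // call() threw; the original exception is available as the cause
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            service.shutdown(); // let the worker thread exit
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOf("concurrency")); // prints 11
    }
}
```

Unlike Runnable.run(), call() returns a value, so the result comes back through the Future instead of through shared state.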
After studying the "new" API, I came up with this new code (simplified version):
import java.util.concurrent.Callable;

public abstract class Query implements Callable<Result[]> {
    private final String query; // gets set in the constructor

    public abstract Result[] querySearchEngine(String query);

    @Override
    public Result[] call() {
        return querySearchEngine(query);
    }
}
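import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Querier {
    private final List<Result> aggregatedResults = new ArrayList<Result>();

    public Result[] queryAll(Query[] queries) throws InterruptedException, ExecutionException {
        List<Future<Result[]>> futures = new ArrayList<Future<Result[]>>(queries.length);
        final ExecutorService service = Executors.newFixedThreadPool(queries.length);
        try {
            for (Query query : queries) {
                futures.add(service.submit(query));
            }
            for (Future<Result[]> future : futures) {
                // get() blocks until the task is done; is it somewhat similar to join()?
                aggregatedResults.addAll(Arrays.asList(future.get()));
            }
        } finally {
            service.shutdown(); // release the pool's threads
        }
        return aggregatedResults.toArray(new Result[0]);
    }
}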
This way, I don't need to access static methods, and I think the code ends up cleaner. I'm new to this concurrency API, and I'd like to know whether anything in the code above can be improved, and whether it's better than the first option (using the Thread class). There are some classes I haven't really explored, such as FutureTask. I'd love to hear any advice.
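For reference, this is as far as I got with FutureTask: from the Javadoc, it seems to wrap a Callable and implement Runnable, so it can run on a plain Thread while still handing back a result through get(). The sketch below is only my reading of it; FutureTaskSketch and queryOnPlainThread are made-up names, and the returned string stands in for a real search result.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical demo class; the names are illustrative only.
class FutureTaskSketch {

    // Wraps a Callable in a FutureTask and runs it on a plain Thread.
    static String queryOnPlainThread(final String query) {
        FutureTask<String> task = new FutureTask<String>(new Callable<String>() {
            @Override
            public String call() {
                return "result for: " + query; // placeholder for a real search
            }
        });
        Thread worker = new Thread(task); // FutureTask is also a Runnable
        worker.start();
        try {
            return task.get(); // blocks until call() has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            // call() threw; the original exception is available as the cause
            throw new IllegalStateException("task failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(queryOnPlainThread("java")); // prints "result for: java"
    }
}
```

If that reading is right, FutureTask sits somewhere between my two versions: explicit threads as in the first, but results returned through get() as in the second. I don't know whether it offers any advantage over submitting to an ExecutorService.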
Cheers.