I am currently building a proof-of-concept search solution for my company using Lucene and Hibernate Search. I have built individual components which work fine. I am now looking at creating a single API that would allow a user to get search results back from different sources (domain + data). What I would like to achieve is something like this: a search manager fires search requests to the different search components asynchronously and, as soon as one set of results has been processed, returns that result to the user while the rest are still being processed. Once the remaining searches have been processed, the client is notified that more search results are available.

I am wondering whether I should have a search manager which creates a separate thread for each search component and keeps a list of search results. Once the list is populated with one set, that set is returned to the user. Any search results added after that would involve the search manager pushing the results to the user.

I am not specifically looking for code examples (though any would be appreciated); I was wondering if I could get some guidance on how to tackle this problem. Should I use event processing technologies (GigaSpaces, Spring, JMS) or the standard Java concurrency libraries? What would be an effective way of managing the list and pushing the updated results?

Cheers

A: 

If you create a class for each type of search manager, e.g. Lucene, each of which implements an asynchronous search interface, you should be able to make do with just the 'normal' Java stuff.

I'd be thinking on the following lines:

Create a thread-safe collection (a set if you don't want duplicated search results) with the right properties, depending on whether you want ordering and whether you are going to access data within it randomly or just iterate through it. The usual which-data-structure-to-use stuff.

Define an interface with a run-search method that takes the collection as a parameter, and possibly another method to check whether the search has terminated. Or use some other listener-based means, whatever methods you like.

Write an implementation of that interface for each different search method. Each call to the search method creates its own thread that runs the search, and that thread places the search results into the supplied collection.

The search manager just iterates through all the known search engines (registered somewhere) and runs a search on each of them with the given query.
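As a rough illustration of the steps above, here is a minimal sketch using only the standard library. The names AsyncSearcher, FixedSearcher and SearchManager are made up for this example, and the "search" just produces a canned string instead of calling Lucene or Hibernate Search. Returning the Thread is one simple way to let the caller check whether a search has terminated; a listener would work too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical interface that each search component implements. */
interface AsyncSearcher {
    /** Starts the search on its own thread; results are added to the shared collection. */
    Thread search(String query, Queue<String> results);
}

/** Stand-in implementation: a real one would query Lucene, Hibernate Search, etc. */
class FixedSearcher implements AsyncSearcher {
    private final String source;

    FixedSearcher(String source) {
        this.source = source;
    }

    @Override
    public Thread search(String query, Queue<String> results) {
        Thread worker = new Thread(() -> results.add(source + ": hit for '" + query + "'"));
        worker.start();
        return worker;
    }
}

/** Runs the query against every registered searcher. */
class SearchManager {
    private final List<AsyncSearcher> searchers;

    SearchManager(List<AsyncSearcher> searchers) {
        this.searchers = searchers;
    }

    /** For the demo this waits for all searches; a caller could instead read the queue while they run. */
    Queue<String> searchAll(String query) {
        Queue<String> results = new ConcurrentLinkedQueue<>(); // thread-safe, keeps insertion order
        List<Thread> workers = new ArrayList<>();
        for (AsyncSearcher searcher : searchers) {
            workers.add(searcher.search(query, results));
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return results;
    }
}
```

To show the first result set early, the caller would hold on to the queue and the threads rather than joining them all, and read from the queue as soon as anything arrives.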

Hope that helps.

Chris
A: 

This sounds like a perfect fit for the ExecutorService abstraction (java.util.concurrent) in Java 5 and higher. You can submit tasks to a pool of executor threads and asynchronously poll for completion.

So in your case, you'd create each search as its own task, and then poll those tasks for completion. Once they're done, grab the results and aggregate them for the user.
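A minimal sketch of that idea follows. It uses ExecutorCompletionService, which is one way (my choice, not the only one) to pick up tasks in the order they finish instead of polling each Future by hand. The class name ExecutorSearch and the onBatch callback are made up for the example; each Callable stands in for one search component.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class ExecutorSearch {
    /**
     * Submits every search as its own task and hands each result batch to
     * onBatch (on the calling thread) as soon as that search completes.
     */
    static void searchAll(List<Callable<List<String>>> searches, Consumer<List<String>> onBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, searches.size()));
        try {
            CompletionService<List<String>> completed = new ExecutorCompletionService<>(pool);
            for (Callable<List<String>> search : searches) {
                completed.submit(search);
            }
            for (int i = 0; i < searches.size(); i++) {
                try {
                    // take() blocks until the next task, in completion order, is done
                    onBatch.accept(completed.take().get());
                } catch (ExecutionException e) {
                    // assumption for this sketch: a failing source is skipped, the rest continue
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The first call to onBatch is the point where the first set of results could go back to the user; later calls correspond to the "more results available" notifications.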

safetydan
A: 

I'd look into the Scatter-Gather pattern: broadcast the query asynchronously with JMS (or some other messaging technology), gather responses until a timeout is reached or a minimum number of search results has returned, then report the results so far to the end user.

The benefit of using JMS or similar is that you avoid tying up multiple threads waiting for responses, and you have a mechanism for handling responses that arrive after the first result set is returned to the user.
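The gather step can be sketched without any messaging library. In the example below a plain BlockingQueue stands in for the JMS reply queue (that substitution is an assumption made to keep the example self-contained, it is not JMS code), and Gatherer, minResults and timeoutMillis are names invented for the illustration. It collects replies until either the minimum count is reached or the timeout expires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class Gatherer {
    /**
     * Collects replies until minResults have arrived or timeoutMillis has elapsed,
     * whichever comes first. Replies that arrive later stay in the queue.
     */
    static List<String> gather(BlockingQueue<String> replies, int minResults, long timeoutMillis) {
        List<String> gathered = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (gathered.size() < minResults) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // overall timeout reached
                }
                String reply = replies.poll(remaining, TimeUnit.NANOSECONDS);
                if (reply == null) {
                    break; // nothing arrived before the deadline
                }
                gathered.add(reply);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return gathered;
    }
}
```

Late replies remain on the queue, so a second call to gather (or a listener on the real reply queue) can pick them up and push them to the user after the first result set has been shown.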

You might want to look into Solr, an open source enterprise search server based on Lucene, and how they handle these issues.

markusk