views:

877

answers:

5

I am looking into mechanisms for better search capabilities against our database. It is currently a huge bottleneck (causing long-lasting queries that are hurting our database performance).

My boss wanted me to look into Solr, but on closer inspection, it seems we actually want some kind of DB integration mechanism with Lucene itself.

From the Lucene FAQ, they recommend Hibernate Search, Compass, and DBSight.

As a background of our current technology stack, we are using straight JSPs on Tomcat, no Hibernate, no other frameworks on top of it... just straight Java, JSP, and JDBC against a DB2 database.

Given that, it seems Hibernate Search might be a bit more difficult to integrate into our system, though it might be nice to have the option of using Hibernate after such an integration.

Does anyone have any experiences they can share with using one of these tools (or other similar Lucene based solutions) that might help in picking the right tool?

It needs to be a FOSS solution, and ideally will manage updating Lucene with changes from the database automagicly (though efficiently), without extra effort to notify the tool when changes have been made (otherwise, it seems rolling my own Lucene solution would be just as good). Also, we have multiple application servers with just 1 database (+failover), so it would be nice if it is easy to use the solution from all application servers seamlessly.

I am continuing to inspect the options now, but it would be really helpful to utilize other people's experiences.

A: 

For a pure performance boost with searching Lucene will certainly help out a lot. Only index what you care about/need and you should be good. You could use Hibernate or some other piece if you like but I don't think it is required.

scunliffe
Thanks for the response! Unfortunately, I would really like to see what these frameworks can provide so I don't have to roll my own full-fledged solution.
Mike Stone
A: 

Well, it seems DBSight doesn't meet the FOSS requirement, so unless it is an absolutely stellar solution, it is not an option for me right now...

Mike Stone
+2  A: 

I have had good experiences with Compass. It has really good integration with hibernate and can mirror data changes made through hibernate and jdbc directly to the Lucene indexes though its GPS devices http://www.compass-project.org/docs/1.2.2/reference/html/gps-jdbc.html.

Maintaining the Lucene indexes on all your application servers may be an issue. If you have multiple App servers updating the db, then you may hit some issues with keeping the index in sync with all the changes. Compass may have an alternate mechanism for handling this now.

The Alfresco Project (CMS) also uses Lucene and have a mechanism for replicating Lucene index changes between servers that may be useful in handling these issues.

We started using Compass before Hibernate Search was really off the ground so I cannot offer any comparison with it.

+2  A: 

When you say "search against a DB", what do you mean?

Relational databases and information retrieval systems use very different approaches for good reason. What kind of data are you searching? What kind of queries do you perform?

If I were going to implement an inverted index on top of a database, as Compass does, I would not use their approach, which is to implement Lucene's Directory abstraction with BLOBs. Rather, I'd implement Lucene's IndexReader abstraction.

Relational databases are quite capable of maintaining indexes. The value that Lucene brings in this context is its analysis capabilities, which are most useful for unstructured text records. A good approach would leverage the strengths of each tool.

As updates are made to the index, Lucene creates more segments (additional files or BLOBs), which degrade performance until a costly "optimize" procedure is used. Most databases will amortize this cost over each index update, giving you more stable performance.

erickson
A: 

LuSql http://code.google.com/p/lusql/ allows you to load the contents of a JDBC-accessible database into Lucene, making it searchable. It is highly optimized and multi-threaded. I am the author of LuSql and will be coming out with a new version (re-architected with a new plugable architecture) in the next month.

Glen