I am developing a typical enterprise/business application that includes orders, salespersons, contacts, reference data, etc. At least 100 users will be on the system at a time, entering new data, changing existing data, and so on. I need to provide search capability across the application for almost all tables.

One option is to run table queries such as "select * from salespersons where name like '%searchtest%'" or something similar. But I was wondering whether I can use Lucene (Lucene.NET) for this instead.

The main thing is that the search needs to reflect changes within a few seconds. So if a user enters an order, for example, and then immediately searches for it, it needs to show up in the search results. (I.e., I can't rely on an index job that runs every hour, every half hour, or nightly.)

Is this something that would work well, or is there a better option?

+4  A: 

Yes, you certainly can use Lucene for this use case. I see some downsides:

  • You'll be replicating much of the information in the index (and you'll have to implement something to keep the index and the database in sync, which might not be trivial).
  • You'll be hitting the database very often to build this index (or delaying the inserts, or just creating more load, depending on how you choose to build it).
  • Near-real-time search is implemented only in the latest version of official Lucene. I'm not aware of the status of Lucene.net in this respect.

And a (big) upside:

  • Lucene will most likely outperform the database's full-text indexing in both performance and result quality.

The answers to this question might help: http://stackoverflow.com/questions/1002255/lucene-net-best-practices

Vinko Vrsalovic
+2  A: 

I have implemented something almost identical to what you describe. The table to be indexed was huge (more than 5 hours to index with Lucene) and the requirement was that the search would reflect changes in the DB within 5 minutes. I considered two approaches (and implemented the first one):

  • Index the table incrementally. Every row had a timestamp (last modified). Every 5 minutes a cron job would start a Java process that read the rows modified since the last run, created a plain-text version of them, and then updated the Lucene index. The incremental indexing would lock the table for 200-300 msecs for about 1000 table rows. Obviously this depends on your system, database schema, etc. However, my experience is that it is definitely practical to implement this. And search operations are orders of magnitude faster with Lucene than with the equivalent database query.

  • Use a dedicated thread to do the indexing. Whenever something changes in the DB, the code that actually runs the SQL query should send a message (through a LinkedBlockingQueue) to the thread that updates the Lucene index. That way your updateDB() method on the main thread returns immediately after the DB has been updated and does not have to wait for the Lucene indexing process, while the indexing still happens as soon as possible (usually a few msecs later). One downside is that Lucene uses locks stored on disk, so I assume there is an overhead to updating the index for every single row (I haven't run any benchmark though). A workaround would be to keep a buffer of updates on your indexing thread and flush them to disk every few seconds (again, the performance of this depends on the ratio of updates to searches on the index).
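To make the second approach a bit more concrete, here is a minimal sketch of a dedicated indexing thread with a buffer that is flushed either when it fills up or when a time interval elapses. This is not code I ran in production. The names BufferedIndexer and IndexSink, the batch size, and the flush interval are all made up for illustration, and IndexSink is only a stand-in for whatever code actually writes to the Lucene index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the code that writes a batch to the Lucene index. */
interface IndexSink {
    void writeBatch(List<String> documents);
}

/** Hypothetical indexing worker: buffers updates and flushes them in batches. */
class BufferedIndexer implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final IndexSink sink;
    private final int maxBatch;             // flush when this many updates are buffered
    private final long flushIntervalMillis; // ...or when this much time has passed
    private volatile boolean running = true;

    BufferedIndexer(IndexSink sink, int maxBatch, long flushIntervalMillis) {
        this.sink = sink;
        this.maxBatch = maxBatch;
        this.flushIntervalMillis = flushIntervalMillis;
    }

    /** Called right after the DB update; only enqueues, so it returns immediately. */
    void submit(String document) {
        queue.add(document);
    }

    /** Asks the worker to drain what is left and exit; do not call submit() afterwards. */
    void stop() {
        running = false;
    }

    @Override
    public void run() {
        List<String> buffer = new ArrayList<>();
        long deadline = System.currentTimeMillis() + flushIntervalMillis;
        try {
            while (running || !queue.isEmpty()) {
                // Wait for the next update, but never past the flush deadline.
                long wait = Math.max(1, deadline - System.currentTimeMillis());
                String doc = queue.poll(wait, TimeUnit.MILLISECONDS);
                if (doc != null) {
                    buffer.add(doc);
                }
                boolean due = System.currentTimeMillis() >= deadline;
                if (buffer.size() >= maxBatch || due) {
                    flush(buffer);
                    deadline = System.currentTimeMillis() + flushIntervalMillis;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        } finally {
            flush(buffer); // write whatever is still buffered on the way out
        }
    }

    private void flush(List<String> buffer) {
        if (buffer.isEmpty()) {
            return;
        }
        sink.writeBatch(new ArrayList<>(buffer)); // hand the sink its own copy
        buffer.clear();
    }
}
```

You would start it once with something like new Thread(indexer).start() and call indexer.submit(...) from updateDB() after the SQL has run. The flush interval is the knob that trades index-write overhead against how stale a search result can be, so for a "within a few seconds" requirement it would have to be set well below that.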

idrosid