Is Lucene.Net suitable as the search engine for frequently changing content?

Or, more specifically, can anybody give a subjective opinion on how quickly Lucene.Net indexes can be updated? Any other approaches to searching frequently changing content would be great.

We’re developing a forum. Forum posts will be frequently added to the forum repository. We think we need these posts to be added to the Lucene index very quickly (<0.5s) so that they become available to search. There’ll be about 5E6 posts in the repository initially. Assume the search engine is running on a non-exotic server (I know this is very vague!).

Other suggestions for searching frequently changing content would be appreciated. The forum posts need to be searchable on a variable number of named tags (both the tag name and the tag value must match). A SQL-based approach (based on the Toxi schema) isn’t giving us the performance we’d like.

+7  A: 

Our forums (http://episteme.arstechnica.com) use Lucene as the search backend, so it's doable. Posts aren't indexed quite as quickly as you'd like, but we could solve that by beefing up the indexing hardware and using a smarter caching strategy.

The general answer to this question is: it depends what your write/update pattern is. Forums are relatively easy, since most content is new and existing content is updated less frequently.

For a forum, I'd recommend having an "archive" index and a "live" index. The live index might include posts from the last day, week, or year, while the archive index will include a large body of posts that probably won't ever be touched again. So when someone creates a new post, it will initially be indexed in the live index, which stays small and is therefore cheap to update. At a later time, some batch job would reindex everything from the live index into the archive and then clear out the live index.

Lucene's very good at querying across multiple indexes. You should abuse that ability. :)
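
To make the live/archive idea concrete, here is a minimal sketch in plain Python. It does not use the Lucene or Lucene.Net API; `TagIndex` and `ForumSearch` are hypothetical names, and the toy inverted index only handles the exact tag name/value matching described in the question. It assumes posts are never edited after indexing, so it does not deal with stale entries.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory inverted index: (tag name, tag value) -> set of post ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}  # post id -> tags, kept so the index can be re-fed elsewhere

    def add(self, post_id, tags):
        self.docs[post_id] = dict(tags)
        for term in tags.items():  # each term is a (name, value) pair
            self.postings[term].add(post_id)

    def search(self, wanted):
        # Every requested (name, value) pair must match the post.
        sets = [self.postings.get(term, set()) for term in wanted.items()]
        return set.intersection(*sets) if sets else set()

    def clear(self):
        self.postings.clear()
        self.docs.clear()


class ForumSearch:
    """Hypothetical wrapper: a small 'live' index plus a large 'archive' index."""

    def __init__(self):
        self.live = TagIndex()
        self.archive = TagIndex()

    def add_post(self, post_id, tags):
        # New posts only touch the small live index.
        self.live.add(post_id, tags)

    def search(self, wanted):
        # Query both indexes and merge the results.
        return self.live.search(wanted) | self.archive.search(wanted)

    def roll_over(self):
        # Batch job: copy live posts into the archive, then empty the live index.
        for post_id, tags in self.live.docs.items():
            self.archive.add(post_id, tags)
        self.live.clear()


forum = ForumSearch()
forum.archive.add(1, {"forum": "hardware", "author": "alice"})
forum.add_post(2, {"forum": "hardware", "author": "bob"})
print(sorted(forum.search({"forum": "hardware"})))  # [1, 2]
```

Calling `forum.roll_over()` from a scheduled job moves post 2 into the archive without changing what a search returns. In a real deployment each `TagIndex` would be a separate Lucene index on disk, and the merge step would be a search across both.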

MrKurt
+3  A: 

Lucene.Net is extremely fast; however, there are many things that can slow down queries when it is used incorrectly. I strongly recommend reading the book Lucene in Action by Erik Hatcher and Otis Gospodnetić. It contains a very good chapter about performance testing and tuning.

Stefan Schultze