Hi! I am doing a web-project based on asp.net mvc framework. As db I am using postgre SQL. The question is how to organize searching in my application. One option would be using of .net libraries such as lucene.net. Another option is to use the Postgre full text search. So what is the best option?
I haven't run postgres in production, but I have played with it on a test DB with pretty significant (I think) amounts of data. Indexing about 600,000 rows of text strings averaging 3 words creates a full text index that's 120MB. Queries are very fast against this index after the first one for each search term. It looks like the index for each term has to be pulled from disk into memory. I haven't yet found a way to pull in the whole index into memory at startup. Some of the initial slowness may be disk IO related since I am running on a single laptop HD. I am also not sure if a 120MB index would need 120MB of DB memory or more.
For a production app, we are using Lucene for Java and it is performing very well - subsecond responses with several GB of index data. The advantage of Lucene that I see is 1) that it's DB independent and 2) distributable. For #1, this may not be an issue, but it means that you can use the same indexing code no matter what your underlying DB is. For #2, it depends on how big the application would be. Lucene (and Hadoop especially) are designed to be multithreaded so you can store the index on a shared drive and have multiple machines running searches at once (note that indexing is still single threaded). Whether you want this or not depends on what your architecture is. Would you rather have 1 big DB or 1 small / medium DB plus a few smaller indexing servers supporting it?