With stackoverflow.com as a reference point (a team of 2-3 engineers building a website project intended to scale), does it make sense to spend effort early in development building search on Lucene/Autonomy… as opposed to database-based full-text search?
Pros/Cons:
With a mature Lucene implementation like Nutch or Autonomy, the cost of moving to Lucene (which is inevitable) at a later stage is negligible.
At large volumes, adding index servers (say, with Nutch) to maintain the growing search index is relatively easy.
With a Lucene implementation I’ll most likely need an additional server to maintain the in-memory index (much earlier in the process of scaling).