How has SO implemented the tagged search? Is it using Lucene or any other open-source search engine library for tagged searching?
What is the best way to search documents (PDF, XML, HTML, MS Word) or a database?
So, yes, it is using Lucene.NET, though I'm not sure exactly how. The "best" way is a whole 'nother story.
Searching tags is very different from searching text. A tagged search looks up an association: all the questions associated with a particular tag. This can be implemented with a full-text engine by appending all of a question's tags into a single indexed entry, but a relational database will probably be the best fit in this situation (assuming the tagged data is in a relational database to start with).
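To make the relational approach concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema (`questions`, `tags`, `question_tags`), the sample rows, and the function names are all made up for illustration; this is not how SO actually stores its data, just one common way to model a many-to-many tag association.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical tagging schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        -- Junction table: one row per (question, tag) association.
        CREATE TABLE question_tags (
            question_id INTEGER NOT NULL REFERENCES questions(id),
            tag_id INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (question_id, tag_id)
        );
    """)
    conn.executemany("INSERT INTO questions VALUES (?, ?)", [
        (1, "How does tagged search work?"),
        (2, "Indexing PDF files"),
        (3, "SQL Server full-text search"),
    ])
    conn.executemany("INSERT INTO tags VALUES (?, ?)", [
        (1, "search"), (2, "lucene"), (3, "pdf"), (4, "sql-server"),
    ])
    conn.executemany("INSERT INTO question_tags VALUES (?, ?)", [
        (1, 1), (1, 2), (2, 2), (2, 3), (3, 1), (3, 4),
    ])
    return conn


def questions_with_all_tags(conn, tag_names):
    """Return titles of questions that carry every tag in tag_names."""
    names = sorted(set(tag_names))
    if not names:
        return []
    placeholders = ",".join("?" * len(names))
    sql = f"""
        SELECT q.title
        FROM questions q
        JOIN question_tags qt ON qt.question_id = q.id
        JOIN tags t ON t.id = qt.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY q.id, q.title
        -- Keep only questions that matched all requested tags.
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY q.id
    """
    return [row[0] for row in conn.execute(sql, names + [len(names)])]


if __name__ == "__main__":
    conn = build_demo_db()
    print(questions_with_all_tags(conn, ["search", "lucene"]))
```

The point is that a tag lookup is an exact-match join, not a relevance-ranked text query, so an ordinary index on the junction table does the work.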
For searching other documents such as PDF, XLS, or HTML, you need a full-text engine like Lucene. You'll also need a parser that can extract just the relevant text from each source (i.e., separate the text from the markup).
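As a rough illustration of the "separate text from markup" step, here is a small sketch for the HTML case only, using Python's standard `html.parser` module. The `TextExtractor` class and `html_to_text` function are names I made up; PDF, XLS, and Word files would each need their own extractor, and the output of any of them is what you would hand to the full-text indexer.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, ignoring tags, scripts, and styles."""

    SKIPPED = {"script", "style"}  # element contents that are not document text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse runs of whitespace.
        # Note: this also puts a space where an inline tag splits a word.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the plain text of an HTML string, ready to feed to an indexer."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<body><h1>Title</h1><p>Hello &amp; <b>world</b></p></body>"
    print(html_to_text(sample))
```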
The last time this was discussed (on the podcast), it was mentioned that Stack Overflow uses SQL Server's full-text search feature, not Lucene.
SO doesn't use Lucene.
If you want to index documents and are running Windows, then IFilters would be my first choice.