views:

439

answers:

4

How has SO implemented tagged search? Is it using Lucene or some other open-source search engine library for it?

What is the best way to search documents (PDF, XML, HTML, MS Word) or a database?

A: 

So, yes, it is using Lucene.NET, though I'm not sure exactly how. The "best" way is a whole 'nother story.

Matthew Flaschen
A: 

Searching tags is very different from searching text. A tagged search looks up an association: it finds all the questions that are associated with a particular tag. This can be implemented with a full-text engine where the tags are all appended into a single large entry, but a relational database will probably be best in this situation (assuming the tagged data is in a relational database to start with).
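
As a rough illustration of the relational approach, here is a minimal sketch using Python's built-in sqlite3 module. The table names (questions, tags, question_tags), the sample rows, and the helper functions are all made up for the example; this is not how SO actually stores its data. It returns the questions that carry every requested tag.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical tagging schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        -- Junction table: one row per (question, tag) association.
        CREATE TABLE question_tags (
            question_id INTEGER NOT NULL REFERENCES questions(id),
            tag_id INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (question_id, tag_id)
        );
    """)
    # Illustrative sample data only.
    conn.executemany("INSERT INTO questions VALUES (?, ?)", [
        (1, "How do I index PDF files?"),
        (2, "Which Lucene.NET analyzer should I use?"),
        (3, "Parsing HTML in C#"),
    ])
    conn.executemany("INSERT INTO tags VALUES (?, ?)", [
        (1, "lucene"), (2, "pdf"), (3, ".net"), (4, "html"),
    ])
    conn.executemany("INSERT INTO question_tags VALUES (?, ?)", [
        (1, 1), (1, 2), (2, 1), (2, 3), (3, 3), (3, 4),
    ])
    return conn


def questions_with_all_tags(conn, tag_names):
    """Return titles of questions associated with every tag in tag_names."""
    names = sorted(set(tag_names))
    if not names:
        return []
    placeholders = ",".join("?" * len(names))
    sql = f"""
        SELECT q.title
        FROM questions q
        JOIN question_tags qt ON qt.question_id = q.id
        JOIN tags t ON t.id = qt.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY q.id
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY q.id
    """
    rows = conn.execute(sql, (*names, len(names))).fetchall()
    return [row[0] for row in rows]
```

The GROUP BY / HAVING pair is what turns "has any of these tags" into "has all of these tags"; with an index on the junction table this stays a plain lookup rather than a text search.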

To search other documents such as PDF, XLS, or HTML, you need a full-text engine like Lucene. You'll also need a parser that can extract just the relevant text from each source (i.e., separate the text from the markup).
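
To make the "separate text from markup" step concrete, here is a small sketch for the HTML case only, using Python's standard html.parser module and a toy in-memory inverted index. It is a stand-in for what a real engine like Lucene does at far larger scale, not Lucene's API; the class and function names are invented for the example, and PDF or Office files would need their own extractors.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores script/style contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by the removed markup.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Strip the markup from an HTML string and return its text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, html in docs.items():
        for word in re.findall(r"\w+", html_to_text(html).lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

A real full-text engine adds tokenization rules, stemming, ranking, and on-disk storage on top of this idea, which is why a library is the usual choice over rolling your own.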

Sam
A: 

The last time this was discussed (on the podcast), it was mentioned that Stack Overflow uses SQL Server's full-text search feature, not Lucene.

Don
A: 

SO doesn't use Lucene.

If you want to index documents and are running Windows, then IFilters would be my first choice.

Michael Stum