views:

164

answers:

2

Is there a way to query a full text index to help determine additional noise words? I would like to add some custom noise words and wondered if theres a way to analyse the index to help determine suggestions.

A: 

As simple as in

http://arcanecode.com/2008/05/29/creating-and-customizing-noise-words-in-sql-server-2005-full-text-search/

where this is explained (how to do it). Coming up with proper ones, though, is hard.

TomTom
I see that this article shows how to edit the noise words file, thats great, but I would like to know any additional terms that are bloating the index, specific to the indexed content.
SteadyEddi
A: 

I decided to look into lucene.net because I wasn't happy with the relevance calculations in sql server full text indexing.

I managed to figure out how to index all the content pretty quickly and then used Luke to find noise words. I have now edited the sql server noise files based on this analysis. Now I have a search solution that works reasonably well using sql server full text indexing, but I plan to move to lucene.net in the future.

Using sql server full text indexing as a base, I developed a domain centric approach to finding relevant content using tool I understood. After some serious thinking and testing, I used many other measures to determine the relevance of a search result other than what is provided by analysing text content for term frequency and word distance. SQL Server full text indexing provided me a great start, and now I have a strategy I can express using lucene that will work very well.

It would have taken me a whole lot longer to understand lucene, and develop a strategy for the search. If anyone out there is still reading this, use full text indexing for testing your idea and then move to lucene once you have a strategy you know will work for your domain.

SteadyEddi