Could anyone tell me whether SQL Server 2008 has a way to keep keywords out of the full-text index when they aren't relevant to the kinds of searches that will be performed?

For example, we have the IFilters for PDF and Word hooked in, and our documents are being indexed properly as far as I can tell. These documents, however, contain lots of numeric values that people won't really search for and that wouldn't bring back meaningful results anyway. They are still being indexed and create lots of entries in the full-text catalog. We are trying to optimize our search engine in any way we can, and we assume all these unnecessary entries can't be helping performance. I want my catalog to consist of alphabetic keywords only. The current IFilters work better than anything I could write in the time I have; they just index more than I need.

Here are some of the terms from sys.dm_fts_index_keywords_by_document that I want excluded:

$1,000, $100, $250, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 129, 13.1, 14, 14.12, 145, 15, 16.2, 16.4, 18, 18.1, 18.2, 18.3, 18.4, 18.5

These are some examples from the same management view that I think are worth keeping and searching on:

above, accordingly, accounts, add, addition, additional, additive

Any help would be greatly appreciated!

A: 

Not sure about SQL Server 2008, but in 2000 and 2005 you could edit the noise files. See here and here.

davek
It was important to my question that the solution pertain to SQL Server 2008; however, I'm glad you pointed this out for those who are still on SQL Server 2005.
Scott
A: 

See here: Stopwords and Stoplists.

The syntax is:

-- The bracketed clause is optional; include it to start from a copy of the system stoplist.
CREATE FULLTEXT STOPLIST MyList [FROM SYSTEM STOPLIST];

ALTER FULLTEXT STOPLIST MyList ADD 'above' LANGUAGE 'English';
ALTER FULLTEXT STOPLIST MyList ADD 'accordingly' LANGUAGE 'English';

And so on.

You can also manage all of this through SSMS - it's in [Your database] > Storage > Full Text Stoplists.
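
For the numeric terms in the question, the same statements could be applied roughly as follows. This is only a sketch under assumptions: dbo.Documents is a hypothetical name for the full-text indexed table, MyList is the stoplist created above, and the indexed column is assumed to use the English word breaker. A stoplist has no effect until it is attached to the full-text index, and attaching it may trigger a repopulation of the index. I am not certain how currency values such as $1,000 are tokenized, so only plain integers are shown.

```sql
-- Add a few of the unwanted numeric terms to the stoplist.
-- Each term has to be added individually and is tied to a language.
ALTER FULLTEXT STOPLIST MyList ADD '100' LANGUAGE 'English';
ALTER FULLTEXT STOPLIST MyList ADD '101' LANGUAGE 'English';
ALTER FULLTEXT STOPLIST MyList ADD '102' LANGUAGE 'English';

-- Attach the stoplist to the full-text index.
-- dbo.Documents is a hypothetical table name; substitute your own.
ALTER FULLTEXT INDEX ON dbo.Documents SET STOPLIST MyList;

-- After the index has repopulated, list the indexed terms that
-- contain no letters to see what is still getting through.
SELECT display_term, document_count
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('dbo.Documents'))
WHERE display_term NOT LIKE '%[A-Za-z]%'
ORDER BY document_count DESC;
```

The final query can also be run before making any changes, to get a list of candidate terms to add.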

Aaronaught
Thanks for this. My understanding of stopwords was incorrect: I thought they only prevented the query from returning results for those terms, but I'm glad to see I was wrong. I'll proceed with this then.
Scott