Is it possible to get the list of Full Text Search noise/stop words from SQL Server 2005 by querying the database?

I am aware that the noise words are in a text file ~/FTData/noiseEng.txt but this file is not accessible to our application.

I've looked at the sys.fulltext_* catalog views, but they don't seem to contain the words.

A: 

It appears that this is not possible in SQL Server 2005, but it is in SQL Server 2008.

Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / StopLists

This next query gets a list of all of the stopwords that ship with SQL Server 2008. This is a nice improvement; you cannot do this in SQL Server 2005.
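
The query itself is not reproduced in the excerpt above. A minimal sketch of what such a query might look like, assuming the SQL Server 2008 catalog view sys.fulltext_system_stopwords and assuming English (LCID 1033) is the language of interest:

```sql
-- Sketch only: lists the system stopwords that ship with SQL Server 2008.
-- Assumes the sys.fulltext_system_stopwords catalog view (not available in SQL Server 2005).
-- 1033 is the LCID for US English; change it for other languages.
SELECT stopword
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033
ORDER BY stopword;
```

Words added to user-defined stoplists are not in this view; in SQL Server 2008 they should be visible through sys.fulltext_stopwords instead.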

Stopwords and Stoplists - SQL Server 2008

SQL Server 2005 noise words have been replaced by stopwords. When a database is upgraded to SQL Server 2008 from a previous release, the noise-word files are no longer used in SQL Server 2008. However, the noise-word files are stored in the FTDATA\FTNoiseThesaurusBak folder, and you can use them later when updating or building the corresponding SQL Server 2008 stoplists. For information about upgrading noise-word files to stoplists, see Full-Text Search Upgrade.

Damien McGivern
A: 

I just copy the noise words file from \Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData into my app, and use it to strip noise words.

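    ' Requires Imports System.Text.RegularExpressions.
    ' ReadFile is assumed to be the application's own helper (not shown) that returns a file's contents as a string.
    Public Function StripNoiseWords(ByVal s As String) As String
        Dim NoiseWords As String = ReadFile("/Standard/Core/Config/noiseENU.txt").Trim
        ' Escape each entry so any regex metacharacters in the file are matched literally
        Dim Escaped As String() = Array.ConvertAll(Of String, String)(Regex.Split(NoiseWords, "\s+"), AddressOf Regex.Escape)
        Dim NoiseWordsRegex As String = String.Join("|", Escaped) ' about|after|all|also etc.
        NoiseWordsRegex = String.Format("\s?\b(?:{0})\b\s?", NoiseWordsRegex)
        Dim Result As String = Regex.Replace(s, NoiseWordsRegex, " ", RegexOptions.IgnoreCase) ' replace each noise word with a space
        Result = Regex.Replace(Result, "\s+", " ").Trim ' collapse repeated spaces and trim the ends
        Return Result
    End Function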
Herb Caudill
Damien McGivern