Using Lucene, one can retrieve the terms contained in an index, i.e. the unique, stemmed words, excluding stop words, that the documents in the index contain. This is useful for generating autocomplete suggestions, among other things. Is something similar possible with MS SQL Server full-text indexes?

A: 

This article proved useful for me. I was using SQL Server 2008, though, and don't have any other versions to test on.

Phil Jenkins
Good article, nothing on retrieving the indexed terms though.
friism
This is exactly the opposite of what friism asked for: you gave him the list of stop words in SQL Server, which is what his index WON'T contain.
Brent Ozar
+2  A: 

You can use a new system view in SQL Server 2008 to get the terms and their occurrence counts per document. Is this what you want?

sys.dm_fts_index_keywords_by_document

You need to supply the db_id of the database and the object_id of the table that has the full-text index. The MSDN documentation for it is here:

http://msdn.microsoft.com/en-us/library/cc280607.aspx
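
As a rough sketch of how the call might look: the database name MyDatabase and the table dbo.Documents below are hypothetical placeholders, not anything from the question, so substitute your own. The query lists each indexed term per document, most frequent first.

```sql
-- Assumption: MyDatabase and dbo.Documents are hypothetical names;
-- replace them with your own database and full-text indexed table.
SELECT
    display_term,      -- human-readable form of the indexed keyword
    document_id,       -- full-text key of the row the term was found in
    occurrence_count   -- how many times the term occurs in that row
FROM sys.dm_fts_index_keywords_by_document(
         DB_ID('MyDatabase'),
         OBJECT_ID('MyDatabase.dbo.Documents'))
WHERE display_term <> N'END OF FILE'   -- skip the end-of-document marker row
ORDER BY occurrence_count DESC;
```

Note that document_id is the full-text key value, so you would typically join it back to the base table's key column to see which row a term came from.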

Coolcoder
A: 

I agree that this information (words in the index, stemmed words, etc.) is useful, and if SQL Server is serious about offering a search platform, this information needs to be exposed. As far as I can tell, it really isn't available in previous versions. However, the game changes in SQL Server 2008.

SQL Server 2008 adds new dynamic management views that expose this metadata for full text. Pay particular attention to sys.dm_fts_parser and sys.dm_fts_index_keywords.

The sys.dm_fts_parser view takes a query string along with three other parameters (the language ID, the stoplist ID, and an accent-sensitivity flag) and returns a row set showing the individual terms after the word breaker has split the input into separate words, including stemmed forms where the query asks for them.

MSDN gives the example of this query against the view:

SELECT * FROM sys.dm_fts_parser (' "The Microsoft business analysis" ', 1033, 0, 0)

To get the keywords actually stored in a full-text index, you can use sys.dm_fts_index_keywords, which takes the db_id and the object_id of the indexed table.
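
For the autocomplete case in the question, something along these lines might work. It is only a sketch: MyDatabase, dbo.Documents, and the @prefix value are hypothetical, and ranking suggestions by document count is just one possible choice.

```sql
-- Assumption: MyDatabase and dbo.Documents are hypothetical names,
-- and @prefix stands in for whatever the user has typed so far.
DECLARE @prefix nvarchar(100) = N'micro';

SELECT TOP (10)
    display_term,
    SUM(document_count) AS document_count   -- summed across indexed columns
FROM sys.dm_fts_index_keywords(
         DB_ID('MyDatabase'),
         OBJECT_ID('MyDatabase.dbo.Documents'))
WHERE display_term LIKE @prefix + N'%'      -- prefix match for suggestions
  AND display_term <> N'END OF FILE'        -- skip the end-of-document marker row
GROUP BY display_term
ORDER BY SUM(document_count) DESC;
```

The GROUP BY is there because the view returns one row per keyword per indexed column, so the same term can appear more than once when several columns are indexed.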

I hope that points you in the right direction. Cheers.

Newfave