SQL Server Full Text Search uses language-specific Word Breakers.

For German, the Word Breaker is used to break/split words, including compound words. However, it appears that not all known compound words are covered by the Word Breaker. I would like to know whether a list is available of the words the Word Breaker knows about.

A: 

Have you read these pages? Somewhere on Microsoft MSDN it explains how to see the list of words. The following links might help you.

Word Breakers and Stemmers

sp_help_fulltext_system_components

Good luck with that...
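
For reference, a minimal sketch of calling the procedure linked above, assuming SQL Server 2008 or later and LCID 1031 for German. Note that it reports which word breaker component is registered (name, DLL path, version, manufacturer), not the words that component recognises.

```sql
-- Show the word breaker registered for German (LCID 1031).
EXEC sp_help_fulltext_system_components 'wordbreaker', 1031;

-- Show every registered full-text component (word breakers, filters, protocol handlers).
EXEC sp_help_fulltext_system_components 'all';
```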

Alan FL
There does not appear to be any official documentation on what words the breaker understands.
Coolcoder
A: 

In SQL Server 2008 this works. The language_id I put here is for German; I wanted to see the same thing, but for Spanish.

SELECT * FROM sys.fulltext_system_stopwords
WHERE language_id = 1031
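
If the language_id for another language is not known, it can be looked up first. A small sketch, assuming SQL Server 2008 or later; the catalog view lists the languages that have full-text components registered on the instance.

```sql
-- List the LCIDs and names of the languages available to full-text search.
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;
```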

Edit: in SQL Server 2005 the words are stored in "$SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\". If you edit the noise-word file, you have to repopulate the full-text index.

Alan FL
Stop words are the new "noise words" in 2008. Effectively, these are the words that are excluded from full-text search. I want to know which words Full Text Search knows how to break up.
Coolcoder
Specifically, German has compound words, and the Word Breaker appears to break some of them but not others. So I would like to know which words it "knows" about.
Coolcoder
A: 

The answer is that there is no answer. According to Microsoft, the words are not stored; the Word Breaker uses a formula to "break" them. This will never be 100% accurate, so I will just have to live with it.
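
Since there is no list to consult, one way to check an individual word is to run it through the parser and look at the terms that come back. This is only a sketch, assuming SQL Server 2008 or later and membership in the sysadmin role; the sample word is just an illustration, and the result depends on the word breaker version installed.

```sql
-- Ask the German word breaker (LCID 1031) how it tokenizes one word.
-- Arguments: query string, LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
SELECT keyword, display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"Donaudampfschiff"', 1031, 0, 0);
```

If the compound is split, the display_term column should show the component words alongside the original term; a single row suggests the breaker left the word whole.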

Coolcoder