By default, when you tell SQL Server (I'm currently using 2008) to full-text index a column, it treats characters such as "@" and "." as word breakers, just like " ".

I'd like to restrict the word-breaking characters to just " ", so that "[email protected]" is treated as a single word.
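To make the difference concrete, here is a rough Python sketch of the two behaviours. It is only an illustration and not SQL Server's actual word breaker: the function names, the regular expression, and the sample address `user@example.com` are all assumptions made up for this example.

```python
import re


def break_on_punctuation(text):
    """Split on any run of non-alphanumeric characters.

    Roughly mimics the default behaviour described above, where "@" and "."
    end a word just like a space does. (Illustrative only.)
    """
    return [token for token in re.split(r"[^0-9A-Za-z]+", text) if token]


def break_on_spaces(text):
    """Split on space characters only, which is the desired behaviour."""
    return [token for token in text.split(" ") if token]


# Hypothetical sample text; the address is a placeholder.
sample = "mail user@example.com today"

print(break_on_punctuation(sample))  # ['mail', 'user', 'example', 'com', 'today']
print(break_on_spaces(sample))       # ['mail', 'user@example.com', 'today']
```

With the first tokenizer the address is indexed as three separate words; with the second it stays a single searchable term.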

It appears that one can choose a "Language for Word Breaker" against the indexed column - perhaps I need to set up a custom language?

Does anyone know how I can do this?

A: 

According to TechNet's article on SQL 2008 Full-Text Search:

"well-known published interfaces provide the framework for Full-Text Engine extensibility. For more information, see the Microsoft Developer Network (MSDN) topics IFilter, IWordBreaker, and IStemmer."

So, at least according to this article, you can write your own IWordBreaker implementation (see http://blogs.msdn.com/michkap/archive/2005/03/14/395199.aspx for more info) and get SQL Server to use it.

What I haven't found so far is how to plug your custom word breaker into SQL Server itself, i.e. how to tell SQL Server to use it. Sorry for the incomplete answer... hope I got you at least part of the way to a solution.

Justin Grant
A: 

To get your word breaker working with SQL Server, you have to disable signature verification and add your COM CLSID to the registry. For more info, check out this post: http://blogs.msdn.com/shajan/default.aspx It helped me a lot! However, I never managed to create my own language, so I simply hijacked an existing one.

picknick