views:

63

answers:

1

Hello everyone,

I have a nvarchar column which contains English and Japanese text. I want to make full text search on this column. When configure full text search, we need to specify language option for word breaker (e.g. using English work breaker or using Japanese word breaker). I am wondering in this case, what language should I assign? I am not sure if Japanese language work breaker works for English as well.

I am using SQL Server 2008 Enterprise.

thanks in advance, George

+1  A: 

You should choose the neutral-word-breaker if you have a column that has different language (especially western and non-western languages together) in it.

There are some other options such as

  • Seperating the columns by each language (as @Tony stated)
  • if your data is a plain text; converting it to the xml data type and add language tags that indicate the language which is used by the full text engine.
  • developing a custom word breaker (Although it is not a optimal solution)

Here is an article about best practices for choosing a language when creating a Full-Text index.

Added After Comments

It can be queried multiple columns in many ways depending on your use case. The easiest way is using the CONTAINS predicate to query multiple columns by specifying a list of columns to search as shown below;

SELECT Name, Color FROM Production.Product
WHERE CONTAINS((Name, Color), 'Red');

Another solution may be using a language indicator column that can be used in the CASE/IF statements which helps you conditionally split the query by language. You can also use the DATALENGTH() TSQL function to check whether it is empty and decide which column to choose.

orka
Thanks orka! How to achieve better work breaking and more accurate full text search performance? Using neutral work breaker or using separate column for different languages?
George2
I think it would be better if you separate columns by languages for the sake of accurate results and performance.
orka
Thanks orka, if I separate columns by language, for example, an English column and a Japanese column. And I will use English word breaker on the English column and using Japanese word breaker on the Japanese column. So, either English or Japanese column is empty since a row is in a specific language, can not be in two languages (e.g. Japanese row has an empty English column). But my query condition is, return any matched rows either English or Japanese column matches input query. I do not know how to write such kinds of a full text query, any ideas?
George2
I have updated the answer after your last comment.
orka
Thanks, question answered!
George2