I am working with SQL Server 2008. My task is to investigate the issue where FTS cannot find the right result for Thai.
First, I have the table which enables the FTS on the column 'ItemName' which is nvarchar. The Catalog is created with the Thai Language. Note that the Thai language is one of the languages that doesn't separate the word by spaces, so 'หลวง' 'พ่อ' 'โสธร' are written like this in a sentence: 'หลวงพ่อโสธร'
- In the table, there are many rows that include the word (โสธร); for example row#1 (ItemName: 'หลวงพ่อโสธร')
- On the webpage, I try to search for 'โสธร' but SQL Server cannot find it.
So I try to investigate it by trying the following query in SQL Server:
select * from sys.dm_fts_parser(N'"หลวงพ่อโสธร"', 1054, 0, 0)
...to see how the words are broken. The first one is the text to be broken. The second parameter is to specify that we're using Thai (WorkBreaker, so on). Here is the result:
- row#1
(display_item: 'ງลวง', source_item: 'หลวงพ่อโสธร')
- row#2
(display_item: 'พຝโส', source_item: 'หลวงพ่อโสธร')
- row#3
(display_item: 'ธร', source_item: 'หลวงพ่อโสธร')
Notice that the first and second row display the wrong display_item 'ງ' in the 'ງลวง' isn't even Thai characters. 'ຝ' in 'พຝโส' is not a Thai character either.
So the question is where did those alien characters come from? I guess this why I cannot search for 'โสธร' because the word breaker is broken and keeping the wrong character in the indexes.
Please help!