Does SQL Server 2000's SOUNDEX function work on Asian character sets? I used it in a query and it appears not to have worked properly, but I realize that could be because I don't know how to read Chinese...

Furthermore, are there any other languages the function might have trouble with? (Russian, for example)

Thank you,
Frank

A:

By design, it works best on English words and names written in the ASCII character set. I used it on a project in Romania where I replaced the Romanian special characters with ASCII characters that sound more or less the same. It is not perfect, but in my case it was a lot better than nothing.
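A minimal Python sketch of the workaround described above: fold Romanian letters with diacritics to plain ASCII before computing the code. The mapping table, the `to_ascii` and `soundex` helpers, and the sample name are hypothetical illustrations, not the original project's code. `soundex` is a textbook American Soundex that, as an assumption, simply skips anything outside a-z; it is not SQL Server's implementation.

```python
# Hypothetical mapping of Romanian letters with diacritics to ASCII look-alikes.
# Both comma-below (ș, ț) and legacy cedilla (ş, ţ) forms are covered.
RO_TO_ASCII = str.maketrans({
    "ă": "a", "â": "a", "î": "i", "ș": "s", "ş": "s", "ț": "t", "ţ": "t",
    "Ă": "A", "Â": "A", "Î": "I", "Ș": "S", "Ş": "S", "Ț": "T", "Ţ": "T",
})

# Textbook American Soundex digit groups.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}


def to_ascii(name: str) -> str:
    """Replace Romanian special characters with ASCII letters."""
    return name.translate(RO_TO_ASCII)


def soundex(name: str) -> str:
    """American Soundex; characters outside a-z are skipped (assumption)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]


print(soundex("Ștefănescu"))            # → T152 (the leading Ș is dropped)
print(soundex(to_ascii("Ștefănescu")))  # → S315, same as soundex("Stefanescu")
```

Without the substitution the first letter silently disappears and the code no longer matches the plain-ASCII spelling; with it, both spellings land in the same bucket.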

I don't think you will have much success applying SOUNDEX to Asian character sets.

Jonas Kongslund
A:

I know that SOUNDEX in older versions of SQL Server ignored any non-English characters. I believe it didn't even handle the accented characters of Latin-1, let alone anything more exotic.

I never worked with SOUNDEX much in SQL Server 2000; all I know for certain is that it does not handle Arabic correctly. This likely extends to other non-Latin scripts as well.
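To make the "ignores non-English characters" point concrete, here is a minimal Python sketch of textbook American Soundex that keeps only the letters a-z. That filtering rule is an assumption modelled on the behaviour described above, not a reproduction of SQL Server's code, and what SQL Server 2000 itself returns for such input is not shown here. The `soundex` function and the sample strings are illustrative.

```python
# Textbook American Soundex digit groups.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}


def soundex(name: str) -> str:
    """American Soundex; anything outside a-z is skipped (assumption)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""  # nothing left to encode
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]


# Chinese, Arabic and Cyrillic input leaves no a-z letters to encode.
for sample in ("王", "محمد", "Иванов", "Ivanov"):
    print(repr(sample), "->", repr(soundex(sample)))
```

Only the transliterated "Ivanov" produces a code (`I151`); the other three collapse to an empty result, so every name in those scripts would compare as equal.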

In any case, a Soundex-based algorithm is unlikely to yield acceptable results for non-English languages, even aside from character set issues. Soundex was designed specifically to handle the English pronunciation of names (mostly those of Western European origin) and does not work particularly well outside that use. You would often be better off researching one of the several variants of Soundex, or other, unrelated phonetic similarity algorithms designed for the language(s) in question.

Andrew Beyer
A:

Soundex is fairly specific to English; it may or may not work well on other languages. One example from New Zealand was an attempt at patient name matching using Soundex. Unfortunately, Pacific Island names did not work well with Soundex, in many cases hashing to the same small set of values. A different algorithm had to be used.
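The collisions follow from how Soundex works: after the first letter, vowels (and h, w, y) are discarded, and only the first three remaining consonant classes are kept, so names built mostly from vowels and a small set of consonants collapse onto very few codes. A minimal Python sketch, using made-up, vowel-heavy strings (hypothetical, not the names from the New Zealand project) grouped by their textbook American Soundex code; `soundex` and `group_by_soundex` are illustrative helpers.

```python
from collections import defaultdict

# Textbook American Soundex digit groups.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}


def soundex(name: str) -> str:
    """American Soundex; characters outside a-z are skipped (assumption)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]


def group_by_soundex(names):
    """Bucket names by Soundex code, preserving input order."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)


# Hypothetical vowel-heavy strings, chosen only to illustrate the effect.
samples = ["Tane", "Tama", "Tini", "Tona", "Tomo", "Aue", "Aoa"]
for code, names in group_by_soundex(samples).items():
    print(code, names)
```

Seven distinct strings end up in just two buckets (`T500` and `A000`), which is the kind of degenerate hashing that makes Soundex unusable for matching in such a population.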

Your mileage may vary. On more recent versions of SQL Server (2005 and later) you could write a CLR function to compute a different phonetic key.

ConcernedOfTunbridgeWells