
Hello,

I've got an input box that accepts UTF-8 characters. Can I detect programmatically whether the characters are Chinese, Japanese, or Korean (by checking some Unicode range, perhaps)? I'd switch search methods depending on whether MySQL's fulltext search would work (it won't work for CJK characters).

Thanks!

+1  A: 

Do you want to detect whether a character is a (Chinese or Japanese or Korean) character? Or do you want to tell Chinese characters apart from Japanese characters? The former is easy; the latter is in many cases impossible, due to Han Unification.
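To illustrate why the latter is often impossible, here is a minimal Python sketch (the language choice is mine, not the asker's). The ideograph 漢 is used in Chinese, in Japanese kanji, and in Korean hanja, yet Han Unification gives it exactly one code point, and nothing about that code point records which language the text was written in.

```python
import unicodedata

# One shared code point for Chinese, Japanese and Korean usage of this ideograph.
ch = "漢"
print(hex(ord(ch)))          # 0x6f22
print(unicodedata.name(ch))  # CJK UNIFIED IDEOGRAPH-6F22
```

Detecting that a character is *some* CJK character is still straightforward, because the unified ideographs live in well-defined Unicode blocks.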

newacct
The former, thankfully.
Jasie
+2  A: 

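CJK characters are confined to certain Unicode blocks. Check whether each character's code point falls inside one of these blocks, and handle surrogate pairs too: characters above U+FFFF (outside the Basic Multilingual Plane) are stored as two 16-bit code units in UTF-16, so they must be combined into a single code point before the range check.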
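A minimal sketch of the block check in Python, assuming the input has already been decoded from UTF-8 into a `str`. The range table is a selection of common CJK-related blocks, not an exhaustive list, and the names `CJK_RANGES`, `is_cjk_char` and `contains_cjk` are hypothetical. Python strings are sequences of code points, so surrogate pairs never appear after decoding; in a UTF-16 language such as Java or JavaScript you would iterate by code point instead of by code unit.

```python
# A selection of Unicode blocks used by CJK text (not exhaustive; consult the
# current Unicode block list for the full set). Each entry is (first, last, name).
CJK_RANGES = [
    (0x1100, 0x11FF, "Hangul Jamo"),
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x3130, 0x318F, "Hangul Compatibility Jamo"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2EBEF, "CJK Unified Ideographs Extensions C-F"),
]


def is_cjk_char(ch: str) -> bool:
    """Return True if the single character ch lies in one of CJK_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi, _ in CJK_RANGES)


def contains_cjk(text: str) -> bool:
    """Return True if any character in text is CJK.

    A supplementary character such as U+20000 is a single element of a
    Python str, even though UTF-16 would store it as a surrogate pair.
    """
    return any(is_cjk_char(ch) for ch in text)


if __name__ == "__main__":
    for sample in ["hello world", "東京", "こんにちは", "한국어", "\U00020000", "café"]:
        print(repr(sample), contains_cjk(sample))
```

For the original use case, you could call `contains_cjk(query)` and fall back to your non-fulltext search whenever it returns `True`.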

devio