I need to be able to detect Japanese characters in a Java string.

Currently I'm getting the UnicodeBlock of each character and checking whether it equals Character.UnicodeBlock.KATAKANA or Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS, but I'm not 100% sure that's going to cover everything.

Any suggestions?

A: 

According to regular-expressions.info, Japanese isn't made of one script: "There is no Japanese Unicode script. Instead, Unicode offers the Hiragana, Katakana, Han and Latin scripts that Japanese documents are usually composed of."

In which case, this regex should do the trick:

yourString.matches("[\\p{IsHiragana}\\p{IsKatakana}\\p{IsHan}\\p{IsLatin}]*+")
Bart Kiers
Sorry, I wasn't precise enough ... I want to detect Japanese CHARACTERS in a string, not the character set name.
david
Including Latin will match most European languages as well, which I don't think is what the OP wants to check for (although Japanese is sometimes written with Roman characters as well).
Kathy Van Stone
Han are Chinese characters as well, but I believe you do want to add Hiragana.
Kathy Van Stone
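
Putting the comments above together, one possible sketch is to test each code point's Unicode script and to keep the kana check separate from the Han check, since Han characters are shared with Chinese. This assumes Java 8 or later (Character.UnicodeScript exists since Java 7, String.codePoints() since Java 8). The class and method names are made up for illustration and are not from the original posts.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper, not part of the original thread.
final class JapaneseDetector {

    // True if the code point belongs to the Hiragana or Katakana script.
    static boolean isKana(int codePoint) {
        UnicodeScript script = UnicodeScript.of(codePoint);
        return script == UnicodeScript.HIRAGANA || script == UnicodeScript.KATAKANA;
    }

    // True if the code point is kana or a Han ideograph.
    // Han alone does not prove the text is Japanese; it may be Chinese.
    static boolean isKanaOrHan(int codePoint) {
        return isKana(codePoint) || UnicodeScript.of(codePoint) == UnicodeScript.HAN;
    }

    // True if the string contains at least one kana character.
    static boolean containsKana(String text) {
        return text.codePoints().anyMatch(JapaneseDetector::isKana);
    }

    // True if the string contains at least one kana or Han character.
    static boolean containsKanaOrHan(String text) {
        return text.codePoints().anyMatch(JapaneseDetector::isKanaOrHan);
    }
}
```

With this sketch, JapaneseDetector.containsKana(yourString) is the stricter test, and containsKanaOrHan(yourString) also accepts kanji-only text at the cost of matching Chinese. Latin is left out on purpose so that European text is not reported as Japanese.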