ansaurus

Question

Recognizing a character as Chinese and getting its "pinyin" phonetics from simplified characters?

Answer 1

+2 A:

A)
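Yes. Every character represented in Unicode has a unique numeric index called a code point.

If you know the range of code points used for Chinese characters and you know how to get the Unicode code point of a given character, a simple comparison will tell you whether the character falls within that range. Note that Unicode does not give simplified Chinese a range of its own: the main CJK Unified Ideographs block, U+4E00 to U+9FFF, holds simplified and traditional characters alike.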

An existing question has a solution for getting the Unicode code point of a character in PHP:
How to get code point number for a given character in a utf-8 string?

In Java, the static method java.lang.Character.codePointAt() will give you what you need.
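As a rough illustration of the comparison described above, here is a minimal Java sketch. It assumes that "Chinese" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and ignores the extension blocks; the class and method names are made up for this example.

```java
class ChineseCheck {
    // Assumption: only the main CJK Unified Ideographs block counts as Chinese.
    // Extension blocks (rare characters) are deliberately left out of this sketch.
    static boolean isCjkIdeograph(int codePoint) {
        return codePoint >= 0x4E00 && codePoint <= 0x9FFF;
    }

    // Checks the first code point of a string; an empty string is not Chinese.
    static boolean startsWithCjkIdeograph(String s) {
        return !s.isEmpty() && isCjkIdeograph(Character.codePointAt(s, 0));
    }
}
```

Because the block is shared with traditional Chinese and Japanese kanji, a check like this can only say "this is a Han ideograph", not "this is simplified Chinese".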

B)
Converting a simplified Chinese character, or string, to pinyin would most likely require some form of map with the Unicode code point as the key and the corresponding pinyin as the value.
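The map idea could look roughly like the Java sketch below. The two-entry table and the names PinyinLookup and toPinyin are hypothetical; a usable table would need thousands of entries, and characters with more than one reading would need extra handling that this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.Map;

class PinyinLookup {
    // Hypothetical, tiny lookup table keyed by Unicode code point.
    private static final Map<Integer, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put(0x4F60, "n\u01D0");  // U+4F60, third-tone "ni"
        PINYIN.put(0x597D, "h\u01CEo"); // U+597D, third-tone "hao"
    }

    // Replaces every mapped character with its pinyin; anything else is copied unchanged.
    static String toPinyin(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String pinyin = PINYIN.get(cp);
            if (pinyin != null) {
                out.append(pinyin);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane intact when they pass through unmapped.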

An example of this in PHP is shown at http://kingphp.com/108.html.

A simple Google search for [java pinyin] reveals a range of options, two of which are Chinese-to-pinyin libraries at http://kiang.org/jordan/software/pinyinime/ and http://pinyin4j.sourceforge.net/.

Jon Cram 2010-06-29 18:46:48

Thanks for all that, I'll take it from here ;) I was googling for pinyin php and the results weren't that great. I only added the java tag because I just started learning it, so I didn't think to google that.

Moak 2010-06-30 05:00:40

Answer 2

A:

If you are using UTF-8 to interpret your files and calls to the DB, I guess a simple

$new_text = preg_replace(array('/你好/',...), array('nǐhǎo',...), $old_text);

should do the trick.

Where are you getting your string from?

misterte 2010-06-29 18:47:06

Sorry if it was unclear, I need the pinyin for any Chinese characters, in this case to transliterate names.

Moak 2010-06-30 04:35:11
