I've found places on the web such as http://www.chinesetopinyin.com/ that convert Chinese characters to pinyin (romanization). Does anyone know how to do this, or have a database that can be parsed?

EDIT: I'm using C# but would actually prefer a database/flatfile.

A: 

A possible solution using Python:

I think the Unicode database contains pinyin romanizations for Chinese characters, but these are not included in the data of Python's unicodedata module.
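For the flat-file route, the Unihan data set published with the Unicode Character Database includes a Unihan_Readings.txt file whose kMandarin field holds pinyin readings. Below is a minimal Python 3 sketch of reading that field, assuming the tab-separated layout `U+XXXX<TAB>field<TAB>value`. The function name `parse_unihan_mandarin` and the sample rows are illustrative only; in practice you would pass the lines of the downloaded file.

```python
def parse_unihan_mandarin(lines):
    """Map each character to its kMandarin readings from Unihan-style lines."""
    readings = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith('#'):
            continue
        parts = line.split('\t')
        # Keep only well-formed kMandarin rows; ignore every other field.
        if len(parts) != 3 or parts[1] != 'kMandarin':
            continue
        code_point = int(parts[0][2:], 16)  # drop the leading "U+"
        readings[chr(code_point)] = parts[2].split()
    return readings


# Illustrative rows only (assumed layout), not copied from the real file.
SAMPLE_LINES = [
    "# sample data",
    "U+597D\tkMandarin\thǎo",
    "U+4E2D\tkMandarin\tzhōng",
    "U+597D\tkDefinition\tgood",
]

mandarin = parse_unihan_mandarin(SAMPLE_LINES)
print(mandarin['好'])
```

The resulting dictionary could be dumped to a database table or a simpler flat file for use from C#.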

However, you can use an external library such as cjklib. Example:

# coding: UTF-8
from cjklib.characterlookup import CharacterLookup

c = u'好'

# 'T' selects the traditional Chinese locale
cjk = CharacterLookup('T')
# look up every pinyin reading known for the character
readings = cjk.getReadingForCharacter(c, 'Pinyin')
for r in readings:
    print(r)

Output:

hāo
hǎo
hào

UPDATE

cjklib comes with a standalone cjknife utility, which might help. Some usage is described here.

mykhal
... and if you want an ASCII-only or numeric representation, the documentation may explain how to get it, or you can pick the first pinyin reading and remove the accents: http://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string
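The accent-removal idea can be sketched with the standard unicodedata module alone (Python 3). This is one possible approach, not necessarily the one from the linked question: decompose the string, drop the combining marks, and recompose. The helper name `strip_tone_marks` is made up for illustration. Note the caveat that ü also loses its diaeresis and becomes a plain u.

```python
import unicodedata


def strip_tone_marks(pinyin):
    """Return the pinyin string with all combining accents removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFD', pinyin)
    # combining() is non-zero for combining marks, so keep only base characters.
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize('NFC', stripped)


print(strip_tone_marks('hǎo'))
```

If the distinction between u and ü matters, replace ü with a placeholder such as "v" before stripping.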
mykhal
A: 

If you use Java, you can use pinyin4j:

http://pinyin4j.sourceforge.net/

imcaptor
A: 

Okay, first I used my question here to get the Unicode code point of each character:

http://stackoverflow.com/questions/3571563/converting-chinese-character-to-unicode

Then I took a file like this one to convert it: http://www.ic.unicamp.br/~stolfi/voynich/Notes/061/uc-to-py.tbl
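The two steps above (character to code point, then code point to pinyin via a table) might look roughly like this in Python 3. The row layout is an assumption: a hexadecimal code point followed by one or more readings, with `#` starting a comment. The real file may be laid out differently, so the parsing would need adjusting. All function names and the sample rows are hypothetical.

```python
import re


def code_point_hex(ch):
    """Return the code point of a character as uppercase hex, e.g. '597D'."""
    return format(ord(ch), '04X')


def load_table(lines):
    """Build {hex code point: [readings]} from rows in the ASSUMED layout."""
    table = {}
    for line in lines:
        line = line.split('#', 1)[0]  # drop comments
        # Tolerate whitespace, commas and parentheses as separators.
        fields = [f for f in re.split(r'[\s,()]+', line) if f]
        if len(fields) >= 2:
            table[fields[0].upper()] = fields[1:]
    return table


def to_pinyin(text, table):
    """Replace each known character with its first listed reading."""
    out = []
    for ch in text:
        readings = table.get(code_point_hex(ch))
        out.append(readings[0] if readings else ch)
    return ' '.join(out)


# Hypothetical rows; the real file's layout may differ.
SAMPLE_ROWS = [
    "# code point, then readings",
    "4F60 ni3",
    "597D hao3 hao4",
]

table = load_table(SAMPLE_ROWS)
print(to_pinyin('你好', table))
```

Taking the first listed reading is a simplification: characters with several readings need context to pick the right one.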

Mass