views: 43

answers: 1

Recently, I have been trying to internationalize an application for Chinese-speaking countries.

I realize there is a wide variety of encodings for Chinese characters: Guobiao, Big5, Unicode, HZ.

Whenever a user inputs some text, my Java application needs to know which input encoding the user is using, so that it can convert the input into processable data.

I feel that it is not reliable to make assumptions about the input encoding based on the OS. When someone is using an OS with a China locale, the JVM will by default use Guobiao encoding, yet the user may still use a Big5 input tool to key in Big5-encoded characters.

I was wondering: what reliable method do you use to detect the encoding of user input?
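To show why guessing from the locale is unreliable, here is a minimal sketch (the byte values and charset names are only an illustration, and assume a JDK with the extended charsets available): the same raw bytes come out as completely different text depending on whether they are decoded as GB2312 (Guobiao) or Big5.

    import java.nio.charset.Charset;

    public class EncodingGuessDemo {
        public static void main(String[] args) {
            // Raw bytes received from somewhere outside the application.
            // These happen to be the GB2312 (Guobiao) encoding of "中文".
            byte[] raw = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

            // Decoding with the charset the locale suggests...
            String asGuobiao = new String(raw, Charset.forName("GB2312"));
            // ...versus decoding the very same bytes as Big5.
            String asBig5 = new String(raw, Charset.forName("Big5"));

            System.out.println("As GB2312: " + asGuobiao); // 中文
            System.out.println("As Big5:   " + asBig5);    // unrelated characters
        }
    }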

+1  A: 

For actual user input, you never have to detect it. It is defined by the environment.

On Windows, for a UNICODE application, the API will deliver UTF-16. For an MBCS application, it will deliver the current code page, and there's an API to tell you what that is.

On Linux, the locale determines the encoding of input as delivered to APIs.

Since you say you are in Java, you really don't need to care. All Java UI programs will deliver either char or String values, and those are always, immutably, in Unicode.
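As a minimal sketch (the Swing component and the Big5 conversion are just illustrative assumptions): whatever IME the user types with, the component hands your code a Unicode String, and a concrete encoding only appears if you later convert that String to bytes yourself.

    import java.nio.charset.Charset;
    import javax.swing.*;

    // Whatever IME the user types with, a Swing component delivers Unicode.
    public class InputDemo {
        public static void main(String[] args) {
            SwingUtilities.invokeLater(() -> {
                JTextField field = new JTextField(20);
                JButton show = new JButton("Show");
                show.addActionListener(e -> {
                    String text = field.getText();   // already Unicode; nothing to detect
                    System.out.println("Typed: " + text);
                    // An encoding only matters when you turn the String into bytes,
                    // e.g. to write it out as Big5:
                    byte[] big5 = text.getBytes(Charset.forName("Big5"));
                    System.out.println("Big5 byte count: " + big5.length);
                });

                JFrame frame = new JFrame("IME demo");
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                JPanel panel = new JPanel();
                panel.add(field);
                panel.add(show);
                frame.add(panel);
                frame.pack();
                frame.setVisible(true);
            });
        }
    }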

bmargulies
When the JVM runs on a computer with a China locale, it by default expects Guobiao encoding. But users may provide input encoded in Big5. How does my Java application know that it is Big5 input and not Guobiao input?
Yan Cheng CHEOK
Because file.encoding is set to Big5.
bmargulies
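To make that concrete, a minimal sketch (the Big5 byte values are just an example): file.encoding and the JVM default charset only come into play when bytes are decoded without naming a charset explicitly; they have nothing to do with keyboard input.

    import java.nio.charset.Charset;

    // file.encoding / the default charset only affects byte-to-char conversions
    // that do not name a charset explicitly.
    public class DefaultCharsetDemo {
        public static void main(String[] args) {
            System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
            System.out.println("default charset = " + Charset.defaultCharset());

            byte[] raw = {(byte) 0xA4, (byte) 0xA4, (byte) 0xA4, (byte) 0xE5}; // Big5 bytes for "中文"

            String byDefault = new String(raw);                          // decoded with the JVM default
            String asBig5    = new String(raw, Charset.forName("Big5")); // decoded as Big5 regardless of locale

            System.out.println("default decode: " + byDefault);
            System.out.println("Big5 decode:    " + asBig5);
        }
    }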
Users cannot 'provide input in Big5'. Users hit keys. Those keys are interpreted by an IME. The IME has to deliver Unicode to Java.
bmargulies