ansaurus

Question

Problems related to changing the encoding of a Java file

Answer 1

+1 A:

If the Eclipse encoding setting ever gets lost, or the program is built outside Eclipse, the Cyrillic characters could get corrupted without anyone noticing until the program performs the operations that depend on them. This may or may not be an acceptable risk.

Assuming that this is about the program described in this question, a more robust alternative would be to put the Cyrillic characters in an external file instead of directly in the source code, and to read that file with UTF-8 specified explicitly as the encoding.

Michael Borgwardt 2010-06-16 08:52:40

Thanks so much! That's exactly what it is about. Could you elaborate a little more on parsing using UTF-8? Are there any key methods I should use?

Emanuil 2010-06-16 08:57:20

@Emanuil: Simply use an InputStreamReader and specify the encoding when you read the file. Or use a file format like XML, where the encoding is specified by the file itself (this requires the proper header and a proper XML parser that operates on the file directly).

Michael Borgwardt 2010-06-16 09:14:09
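
A minimal sketch of the InputStreamReader approach mentioned in the comment above. The class name `TranslitTableLoader` and the one-pair-per-line file format (`<Cyrillic character>=<replacement>`) are assumptions made for illustration only; they do not come from the original program. `StandardCharsets` needs Java 7 or later; on older versions the charset name `"UTF-8"` can be passed as a string instead.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class TranslitTableLoader {

    // Loads a mapping from a text file that is decoded explicitly as UTF-8.
    // Assumed (hypothetical) line format: one Cyrillic character, '=', then
    // its Latin replacement.
    static Map<Character, String> load(String path) throws IOException {
        Map<Character, String> table = new HashMap<Character, String>();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Accept only lines where '=' directly follows the first character.
                if (line.indexOf('=') != 1) {
                    continue; // skip blank or malformed lines
                }
                table.put(line.charAt(0), line.substring(2));
            }
        } finally {
            reader.close();
        }
        return table;
    }
}
```

Because the charset is named in the code, the result no longer depends on the platform default encoding or on any IDE setting; only the data file itself has to be saved as UTF-8.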

Thanks again! You've been very helpful.

Emanuil 2010-06-16 09:25:26

Answer 2

+1 A:

If it is just a few characters, you can use the \uxxxx notation:

    char[][] translate = { 
        {'\u0430', 'a'},
        {'\u0431', 'b'},
        {'\u0432', 'v'},
        {'\u0433', 'g'},
        ...
    };

Also have a look at the native2ascii tool that comes with the JDK; it converts text in a native encoding to Latin-1 with \uxxxx Unicode escapes.

Please note: English is not my first nor my second language, any help would be appreciated.

Carlos Heuberger 2010-06-16 09:29:42
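
As an illustration of how a table written with \uxxxx escapes might be used, here is a small sketch. The class name `Transliterator` and its methods are hypothetical, and the table is limited to the four pairs shown in the answer above; the elided entries are not filled in. Since the source contains only ASCII characters, it compiles the same way regardless of the encoding the compiler assumes.

```java
class Transliterator {

    // The four pairs from the answer, written as Unicode escapes.
    private static final char[][] TRANSLATE = {
        {'\u0430', 'a'}, // CYRILLIC SMALL LETTER A
        {'\u0431', 'b'}, // CYRILLIC SMALL LETTER BE
        {'\u0432', 'v'}, // CYRILLIC SMALL LETTER VE
        {'\u0433', 'g'}, // CYRILLIC SMALL LETTER GHE
    };

    // Replaces every character found in the table; all others pass through unchanged.
    static String transliterate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            out.append(map(input.charAt(i)));
        }
        return out.toString();
    }

    // Linear search is enough for a table this small.
    private static char map(char c) {
        for (char[] pair : TRANSLATE) {
            if (pair[0] == c) {
                return pair[1];
            }
        }
        return c;
    }
}
```

For example, `Transliterator.transliterate("\u0431\u0430\u0433")` returns `"bag"`.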
