According to this documentation (http://java.sun.com/docs/books/jls/third_edition/html/lexical.html, section 3.10.6), an OctalEscape is converted to a Unicode character. My problem is that the following code produces two bytes instead of the single byte I expect:
// getBytes() with no argument encodes with the platform default charset (UTF-8 here)
for (byte b : "\222".getBytes()) {
    System.out.format("%02x ", b);
}
The result is "c2 92". I was expecting only "92", because octal 222 converted to hex is 0x92. If I test this with a char literal instead, the byte value is correct:
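A minimal sketch of what is happening, assuming a standard JDK: the escape \222 is the single char U+0092 (octal 222 = decimal 146 = hex 0x92), and getBytes() runs that char through a charset encoder. UTF-8 needs two bytes for every code point from U+0080 to U+07FF, while ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes. The class name OctalEscapeBytes and the helper hex are hypothetical.

```java
import java.nio.charset.Charset;

public class OctalEscapeBytes {
    // Formats each byte as two lowercase hex digits, like the loop in the question.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\222"; // one char: U+0092 (octal 222 = decimal 146 = hex 0x92)
        System.out.println(hex(s.getBytes(Charset.forName("UTF-8"))));      // c2 92
        System.out.println(hex(s.getBytes(Charset.forName("ISO-8859-1")))); // 92
    }
}
```

The "c2" is therefore not an extra character; it is the UTF-8 lead byte for U+0092.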
System.out.format("%02x ", (byte)'\222');
The result is "92", a single byte. My default encoding is UTF-8, on Linux with Java/javac 1.6.0_18.
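The char test behaves differently because a (byte) cast is a plain narrowing conversion that keeps the low 8 bits of the char value; no charset is involved. A short sketch contrasting the cast with the encoder path (class name hypothetical):

```java
import java.nio.charset.Charset;

public class CharCastVsEncoding {
    public static void main(String[] args) {
        char c = '\222';          // the char U+0092, numeric value 146
        byte viaCast = (byte) c;  // narrowing cast: keeps the low 8 bits (0x92), no charset involved
        System.out.println((int) c);           // 146
        System.out.format("%02x%n", viaCast);  // 92

        // The String path goes through a charset encoder instead of a cast.
        byte[] viaEncoder = String.valueOf(c).getBytes(Charset.forName("UTF-8"));
        System.out.println(viaEncoder.length); // 2 (c2 92)
    }
}
```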
The background of my question is that I'm looking for a way to convert an octal-escaped string from the input encoding Cp1252 to UTF-8. This fails because the octal-escaped string is converted to 2 bytes. Does somebody know why an extra byte "c2" is always added when I call getBytes()? A simple count shows that the char array contains only one character:
System.out.println("\222".toCharArray().length); // will result in "1"
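If the octal escapes are meant to stand for raw Cp1252 bytes, one possible approach (an assumption about the intended input, not a confirmed fix) is to turn each char back into its byte with ISO-8859-1, decode those bytes as windows-1252 (Java's canonical name for Cp1252), and only then encode as UTF-8. In Cp1252, byte 0x92 is the right single quotation mark U+2019, which takes three bytes in UTF-8. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class Cp1252EscapeToUtf8 {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset CP1252 = Charset.forName("windows-1252");
    static final Charset UTF8 = Charset.forName("UTF-8");

    // Assumption: every char in 'escaped' is in U+0000..U+00FF and stands for one raw Cp1252 byte.
    // Chars above U+00FF would be replaced by '?' in the ISO-8859-1 step.
    static String reinterpretAsCp1252(String escaped) {
        byte[] raw = escaped.getBytes(LATIN1); // ISO-8859-1 maps U+0000..U+00FF 1:1 onto bytes
        return new String(raw, CP1252);        // decode those bytes as Cp1252
    }

    public static void main(String[] args) {
        String s = "\222";
        System.out.println(s.length()); // 1
        String decoded = reinterpretAsCp1252(s);
        for (byte b : decoded.getBytes(UTF8)) {
            System.out.format("%02x ", b); // e2 80 99
        }
        System.out.println();
    }
}
```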
Thank you for your hints.
Update: As BalusC mentioned, the octal escape yields the character U+0092, and getBytes() encodes it with the default charset UTF-8, which produces the two bytes c2 92. As long as this value is stored in the source code (UTF-8), I have no way to read this string in with another encoding. Am I right? If I read a Cp1252-encoded file, I have to construct the InputStreamReader with the correct charset (Cp1252) and then encode the content as UTF-8 when I process and save it.
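For reading a Cp1252 file and saving it as UTF-8, a minimal sketch using only java.io (so it also compiles on Java 6): decode with an InputStreamReader set to windows-1252 and encode with an OutputStreamWriter set to UTF-8. In-memory streams are used so the example runs without files; the class and method names are hypothetical.

```java
import java.io.*;
import java.nio.charset.Charset;

public class TranscodeCp1252ToUtf8 {
    // Copies text from 'in' (decoded as Cp1252) to 'out' (encoded as UTF-8).
    static void transcode(InputStream in, OutputStream out) throws IOException {
        Reader reader = new BufferedReader(new InputStreamReader(in, Charset.forName("windows-1252")));
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, Charset.forName("UTF-8")));
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // the caller owns the streams, so flush but do not close them here
    }

    public static void main(String[] args) throws IOException {
        byte[] cp1252 = { 'I', 't', (byte) 0x92, 's' }; // "It’s" encoded in Cp1252
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(cp1252), out);
        for (byte b : out.toByteArray()) {
            System.out.format("%02x ", b); // 49 74 e2 80 99 73
        }
        System.out.println();
    }
}
```

To work on real files, pass a FileInputStream and a FileOutputStream (file names of your choice) and close both streams afterwards.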