I need to read a string from a browser's URL request and then render the requested text as an image. I know the default encoding of the Java net transmission is "ISO-8859-1", and it works normally for all characters defined in "ISO-8859-1". But when the request contains a multi-byte Unicode character (e.g. Chinese, or something like ¤ж), I need to re-decode it from "ISO-8859-1" as "UTF-8".

My code looks like this:

String result = new String(requestString.getBytes("ISO-8859-1"), "UTF-8");

Everything works, except that some ISO-8859-1 characters are no longer shown: the characters in the range 0x80 - 0xFF (as defined in "ISO-8859-1"). In other words, every character outside 0x00-0x7F is lost after the conversion from "ISO-8859-1" to "UTF-8".

Is there another way to solve this?

+1  A: 

What you are trying to do doesn't really make sense. Most ISO-8859-1 strings cannot be interpreted as UTF-8 strings.

Additionally, Chinese characters are not encodable in ISO-8859-1 (ISO-8859-1 is designed for Western European languages).
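To make the point concrete, here is a minimal sketch of the re-decoding trick from the question applied to two inputs. The class and method names (ReDecodeDemo, reDecode) are made up for illustration. It assumes the first input consists of UTF-8 bytes that were wrongly read as ISO-8859-1, and the second is text that genuinely was ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class ReDecodeDemo {
    // Same operation as in the question: take the ISO-8859-1 bytes of the
    // string and decode them again as UTF-8. (Hypothetical helper name.)
    static String reDecode(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Case 1: the bytes really were UTF-8 (U+4E2D is E4 B8 AD in UTF-8)
        // but were read as three ISO-8859-1 characters. Re-decoding repairs it.
        String misread = new String(
                "\u4e2d".getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(reDecode(misread).equals("\u4e2d")); // true

        // Case 2: the text really was ISO-8859-1 (U+00E9 is the single byte E9).
        // A lone E9 is not a valid UTF-8 sequence, so the decoder substitutes
        // the replacement character U+FFFD and the original character is lost.
        String latin1 = "caf\u00e9";
        System.out.println(reDecode(latin1).contains("\uFFFD")); // true
    }
}
```

The second case is what happens to the 0x80-0xFF characters: on their own, those bytes are not valid UTF-8, so the conversion cannot preserve them.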

Simon Nickerson
Everything works, including Chinese and other multi-byte Unicode characters, except 0x80-0xFF (in "ISO-8859-1"). Your additional point is right, but the multi-byte Unicode characters are transferred as "ISO-8859-1", and you can still convert them with "UTF-8", because the browser uses "UTF-8" to encode multi-byte Unicode characters.
Mike.Huang
+3  A: 

I know the default encoding of the Java net transmission is "ISO-8859-1"

I am not sure what you mean here, but this is not true in networking. Everything goes over the line as bytes. Maybe you're confusing it with the default encoding of the InputStreamReader with which you attempt to read the byte stream as characters. When constructing an InputStreamReader for a byte stream, you should use the constructor which takes the encoding as the second argument. E.g.

Reader reader = new InputStreamReader(connection.getInputStream(), "UTF-8");

If you're actually using java.net.URLConnection, then you should first extract the encoding from the Content-Type header and use it as the encoding of the reader.
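As a rough sketch of that last step, the helper below pulls the charset parameter out of a Content-Type value. The class name ContentTypeCharset and the method charsetOf are hypothetical, and the parsing is deliberately simple (it splits on semicolons and does not handle every legal header form); the fallback charset is whatever the caller chooses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    // Hypothetical helper: return the charset named in a Content-Type value
    // such as "text/html; charset=UTF-8", or the fallback when the parameter
    // is missing or names a charset this JVM does not support.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Parameter names are matched case-insensitively.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

With a URLConnection this could be used as new InputStreamReader(connection.getInputStream(), charsetOf(connection.getContentType(), StandardCharsets.UTF_8)), where the UTF-8 fallback is an assumption to adjust for the server in question.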

BalusC
The problem is that after converting to "UTF-8", the characters 0x80-0xFF (in ISO-8859-1) are not shown.
Mike.Huang
Then likely either the original stream was not in UTF-8, or you're **writing** the characters using the wrong encoding (to a file, to the console, to a webpage, or wherever you're viewing them). You need to ensure that the output side uses the same, correct encoding as well. Try posting a sample string together with both the expected and the actual output.
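The writing side can be checked in the same spirit. The sketch below (class and method names are made up for illustration) writes one character through an OutputStreamWriter with two different encodings and prints the resulting bytes, which shows why the reader or viewer has to agree with the writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriteEncodingDemo {
    // Write the text through a Writer that uses the given charset and
    // return the bytes that actually reach the underlying stream.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Format bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // U+00E9, one of the 0x80-0xFF characters
        System.out.println(hex(write(text, StandardCharsets.UTF_8)));      // C3 A9
        System.out.println(hex(write(text, StandardCharsets.ISO_8859_1))); // E9
    }
}
```

The same character becomes two bytes in UTF-8 and one byte in ISO-8859-1, so output written in one encoding and viewed in the other will not show the intended character.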
BalusC