Does anyone know how to convert a string from ISO-8859-1 to UTF-8 and back in Java?

I'm getting a string from the web and saving it in the RMS (J2ME), but I want to preserve the special chars and get the string from the RMS but with the ISO-8859-1 encoding. How do I do this?

A: 

Charsets might do it.

TofuBeer
and the reason it won't do it is? If you are going to mark down please say why!
TofuBeer
I didn't downvote, but I assume it was because nio (and thus Charset) is not available in J2ME.
Joachim Sauer
The source is freely available however...
TofuBeer
What does that mean? Are you saying people should copy the source code from the JDK into their own projects?
Alan Moore
Harmony is under the Apache license. If it were the only choice (which it isn't, given the other answers), then that would likely be an acceptable solution for many people.
TofuBeer
A: 
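// Re-reads the string's ISO-8859-1 bytes as UTF-8. This only repairs a string
// whose UTF-8 bytes were mistakenly decoded as ISO-8859-1 in the first place.
byte[] bytes = string.getBytes("ISO-8859-1");
String s = new String(bytes, "UTF-8");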
Maurice Perry
+1  A: 

If you have a String, you can do this:

String s = "test";
try {
    byte[] utf8 = s.getBytes("UTF-8"); // keep the result: the String's characters as UTF-8 bytes
} catch(UnsupportedEncodingException uee) {
    uee.printStackTrace();
}

If you have a 'broken' String, something went wrong earlier, usually when the bytes were decoded. Converting a String to a String in another encoding is definitely not the way to go! You can convert a String to a byte[] and vice versa (given an encoding). In Java, Strings are AFAIK encoded with UTF-16, but that's an implementation detail.
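
To illustrate what a 'broken' String looks like, here is a minimal sketch (the class and method names are made up for the example): the same two bytes are decoded once with the correct charset and once with the wrong one. Wrapping the checked exception in a RuntimeException is only there to keep the example short.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; not part of any library.
final class BrokenStringDemo {

    // Decodes bytes with the named charset, wrapping the checked exception for brevity.
    static String decode(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Unsupported encoding: " + charsetName);
        }
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+00E9 (e with acute accent): two bytes.
        byte[] utf8Bytes = { (byte) 0xC3, (byte) 0xA9 };

        // Right charset: one character, U+00E9.
        String correct = decode(utf8Bytes, "UTF-8");
        // Wrong charset: every byte becomes its own character (U+00C3, U+00A9).
        String broken = decode(utf8Bytes, "ISO-8859-1");

        System.out.println(correct.length()); // 1
        System.out.println(broken.length());  // 2
    }
}
```

The fix for the second case is to decode the original bytes with the right charset, not to patch the String afterwards.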

Say you have an InputStream: you can read it into a byte[] and then convert that to a String using

byte[] bs = ...;
String s;
try {
    s = new String(bs, encoding);
} catch(UnsupportedEncodingException uee) {
    uee.printStackTrace();
}

or even better (thanks to erickson) use InputStreamReader like this:

InputStreamReader isr;
try {
    isr = new InputStreamReader(inputStream, encoding);
} catch(UnsupportedEncodingException uee) {
    uee.printStackTrace();
}
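
A rough sketch of reading a whole stream through an InputStreamReader might look like the following. The names StreamDecoding, readAll and decode are invented for the example, and the in-memory wrapper exists only to make it easy to try out. One advantage over decoding byte[] chunks by hand is that the reader copes with multi-byte characters that straddle two reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper class; not part of any library.
final class StreamDecoding {

    // Reads the stream to the end and decodes it with the named charset.
    // The caller remains responsible for closing the stream.
    static String readAll(InputStream in, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(in, charsetName);
        StringBuffer sb = new StringBuffer();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Convenience wrapper for in-memory data; wraps checked exceptions for brevity.
    static String decode(byte[] data, String charsetName) {
        try {
            return readAll(new ByteArrayInputStream(data), charsetName);
        } catch (IOException e) {
            throw new RuntimeException(e.toString());
        }
    }
}
```

For instance, StreamDecoding.decode(bytes, "UTF-8") would give the same String as new String(bytes, "UTF-8") for well-formed input.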
Johannes Weiß
If you have an InputStream, you should wrap it with an InputStreamReader.
erickson
Thanks! That's true and even available in J2ME :-)
Johannes Weiß
+8  A: 

In general, you can't do this. UTF-8 is capable of encoding any Unicode code point. ISO-8859-1 can handle only a tiny fraction of them. So, transcoding from ISO-8859-1 to UTF-8 is no problem. Going backwards from UTF-8 to ISO-8859-1 will cause replacement characters (with String.getBytes on the standard JDK, a plain "?") to appear in your text when unsupported characters are found.

To transcode text:

byte[] utf8 = new String(latin1, "ISO-8859-1").getBytes("UTF-8");

or

byte[] latin1 = new String(utf8, "UTF-8").getBytes("ISO-8859-1");
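
As a small sketch of both directions (TranscodeDemo and transcode are names made up for this example), the first round trip is lossless, while the euro sign has no ISO-8859-1 equivalent and is substituted. The substituted byte being "?" is what the standard JDK does; other runtimes may behave differently.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; not part of any library.
final class TranscodeDemo {

    // Decodes the input with one charset and re-encodes it with another.
    static byte[] transcode(byte[] input, String from, String to) {
        try {
            return new String(input, from).getBytes(to);
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Unsupported encoding: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xE9 };                          // e-acute in ISO-8859-1
        byte[] utf8 = transcode(latin1, "ISO-8859-1", "UTF-8");   // becomes C3 A9
        byte[] back = transcode(utf8, "UTF-8", "ISO-8859-1");     // E9 again, nothing lost

        byte[] euro = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };  // U+20AC in UTF-8
        byte[] lossy = transcode(euro, "UTF-8", "ISO-8859-1");    // no Latin-1 equivalent

        System.out.println(utf8.length + " " + back.length + " " + lossy.length);
    }
}
```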

You can exercise more control by using the lower-level Charset APIs. For example, you can raise an exception when an un-encodable character is found, or use a different character for replacement text.
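
A possible sketch of that extra control, using java.nio.charset.CharsetEncoder (so, as noted in the comments further up, not something J2ME offers). StrictLatin1 and its two method names are invented for the example, and turning the checked CharacterCodingException into unchecked exceptions is just one way to surface the error.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class; not part of any library.
final class StrictLatin1 {

    // Encodes to ISO-8859-1 and throws if any character cannot be represented.
    static byte[] encodeStrict(String text) {
        CharsetEncoder encoder = Charset.forName("ISO-8859-1").newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return toArray(encoder.encode(CharBuffer.wrap(text)));
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Not representable in ISO-8859-1", e);
        }
    }

    // Encodes to ISO-8859-1, substituting the given byte for unsupported characters.
    static byte[] encodeReplacing(String text, byte replacement) {
        CharsetEncoder encoder = Charset.forName("ISO-8859-1").newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] { replacement });
        try {
            return toArray(encoder.encode(CharBuffer.wrap(text)));
        } catch (CharacterCodingException e) {
            // Not expected when both error actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    // Copies the readable part of the buffer into a fresh array.
    private static byte[] toArray(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }
}
```

With this, StrictLatin1.encodeStrict("\u20ac") would throw, while StrictLatin1.encodeReplacing("a\u20ac", (byte) '_') would yield the bytes for "a_".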

erickson
+1  A: 

I was able to encode an ISO-8859-1 String that came from a DB to UTF-8 with the following code example:

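String xml = "áéíóúñ";
// Stand-in for the ISO-8859-1 bytes as they would arrive from the DB
byte[] latin1 = xml.getBytes("ISO-8859-1");
// Decode as ISO-8859-1, then re-encode as UTF-8
byte[] utf8 = new String(latin1, "ISO-8859-1").getBytes("UTF-8");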
David García González