I want to encode a UTF-8 string to an ISO 8859- string in Java

I have this:

String title = new String(item.getTitle().getText().getBytes("ISO-8859-1"));

But it isn't working; the output is Sørensen, for example.

+4  A: 

That isn't the way to solve this problem. Strings in Java always use the same internal encoding (UTF-16); with that code you've basically only changed the content. You need to set the encoding at the destination of the string instead. If it's stdout, you need to set its encoding. If it's a file, you need to set the encoding of its Writer. If it's an HTML page, you need to set the response encoding. If it's a database, you need to set the DB/table/connection encoding. Et cetera.
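As a rough illustration of "set the encoding at the destination", here is a minimal sketch that writes the same string through a Writer configured with two different charsets. The class and method names are made up for this example, and a ByteArrayOutputStream stands in for the real destination (file, socket, response).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class DestinationEncodingDemo {

    // Writes the text through a Writer that uses the given charset and
    // returns the bytes that actually reached the underlying stream.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "S\u00f8rensen" is the name from the question, written with a
        // Unicode escape so the encoding of this source file does not matter.
        String title = "S\u00f8rensen";

        // The String is identical in both calls; only the Writer's charset differs.
        System.out.println(write(title, StandardCharsets.UTF_8).length);      // 9
        System.out.println(write(title, StandardCharsets.ISO_8859_1).length); // 8
    }
}
```

The string itself never changes; the choice of encoding only takes effect at the point where characters become bytes.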

Update: as per the comments:

The string is from an RSS feed that is in UTF-8, and I want to show it in an HTML page that uses ISO 8859 encoding

You'll need to upgrade the HTML page's encoding from vintage ISO 8859 encoding to the modern and world-domination-prepared UTF-8 encoding.

Update 2: as per the comments:

Firefox shows it in the right encoding by default (utf-8) but Internet Explorer for example doesn't

Then the text is actually fine. You don't need to massage the string into another encoding. The symptoms suggest that the character encoding information is missing from the response headers. Firefox actually has a pretty smart encoding detector, while IE will use the platform default encoding when the encoding is unknown. But IE will also fail if the HTML is (drastically) malformed in the doctype and head.

Thus, either the HTML response is syntactically invalid, or the response content type wasn't set correctly. Assuming that your website validates and that you're using JSP/Servlet (judging by your post history here), you basically need to add the following line to the top of your JSP:

<%@ page pageEncoding="UTF-8" %>

That's all. It will automatically set both the response encoding (so that the server knows which encoding to use to write the characters to the byte stream of the response) and the encoding in the Content-Type response header (so that the client knows which encoding to use to read/display those characters from the byte stream of the response). For more background information you may find this article useful.

BalusC
+7  A: 

There's no such thing as a "UTF-8 string" in Java... there are just strings, which are always in Unicode. (They're effectively always UTF-16.)

You can have a byte array which is an ISO-8859-1 encoded form of a string (or UTF-8 or whatever) but it doesn't make sense to have a string with an encoding.
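A small sketch of that distinction, with a made-up class name: one string, several byte-array forms. It also shows what happens to a character that ISO-8859-1 cannot represent, since `String.getBytes(Charset)` substitutes the charset's replacement byte (`?`, 0x3F) for unmappable characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class EncodedFormsDemo {

    // Renders a byte array as space-separated upper-case hex, e.g. "C3 B8".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String oSlash = "\u00f8"; // the letter from "S\u00f8rensen"
        String euro = "\u20ac";   // euro sign, not present in ISO-8859-1

        System.out.println(hex(oSlash.getBytes(StandardCharsets.ISO_8859_1))); // F8
        System.out.println(hex(oSlash.getBytes(StandardCharsets.UTF_8)));      // C3 B8

        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_8)));        // E2 82 AC
        System.out.println(hex(euro.getBytes(StandardCharsets.ISO_8859_1)));   // 3F
    }
}
```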

If you've read a string with the incorrect encoding somewhere, the correct thing to do is fix the code which reads the string, rather than trying to decode/encode the data from the string form later.
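To make that concrete, here is a sketch (class and method names invented for the example) that reproduces the symptom from the question by decoding UTF-8 bytes as ISO-8859-1, and then reads the same bytes correctly by giving the Reader the right charset. The byte array stands in for whatever stream the feed actually comes from.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class ReadWithCharsetDemo {

    // Decodes all bytes into a String using a Reader with an explicit charset.
    static String read(byte[] data, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[256];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The bytes as a UTF-8 source would deliver them.
        byte[] feedBytes = "S\u00f8rensen".getBytes(StandardCharsets.UTF_8);

        // Wrong charset at read time: each of the two UTF-8 bytes of the
        // letter becomes its own character (U+00C3 and U+00B8).
        String wrong = read(feedBytes, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals("S\u00c3\u00b8rensen")); // true

        // Right charset at read time: the original text comes back.
        String right = read(feedBytes, StandardCharsets.UTF_8);
        System.out.println(right.equals("S\u00f8rensen")); // true
    }
}
```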

If you could give more information about the problem, we can probably give some more useful advice.

Jon Skeet
The string is from an RSS feed that is in UTF-8, and I want to show it in an HTML page that uses ISO 8859 encoding
Derk
@Derk: Then all you need to do is make sure that you *read* the RSS feed as UTF-8. That will get the correct data into the string. Assuming you're using a framework which knows about encodings for the HTML, you should just be able to write out the data... although obviously there are lots of characters which simply aren't covered in ISO-8859-1. (Do you have any particular reason not to write out the HTML in UTF-8 as well? That would be a better plan in general, as then you can cover all Unicode characters.)
Jon Skeet
I'm using the rssutils.jar library, but I can't find a way to set the source encoding.
Derk
@Derk: I'd expect it to automatically detect it, to be honest. How are you giving it the RSS?
Jon Skeet
The problem could be the RSS feed. The data is in UTF-8, but this: <?xml version="1.0" encoding="ISO-8859-1" ?> is in the feed. Firefox shows it in the right encoding by default (utf-8) but Internet Explorer for example doesn't
Derk
@Derk: given that specification, it's actually an **error** to interpret the content as UTF-8. In that case two wrongs might make a right, but those producing the broken XML should still be punished.
Joachim Sauer
@Derk: Joachim is absolutely right. The file is basically lying to you. Contact the provider and see if you can get it fixed.
Jon Skeet
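For illustration only, the "two wrongs" workaround mentioned in the comments could look roughly like this with the JDK's own XML parser. This is a sketch under assumptions: the tiny document below stands in for the real feed, the class and method names are invented, and whether rssutils.jar can be handed a Reader like this is not known from this thread. It relies on the documented behaviour of `org.xml.sax.InputSource`: when a character stream is supplied, the parser reads it directly and disregards the encoding declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
final class MislabeledFeedDemo {

    // A stand-in for the feed: declared as ISO-8859-1 but actually UTF-8 bytes.
    static byte[] mislabeledFeed() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>"
                + "<title>S\u00f8rensen</title>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Byte stream: the parser believes the (wrong) encoding declaration.
    static String titleTrustingDeclaration(byte[] feed) {
        return parseTitle(new InputSource(new ByteArrayInputStream(feed)));
    }

    // Character stream: the bytes are decoded as UTF-8 before the parser
    // sees them, so the declaration is disregarded.
    static String titleForcedUtf8(byte[] feed) {
        return parseTitle(new InputSource(
                new InputStreamReader(new ByteArrayInputStream(feed), StandardCharsets.UTF_8)));
    }

    private static String parseTitle(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        byte[] feed = mislabeledFeed();
        System.out.println(titleTrustingDeclaration(feed).equals("S\u00c3\u00b8rensen")); // true
        System.out.println(titleForcedUtf8(feed).equals("S\u00f8rensen"));                // true
    }
}
```

This only papers over the mislabeled declaration; getting the provider to fix the feed remains the proper solution.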