I need to convert characters from UTF-8 to ISO-8859-1 in Java without losing, for example, all of the UTF-8-specific punctuation.
Ideally I would like these converted to their ISO equivalents (e.g. there are probably five different single quotes in UTF-8, and I would like them all converted to the ISO single quote character).

String.getBytes("ISO-8859-1") just won't do the trick in this case, as it loses the UTF-8-specific characters (each one is replaced with '?').
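
To illustrate the loss described above, here is a minimal sketch. The class and method names (Latin1LossDemo, roundTrip) are made up for this example; only String.getBytes and StandardCharsets are standard JDK APIs (StandardCharsets assumes Java 7 or later).

```java
import java.nio.charset.StandardCharsets;

public class Latin1LossDemo {
    // Encodes the text as ISO-8859-1 and decodes it again,
    // so whatever the encoder could not map becomes visible.
    static String roundTrip(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+2019 (right single quotation mark) is not in ISO-8859-1,
        // while U+00E9 (e with acute) is.
        System.out.println(roundTrip("it\u2019s caf\u00e9"));
    }
}
```

The curly quote comes back as '?', while the accented letter survives because it exists in ISO-8859-1.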

Do you know of any ready-made mappings or libraries in Java that would map UTF-8-specific characters to ISO?

A: 

Have you considered using an OutputStreamWriter with an explicit character set of ISO-8859-1?

Then just write your Unicode chars and see what you get.

Thorbjørn Ravn Andersen
It will do the same thing as String.getBytes("ISO-8859-1"). What I need is some kind of UTF-to-ISO (or even ASCII) normalization tool.
Paweł Krupiński
+2  A: 

IBM's ICU project might be what you're looking for. It has support for fallback conversions.

beny23
Good idea, but their database does not contain many fallbacks. I ended up constructing my own based on the UTF-8 data my system is processing. Surprisingly, not many mappings were necessary.
Paweł Krupiński
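
For anyone wanting to try the hand-built mapping approach mentioned in the comment above, the following is a rough sketch and not the commenter's actual code. The class name Latin1Fallback, the method toLatin1Friendly, and the contents of the FALLBACKS table are all hypothetical; a real table would be built from the characters that actually occur in your data. The accent-stripping step via java.text.Normalizer is an extra assumption on top of the table.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class Latin1Fallback {
    // Hypothetical, deliberately small table: extend it for your own data.
    private static final Map<Integer, String> FALLBACKS = new HashMap<>();
    static {
        FALLBACKS.put(0x2018, "'");    // left single quotation mark
        FALLBACKS.put(0x2019, "'");    // right single quotation mark
        FALLBACKS.put(0x201A, "'");    // single low-9 quotation mark
        FALLBACKS.put(0x201C, "\"");   // left double quotation mark
        FALLBACKS.put(0x201D, "\"");   // right double quotation mark
        FALLBACKS.put(0x2013, "-");    // en dash
        FALLBACKS.put(0x2014, "-");    // em dash
        FALLBACKS.put(0x2026, "...");  // horizontal ellipsis
    }

    // Returns a string in which every character is encodable as ISO-8859-1.
    static String toLatin1Friendly(String input) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            String s = new String(Character.toChars(cp));
            if (latin1.canEncode(s)) {          // already representable
                out.append(s);
                continue;
            }
            String mapped = FALLBACKS.get(cp);  // explicit fallback
            if (mapped != null) {
                out.append(mapped);
                continue;
            }
            // Last resort: decompose and keep only the encodable parts,
            // which drops combining accents from letters.
            String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
            int before = out.length();
            for (char c : decomposed.toCharArray()) {
                if (latin1.canEncode(c)) {
                    out.append(c);
                }
            }
            if (out.length() == before) {
                out.append('?');                // nothing usable was left
            }
        }
        return out.toString();
    }
}
```

The result can then be passed to getBytes("ISO-8859-1") without further loss. Note that letters without a canonical decomposition (for example U+0142, the Polish l with stroke) still need their own table entry; otherwise they end up as '?'.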
A: 

The Java Development Kit has a tool called native2ascii that will do this; it rewrites each non-ASCII character as a \uXXXX Unicode escape. Use:

native2ascii -encoding UTF-8 [ inputfile [ outputfile ] ]

You can also go back the other way using the -reverse option.
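
To give an idea of what kind of output to expect, here is a small Java sketch that approximates the escaping step. It is an illustration only, not the tool's implementation: the class and method names (UnicodeEscaper, escapeNonAscii) are invented, and details such as the case of the hex digits are assumptions.

```java
public class UnicodeEscaper {
    // Replaces every char above the ASCII range with a backslash-u escape
    // made of four hex digits; ASCII chars are copied unchanged.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("it\u2019s"));
    }
}
```

Note that this preserves the original characters in escaped form; it does not substitute a plain ASCII apostrophe for a curly quote.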

Also see the list of supported encodings for JDK 1.6.

richj