views:

2691

answers:

3

Is there a free Java library that I can use to convert a string from one encoding to another, something like iconv in PHP? I'm using Java version 1.3.

+3  A: 

CharsetDecoder should be what you are looking for, no?

Many network protocols and files store their characters with a byte-oriented character set such as ISO-8859-1 (ISO-Latin-1).
However, Java's native character encoding is Unicode UTF-16BE (sixteen-bit UCS Transformation Format, big-endian byte order).

See Charset. That doesn't mean UTF-16 is the default charset (i.e., the default "mapping between sequences of sixteen-bit Unicode code units and sequences of bytes"):

Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets.
[US-ASCII, ISO-8859-1 a.k.a. ISO-LATIN-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16]
The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
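To see which default charset a given JVM picked, a minimal sketch like the following can help. The class name DefaultCharsetDemo is made up for illustration, and Charset.defaultCharset() assumes Java 5 or later (on 1.4 the "file.encoding" system property is the usual fallback).

```java
import java.nio.charset.Charset;

class DefaultCharsetDemo {
    // Name of the charset this JVM chose at startup (Charset.defaultCharset() needs Java 5+).
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());

        // The six standard charsets every Java platform implementation must support.
        String[] standard = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"};
        for (String name : standard) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
    }
}
```

The printed default varies by machine and locale, which is why conversion code should name its charsets explicitly rather than rely on it.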

This example demonstrates how to convert ISO-8859-1 encoded bytes in a ByteBuffer to a string in a CharBuffer and vice versa.

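import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

// Create the encoder and decoder for ISO-8859-1
Charset charset = Charset.forName("ISO-8859-1");
CharsetDecoder decoder = charset.newDecoder();
CharsetEncoder encoder = charset.newEncoder();

try {
    // Convert a string to ISO-8859-1 bytes in a ByteBuffer.
    // The new ByteBuffer is ready to be read.
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap("a string"));

    // Convert ISO-8859-1 bytes in a ByteBuffer to a CharBuffer and then to a string.
    // The new CharBuffer is ready to be read.
    CharBuffer cbuf = decoder.decode(bbuf);
    String s = cbuf.toString();
} catch (CharacterCodingException e) {
    // Thrown for malformed input or for characters ISO-8859-1 cannot represent
    e.printStackTrace();
}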
VonC
From http://www.exampledepot.com/egs/java.nio.charset/ConvertChar.html
VonC
Unicode is not an encoding! UTF-8, UTF-16 etc. are. See http://www.joelonsoftware.com/articles/Unicode.html
SealedSun
@SealedSun: very true. I have fixed that "java native encoding" section in my answer.
VonC
+3  A: 

You don't need a library beyond the standard one - just use Charset. (You can just use the String constructors and getBytes methods, but personally I don't like just working with the names of character encodings. Too much room for typos.)

EDIT: As pointed out in comments, you can still use Charset instances but have the ease of use of the String methods: new String(bytes, charset) and String.getBytes(charset).
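As a rough iconv-style sketch of the two approaches mentioned above: decode the source bytes into a String, then re-encode that String in the target encoding. The class and method names (EncodingConverter, convert) are made up for illustration. The Charset overloads of the String constructor and getBytes assume Java 6 or later; the name-based overloads are the ones that also exist on older JVMs such as 1.3.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class EncodingConverter {
    // Charset-based variant (Java 6+): decode with 'from', re-encode with 'to'.
    // Malformed or unmappable input is silently replaced, not reported.
    static byte[] convert(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    // Name-based variant: works on old JVMs, but a typo in a name
    // only shows up at run time as UnsupportedEncodingException.
    static byte[] convert(byte[] input, String fromName, String toName)
            throws UnsupportedEncodingException {
        return new String(input, fromName).getBytes(toName);
    }

    public static void main(String[] args) throws Exception {
        byte[] latin1 = {(byte) 0xE9}; // e with acute accent in ISO-8859-1

        byte[] utf8 = convert(latin1, Charset.forName("ISO-8859-1"), Charset.forName("UTF-8"));
        System.out.println(utf8.length); // prints 2 (bytes 0xC3 0xA9)

        byte[] sameByName = convert(latin1, "ISO-8859-1", "UTF-8");
        System.out.println(sameByName.length); // prints 2
    }
}
```

If silent replacement of unmappable characters is not acceptable, the CharsetEncoder/CharsetDecoder route is the one that reports errors via CharacterCodingException.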

Jon Skeet
I prefer new String(byte[], encoding) and String.getBytes(encoding) in most cases, because they are simple one-liners as opposed to the more powerful but more complicated API of Charset (which, BTW, is only available in Java 1.4+).
Alexander
Yes, it's a shame that the Charset API is so complicated. The .NET System.Encoding class does this really well, IMO - and keeps the functionality out of String.
Jon Skeet
Links fixed. See http://www.free-scripts.net/html_tutorial/html/topics/urlencoding.htm
VonC
@VonC: Thanks. Shame the UI doesn't help do this automatically :)
Jon Skeet
And now, a brand new sofaq section ;) http://stackoverflow.com/questions/229364
VonC
A: 

It is a whole lot easier if you think of Unicode as a character set (which it actually is: very basically, the numbered set of all known characters). You can encode it as UTF-8 (1-4 bytes per character, depending on the character) or maybe UTF-16 (2 bytes per character, or 4 bytes using surrogate pairs).
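The byte counts are easy to check by encoding a few sample characters. This is only a sketch; EncodedLengthDemo and encodedLength are hypothetical names, and UTF-16BE is used rather than plain UTF-16 because Java's "UTF-16" encoder prepends a 2-byte byte-order mark.

```java
import java.nio.charset.Charset;

class EncodedLengthDemo {
    // Number of bytes the text occupies in the named charset.
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "a",                                   // U+0061, ASCII
            "\u00E9",                              // U+00E9, e with acute accent
            "\u20AC",                              // U+20AC, euro sign
            new String(Character.toChars(0x1F600)) // U+1F600, outside the 16-bit range
        };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": UTF-8 = " + encodedLength(s, "UTF-8")
                    + " bytes, UTF-16BE = " + encodedLength(s, "UTF-16BE") + " bytes");
        }
        // UTF-8 lengths are 1, 2, 3, 4; UTF-16BE lengths are 2, 2, 2, 4.
    }
}
```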

Back in the mists of time, Java used UCS-2 to encode the Unicode character set. UCS-2 is a fixed 2 bytes per character, so it cannot represent anything above U+FFFF, and it is now obsolete. It was a fairly obvious hack to add surrogate pairs and move up to UTF-16.
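The effect of surrogate pairs is visible from the String API: a character above U+FFFF occupies two char values but counts as one code point. A small sketch, assuming Java 5 or later for the code point methods (SurrogatePairDemo and fromCodePoint are names invented for this example):

```java
class SurrogatePairDemo {
    // Build a String from a single Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600); // a code point outside the 16-bit range

        System.out.println(s.length());                        // prints 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length()));   // prints 1 (actual characters)
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // prints true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // prints true
    }
}
```

Code that indexes by char, as UCS-2-era code did, therefore sees half a character at each of those two positions.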

A lot of people think they should have used UTF-8 in the first place. To be fair, when Java was originally written Unicode still fit in 16 bits (fewer than 65,536 characters had been assigned); characters beyond that range were only added later.