The String class has a constructor:

 new String(byte[] bytes, Charset charset)

and a method:

 byte[] getBytes(Charset charset)

Given that I define my charset as follows:

 Charset charset = Charset.forName("UTF-8");

Which encoding will I actually be using? More specifically, is it standard UTF-8 (as described in RFC 3629), CESU-8, or Modified UTF-8? (See also the corresponding Wikipedia article.)

If it is not standard UTF-8, is there a library that supports String operations in UTF-8?

A converter between these UTF-8-derived encodings would be more than welcome!

+3  A: 

From the Charset Javadoc: "The UTF-8 charset is specified by RFC 2279; the transformation format upon which it is based is specified in Amendment 2 of ISO 10646-1 and is also described in the Unicode Standard."

http://download-llnw.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html

Gunslinger47
For the record, RFC 3629 is more or less a "corrected" version of RFC 2279: it obsoletes RFC 2279 and restricts UTF-8 to the range U+0000 to U+10FFFF.
Stephen C
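
One way to check this empirically is to encode the two inputs on which the variants differ: a supplementary character (standard UTF-8 uses one 4-byte sequence, while CESU-8 and Modified UTF-8 encode each surrogate separately, 6 bytes in total) and U+0000 (Modified UTF-8 writes C0 80 instead of 00). The sketch below is only an illustration, not part of the original answer. The class name Utf8Check and its helper methods are made up for this example. It compares Charset.forName("UTF-8") with DataOutputStream.writeUTF, which is documented to use Modified UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper class, used only for this illustration.
public class Utf8Check {

    // Encode with the charset from the question.
    static byte[] viaCharset(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    // Encode with DataOutputStream.writeUTF (Modified UTF-8) and
    // drop the 2-byte length prefix that writeUTF emits first.
    static byte[] viaWriteUTF(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    // Render bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E, a supplementary character
        String nul = "\0";            // U+0000

        System.out.println(hex(viaCharset(clef)));  // F0 9D 84 9E
        System.out.println(hex(viaWriteUTF(clef))); // ED A0 B4 ED B4 9E
        System.out.println(hex(viaCharset(nul)));   // 00
        System.out.println(hex(viaWriteUTF(nul)));  // C0 80
    }
}
```

The 4-byte result for U+1D11E and the single 00 byte for U+0000 indicate that the "UTF-8" charset produces standard UTF-8 on both inputs. Modified UTF-8 only appears in APIs such as DataOutputStream.writeUTF and DataInputStream.readUTF.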