I need to encode/decode UTF-16 byte arrays to and from java.lang.String. The byte arrays are given to me with a Byte Order Mark (BOM), and I need to produce encoded byte arrays with a BOM.
Also, because I'm dealing with a Microsoft client/server, I'd like to emit the encoding in little endian (along with the LE BOM) to avoid any misunderstandings. I do realize that with the BOM it should also work in big endian, but I don't want to swim upstream in the Windows world.
As an example, here is a method which encodes a java.lang.String as UTF-16 in little endian with a BOM:
    import java.io.UnsupportedEncodingException;

    public static byte[] encodeString(String message) {
        byte[] tmp = null;
        try {
            tmp = message.getBytes("UTF-16LE");
        } catch (UnsupportedEncodingException e) {
            // should not be possible: every JVM is required to support UTF-16LE
            AssertionError ae =
                new AssertionError("Could not encode UTF-16LE");
            ae.initCause(e);
            throw ae;
        }

        // use brute force method to add BOM (FF FE marks little endian)
        byte[] utf16lemessage = new byte[2 + tmp.length];
        utf16lemessage[0] = (byte) 0xFF;
        utf16lemessage[1] = (byte) 0xFE;
        System.arraycopy(tmp, 0,
                         utf16lemessage, 2,
                         tmp.length);
        return utf16lemessage;
    }
What is the best way to do this in Java? Ideally I'd like to avoid copying the entire byte array into a new byte array that has two extra bytes allocated at the beginning.
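For illustration, one way the second copy might be avoided is to allocate the final array up front, write the BOM, and let a CharsetEncoder encode directly into the rest of it. This is only a sketch under assumptions: the class name Utf16Encode is hypothetical, it relies on StandardCharsets (Java 7+), and it assumes that malformed input such as an unpaired surrogate should be replaced rather than rejected.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for the example.
class Utf16Encode {
    static byte[] encodeString(String message) {
        // UTF-16 uses exactly 2 bytes per Java char, so the size is known in advance.
        byte[] out = new byte[2 + 2 * message.length()];
        out[0] = (byte) 0xFF; // little-endian BOM, first byte
        out[1] = (byte) 0xFE; // little-endian BOM, second byte

        // View of the array that starts right after the BOM.
        ByteBuffer target = ByteBuffer.wrap(out, 2, out.length - 2);

        // Assumption: replace malformed input instead of failing.
        CharsetEncoder encoder = StandardCharsets.UTF_16LE.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);

        CoderResult result = encoder.encode(CharBuffer.wrap(message), target, true);
        if (!result.isUnderflow()) {
            throw new IllegalStateException("Unexpected encoder result: " + result);
        }
        encoder.flush(target);
        return out;
    }
}
```

The encoder still reads every character once, but the encoded bytes land directly in the array that is returned, so no temporary array and no System.arraycopy are needed.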
The same goes for decoding such a string, but that's much more straightforward using the java.lang.String constructor:

    public String(byte[] bytes,
                  int offset,
                  int length,
                  String charsetName)
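As a rough sketch of the decoding side, the offset and length arguments can be used to skip the two BOM bytes in place. The class name Utf16Decode and the fallback to little endian when no BOM is present are assumptions made for this example. It uses the Charset overload of the same constructor (with StandardCharsets, Java 7+) so that no checked exception has to be handled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for the example.
class Utf16Decode {
    static String decodeString(byte[] data) {
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) {
                // Little-endian BOM: decode everything after the first two bytes.
                return new String(data, 2, data.length - 2, StandardCharsets.UTF_16LE);
            }
            if (b0 == 0xFE && b1 == 0xFF) {
                // Big-endian BOM: decode everything after the first two bytes.
                return new String(data, 2, data.length - 2, StandardCharsets.UTF_16BE);
            }
        }
        // Assumption: data without a BOM is treated as little endian.
        return new String(data, StandardCharsets.UTF_16LE);
    }
}
```

Because the constructor takes an offset and a length, the BOM is skipped without copying the input array first.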