We are trying to convert a String to a byte[] using the following Java code:

String source = "0123456789";
byte[] byteArray = source.getBytes("UTF-16");

We get a byte array of 22 bytes instead of the 20 we expected, and we are not sure where the extra bytes come from. How do we get an array of length 20?

+10  A: 

Maybe the first two bytes are the Byte Order Mark (http://en.wikipedia.org/wiki/Byte_Order_Mark). It specifies the byte order (big-endian or little-endian) of each 16-bit unit used in the encoding.
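A minimal sketch to confirm this, assuming Java 7+ for `StandardCharsets` (the class and method names are illustrative, not from the thread). `StandardCharsets.UTF_16` is the same charset as the name "UTF-16" but avoids the checked `UnsupportedEncodingException`; when encoding, Java's UTF-16 charset writes big-endian data preceded by a big-endian byte order mark.

```java
import java.nio.charset.StandardCharsets;

class Utf16BomDemo {
    // Hypothetical helper: encode with the generic UTF-16 charset, which in Java
    // prepends a big-endian byte order mark (0xFE 0xFF) to the encoded data.
    static byte[] encodeUtf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] bytes = encodeUtf16("0123456789");
        // 2 BOM bytes + 10 characters * 2 bytes each
        System.out.println(bytes.length); // 22
        // Mask with 0xFF because Java bytes are signed
        System.out.printf("%02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // FE FF
    }
}
```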

Alexander
+3  A: 

Try printing out the bytes in hex to see where the extra 2 bytes are added - are they at the start or end?

My guess is that you'll find a byte order mark at the start (0xFEFF); this lets anyone consuming (receiving) the byte array recognise whether the encoding is little-endian or big-endian.
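One way to do the hex printout, sketched under the assumption of Java 7+ (`HexDump` and `toHex` are hypothetical names): format each byte as two hex digits so any extra bytes at the start or end stand out.

```java
import java.nio.charset.StandardCharsets;

class HexDump {
    // Format each byte as two uppercase hex digits, separated by spaces.
    // The & 0xFF mask turns the signed byte into its unsigned value 0..255.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "0123456789".getBytes(StandardCharsets.UTF_16);
        System.out.println(toHex(bytes));
        // FE FF 00 30 00 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00 39
    }
}
```

The two extra bytes show up at the start (FE FF), followed by one 00 high byte and the ASCII code of each digit.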

Bevan
+15  A: 

Alexander's answer explains why it's there, but not how to get rid of it. You simply need to specify the endianness you want in the encoding name:

String source = "0123456789";
byte[] byteArray = source.getBytes("UTF-16LE"); // Or UTF-16BE
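The same idea with the `StandardCharsets` constants, as a sketch assuming Java 7+ (class and method names are illustrative). Unlike the `getBytes(String)` overload, `getBytes(Charset)` does not throw the checked `UnsupportedEncodingException`. Both explicit-endianness charsets omit the byte order mark, so the result is 20 bytes.

```java
import java.nio.charset.StandardCharsets;

class ExplicitEndianness {
    // UTF-16LE: no byte order mark, low byte of each 16-bit unit first
    static byte[] littleEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    // UTF-16BE: no byte order mark, high byte of each 16-bit unit first
    static byte[] bigEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String source = "0123456789";
        byte[] le = littleEndian(source);
        byte[] be = bigEndian(source);
        System.out.println(le.length + " " + be.length); // 20 20
        System.out.println(le[0] + " " + le[1]);         // 48 0
        System.out.println(be[0] + " " + be[1]);         // 0 48
        // Decode with the same charset that was used to encode
        System.out.println(new String(le, StandardCharsets.UTF_16LE)); // 0123456789
    }
}
```

Because there is no BOM, the receiver has to know which byte order was used; decoding UTF-16LE bytes with plain "UTF-16" would assume big-endian and produce the wrong characters.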
Jon Skeet
+1  A: 

UTF-16 output from getBytes("UTF-16") starts with a byte order mark that tells the reader which byte order the stream uses. As the other users have pointed out, the
1st byte is 0xFE
2nd byte is 0xFF
and the remaining 20 bytes are
0, 48, 0, 49, 0, 50, 0, 51, 0, 52, 0, 53, 0, 54, 0, 55, 0, 56, 0, 57
(a zero high byte followed by the ASCII code of each digit '0' to '9').
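If you already have the 22-byte array, a minimal sketch of dropping the two leading bytes is shown below (the helper name `stripBigEndianBom` is hypothetical). It assumes the BOM is the big-endian one (0xFE 0xFF) that Java's "UTF-16" encoder writes; specifying UTF-16BE or UTF-16LE up front, as in Jon Skeet's answer, avoids the extra copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StripBom {
    // Hypothetical helper: if the array starts with the big-endian BOM
    // (0xFE 0xFF), return a copy without those two bytes; otherwise return it unchanged.
    static byte[] stripBigEndianBom(byte[] bytes) {
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF) {
            return Arrays.copyOfRange(bytes, 2, bytes.length);
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] withBom = "0123456789".getBytes(StandardCharsets.UTF_16);
        byte[] stripped = stripBigEndianBom(withBom);
        System.out.println(stripped.length); // 20
        System.out.println(Arrays.toString(stripped));
        // [0, 48, 0, 49, 0, 50, 0, 51, 0, 52, 0, 53, 0, 54, 0, 55, 0, 56, 0, 57]
        // The remaining bytes are exactly the UTF-16BE encoding
        System.out.println(Arrays.equals(stripped,
                "0123456789".getBytes(StandardCharsets.UTF_16BE))); // true
    }
}
```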

anjanb