I am about to start working on something that requires reading bytes and creating strings. The bytes being read represent UTF-16 strings. So just to test things out I wanted to convert a simple byte array in UTF-16 encoding to a string. The first 2 bytes in the array must represent the endianness and so must be either 0xff 0xfe or 0xfe 0xff. So I tried creating my byte array as follows:

byte[] bytes = new byte[] {0xff, 0xfe, 0x52, 0x00, 0x6F, 0x00};

But I got an error because 0xFF and 0xFE are too big to fit into a byte (because bytes are signed in Java). More precisely the error was that the int couldn't be converted to a byte. I know that I could just explicitly convert from int to byte with a cast and achieve the desired result, but that is not what my question is about.
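
For reference, the cast-based workaround mentioned above might look roughly like the following. This is only a sketch: the class and method names (Utf16CastDemo, decode) are made up for illustration, and it assumes the bytes are decoded with the standard UTF-16 charset, which reads the byte-order mark to pick the endianness.

```java
import java.nio.charset.StandardCharsets;

class Utf16CastDemo {
    // Builds the little-endian UTF-16 array from the question, using
    // explicit casts for the two byte-order-mark values, and decodes it.
    static String decode() {
        byte[] bytes = new byte[] {(byte) 0xff, (byte) 0xfe, 0x52, 0x00, 0x6F, 0x00};
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        System.out.println(decode()); // Ro
    }
}
```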

Just to try something out I created a String and called getBytes("UTF-16") then printed each of the bytes in the array. The output was slightly confusing because the first two bytes were 0xFFFFFFFE 0xFFFFFFFF, followed by 0x00 0x52 0x00 0x6F. (Obviously the endianness here is different from what I was trying to create above but that is not important).

Using this output I decided to try and create my byte array the same way:

byte[] bytes = new byte[] {0xffffffff, 0xfffffffe, 0x52, 0x00, 0x6F, 0x00};

And strangely enough it worked fine. So my question is, why does Java allow an integer value of 0xFFFFFF80 or greater to be automatically converted to a byte without an explicit cast, but anything equal to or greater than 0x80 requires an explicit cast?

+2  A: 

If you write a numeric literal without a suffix (e.g. the L in 1234L for a long), the compiler treats it as an int. The literal 0xffffffff is an int with value -1, which fits in a byte and can therefore be assigned to one without an explicit cast.

tangens
...because Java uses two's complement notation for negative values.
Ash
A: 

Because 0xffffffff is the number -1 and -1 can be interpreted as a byte.

Confusion
A: 

0xff is the same as writing 0x000000ff, not 0xffffffff. So that's your issue; the integer is a positive number (255), but the byte (if converted bit-for-bit) would be a negative number (-1). But 0xffffffff is -1 both as an int and as a byte.
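
A small sketch of this difference, assuming a throwaway class name (LiteralDemo is not from the question): it prints the int value of each literal and shows which assignment needs a cast.

```java
class LiteralDemo {
    // 0xff is the int 0x000000ff, a positive value.
    static int positiveLiteral() {
        return 0xff;
    }

    // 0xffffffff has every bit set, which is -1 in two's complement.
    static int negativeLiteral() {
        return 0xffffffff;
    }

    public static void main(String[] args) {
        System.out.println(positiveLiteral()); // 255
        System.out.println(negativeLiteral()); // -1

        byte ok = 0xffffffff;       // compiles: the constant -1 fits in a byte
        byte forced = (byte) 0xff;  // cast required: 255 is out of byte range
        // byte bad = 0xff;         // would not compile without the cast

        System.out.println(ok == forced); // true, both hold -1
    }
}
```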

T.J. Crowder
A: 

Because ints are signed: 0xffffffff represents -1, whereas 0xff represents an int of value 255, which does not lie within the -128 (0x80) to +127 (0x7f) range of a byte.

Patrick
+6  A: 

The key thing to remember here is that int in Java is a signed value. When you assign 0xffffffff (which is 2^32 -1), this is translated into a signed int of value -1 - an int cannot actually represent something as large as 0xffffffff as a positive number.

So for values less than 0x80, or greater than or equal to 0xFFFFFF80, the resulting int value is between -128 and 127, which can unambiguously be represented as a byte. Anything outside that range cannot be, and needs forcing with an explicit cast, losing data in the process.
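
The boundaries can be checked with a short sketch like the one below. The class name ByteRangeDemo and the helper fitsInByte are made up for illustration; the helper simply tests an int against Byte.MIN_VALUE and Byte.MAX_VALUE, and the last lines show what an explicit cast discards.

```java
class ByteRangeDemo {
    // True when the int value can be stored in a byte unchanged.
    static boolean fitsInByte(int value) {
        return value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(fitsInByte(0x7f));       // true  (127)
        System.out.println(fitsInByte(0x80));       // false (128)
        System.out.println(fitsInByte(0xffffff80)); // true  (-128)
        System.out.println(fitsInByte(0xffffff7f)); // false (-129)

        // An explicit cast keeps only the low 8 bits of the int.
        byte b = (byte) 0xfe;
        System.out.println(b);        // -2
        System.out.println(b & 0xff); // 254, the unsigned view of the same bits
    }
}
```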

skaffman
Thanks, that makes it much clearer.
DaveJohnston