The Javadoc for this says:

Only the lower two bytes of the integer oneChar are written.

What effect, if any, does this have on writing non-utf8 encoded chars which have been cast to an int?

Update:

The code in question receives data from a socket and writes it to a file. A lot happens between receiving and writing, so I can't just use the string I get from BufferedReader#readLine(). I was using Writer#write(char[]), but this meant I had to create a new char array each time. To avoid creating an array every time, I keep a single char array that is filled with -1 (cast to a char).

I then use TextUtils#getChars to fill it, expanding the array if necessary. For writing, I loop through the array, passing each char to the Writer until I reach an index i where char[i] == (char) -1.
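A minimal sketch of the approach described above, to make the question concrete. It is not the actual code: String#getChars stands in for TextUtils#getChars, and the class name, the method names, and the initial buffer size of 16 are made up for illustration.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Arrays;

final class SentinelBufferDemo {
    // (char) -1 keeps only the low 16 bits of -1, so this is the same value as '\uFFFF'.
    static final char SENTINEL = (char) -1;

    // Reused across calls; the initial size is an arbitrary assumption.
    private char[] buffer = new char[16];

    // Copies text into the reused buffer, growing it if needed, and marks unused slots.
    void fill(String text) {
        if (text.length() > buffer.length) {
            buffer = new char[text.length()];
        }
        Arrays.fill(buffer, SENTINEL);
        text.getChars(0, text.length(), buffer, 0); // stand-in for TextUtils#getChars
    }

    // Writes one char at a time until the sentinel or the end of the buffer is reached.
    void writeTo(Writer out) throws IOException {
        for (int i = 0; i < buffer.length && buffer[i] != SENTINEL; i++) {
            out.write(buffer[i]); // the char is widened to int and resolves to write(int)
        }
    }
}
```

Note that with this scheme a genuine '\uFFFF' in the data would be mistaken for the end marker.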

A: 

Internally, write(int) will just cast its parameter to char, so write(i) is equivalent to write((char)i).

Now in Java, char is internally just an unsigned integer type with the range 0-65535 (i.e. 16 bits). The cast int -> char is a "narrowing primitive conversion" (Java Language Specification, 5.1.3), and int is a signed integer, hence:

A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.

That's why the Javadoc says that only the lower two bytes are written.
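As a rough illustration of that truncation, the following sketch writes a few int values through write(int) and looks at what arrives. The class and method names are made up for this example, and StringWriter is used only because it makes the result easy to inspect.

```java
import java.io.StringWriter;

final class WriteIntDemo {
    // Returns what the Writer actually receives when write(int) is called with value.
    static String written(int value) {
        StringWriter out = new StringWriter();
        out.write(value); // only the low 16 bits of value survive
        return out.toString();
    }

    public static void main(String[] args) {
        int plain = 0x41;       // 'A'
        int tooWide = 0x10041;  // bits above the low 16 are set; low 16 bits are 0x0041
        int negative = -1;      // all 32 bits set; low 16 bits are 0xFFFF

        System.out.println(written(plain));                            // A
        System.out.println(written(tooWide).equals(written(plain)));   // true
        System.out.println(written(negative).charAt(0) == '\uFFFF');   // true
        System.out.println((char) tooWide == 'A');                     // true
    }
}
```

So write(0x10041) and write(0x41) produce the same output, and write(-1) produces the single char U+FFFF.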

Now, what this means in terms of characters depends on how you want to interpret the int values. A char in Java is a UTF-16 code unit: for characters in the BMP, the 16-bit number stored in the char is the number of the Unicode code point, while a code point in the supplementary planes is encoded as two chars (a surrogate pair). So if each of your int values is the number of a code point in the BMP, you're fine. If it's anything else (including a code point that needs more than 16 bits, a negative number, or something else entirely), you'll get garbage.
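To illustrate the supplementary-plane case, here is a small sketch, assuming the int values are meant to be Unicode code points. The class and method names are invented for the example. It compares write(int) with converting the code point to chars first via Character.toChars.

```java
import java.io.StringWriter;

final class CodePointDemo {
    // Converts the code point to UTF-16 code units first, then writes them all.
    static String writeCodePoint(int codePoint) {
        StringWriter out = new StringWriter();
        char[] units = Character.toChars(codePoint); // 1 char for BMP, 2 for supplementary
        out.write(units, 0, units.length);
        return out.toString();
    }

    // Passes the code point straight to write(int), which keeps only the low 16 bits.
    static String writeTruncated(int codePoint) {
        StringWriter out = new StringWriter();
        out.write(codePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        int bmp = 0x00E9;            // U+00E9, inside the BMP
        int supplementary = 0x1F600; // U+1F600, outside the BMP

        // For a BMP code point both routes agree.
        System.out.println(writeCodePoint(bmp).equals(writeTruncated(bmp)));

        // For a supplementary code point only the toChars route round-trips.
        System.out.println(writeCodePoint(supplementary).codePointAt(0) == supplementary);
        System.out.println(Integer.toHexString(writeTruncated(supplementary).charAt(0)));
    }
}
```

With write(int), U+1F600 is cut down to the unrelated char U+F600, which is the "garbage" case.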

What effect, if any, does this have on writing non-utf8 chars which have been cast to an int?

There is no such thing as a "non-utf8 char". UTF-8 is an encoding, that is, a way to represent a Unicode code point as bytes, so the question as posed is meaningless. Maybe you could explain what your code does?

sleske
updated my question
Al