views:

8444

answers:

5

Hi all,

I must convert a char into a byte or a byte array. In other languages I know that a char is just a single byte. However, looking at the Java Character class, its min value is \u0000 and its max value is \uFFFF. This makes it seem like a char is 2 bytes long.

Will I be able to store it as a byte or do I need to store it as two bytes?

Before anyone asks, I will say that I'm trying to do this because I'm working under an interface that expects my results to be a byte array. So I have to convert my char to one.

Please let me know and help me understand this.

Thanks, jbu

+16  A: 

To convert characters to bytes, you need to specify a character encoding. Some character encodings use one byte per character, while others use two or more bytes. In fact, for many languages, there are far too many characters to encode with a single byte.

In Java, the simplest way to convert from characters to bytes is with the String class's getBytes(String encoding) method. However, this method will silently substitute the charset's default replacement (typically ?) for any character that cannot be mapped under the specified encoding. If you need more control, you can configure a CharsetEncoder to report this case as an error or to use a different replacement character.
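A minimal sketch of the stricter approach, assuming you want unmappable input to fail loudly rather than be replaced (the class and method names here are illustrative, not from the question):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictEncode {
    // Encode a string, throwing instead of silently substituting
    // unmappable characters.
    static byte[] encodeStrict(String s, String charsetName)
            throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encodeStrict("abc", "US-ASCII").length); // 3
        try {
            encodeStrict("\u00e9", "US-ASCII"); // é cannot be mapped to ASCII
        } catch (CharacterCodingException e) {
            System.out.println("unmappable");
        }
    }
}
```

With CodingErrorAction.REPORT, encode throws a CharacterCodingException instead of writing a replacement byte.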

erickson
would using UTF-8 and storing my chars as a single byte be ok? I am thinking yes, even if that last bit was a sign bit for a byte.
jbu
You should use the character encoding required by the interface under which you are working.
erickson
For single byte encodings use the ISO-8859 family
Shimi Bandiel
well we are using utf-8, so I'm wondering if it's ok to just do the char->byte conversion
jbu
No, if you are using UTF-8, and have any non-ASCII characters (`char` values > 127), you should use an encoding API to convert to bytes. The non-ASCII characters require two or more bytes in UTF-8. If you simply cast chars in the range 128-255 to bytes, the wrong characters will be decoded.
erickson
Use "this string".getBytes("utf-8");
Seun Osewa
A: 

char in Java is an unsigned 16-bit value. If what you have fits in 7 bits (plain ASCII, for instance) then you can just cast it to a byte.

You could check out the java.nio.charset APIs as well.
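A quick sketch of why the 7-bit limit matters; the values are easy to verify by hand:

```java
public class CharCastDemo {
    public static void main(String[] args) {
        char a = 'A';                 // 65: fits in 7 bits
        System.out.println((byte) a); // 65 -- the cast is safe

        char e = '\u00e9';            // é, 233: outside 7 bits
        System.out.println((byte) e); // -23 (0xE9): this happens to be the
                                      // ISO-8859-1 byte for é, but it is not
                                      // a valid UTF-8 encoding of é
    }
}
```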

TofuBeer
It has to fit in 7 bits to work safely.
erickson
yes, I didn't want to get into extended ASCII... but I'll update my answer.
TofuBeer
A: 

To extend what others are saying, if you have a char that you need as a byte array, then you first create a String containing that char and then get the byte array from the String:

private byte[] charToBytes(final char x) {
  String temp = new String(new char[] {x});
  try {
    return temp.getBytes("ISO-8859-1");
  } catch (UnsupportedEncodingException e) {
    // Log a complaint
    return null;
  }
}

Of course, use the appropriate character set. It would be much more efficient than this to work with Strings from the start, rather than taking one char at a time, converting it to a String, and then converting that to a byte array.
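For illustration, the String-level approach recommended above might look like this (the charset choice here is an assumption, as in the method above):

```java
import java.io.UnsupportedEncodingException;

public class BulkConvertDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Convert the whole char array in one call,
        // rather than one char at a time.
        char[] chars = {'h', 'i', '!'};
        byte[] bytes = new String(chars).getBytes("ISO-8859-1");
        System.out.println(bytes.length); // 3
    }
}
```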

Eddie
A: 

A char is indeed 16 bits in Java (and it is also the only unsigned primitive type!).

If you are sure the encoding of your characters is ASCII, then you can simply cast each one to a byte (since ASCII uses only the lower 7 bits of the char).

If you do not need to modify the characters, or understand their meaning within a String, you can simply store each char as two bytes, like:

char[] c = ...;
byte[] b = new byte[c.length*2];
for(int i=0; i<c.length; i++) {
    b[2*i] = (byte) ((c[i] & 0xFF00) >> 8); // high byte; note the cast comes after the shift
    b[2*i+1] = (byte) (c[i] & 0x00FF);      // low byte
}

(It may be advisable to replace the 2*i with a left shift, i<<1, if speed matters.)

Note however that some displayed characters (or, more accurately, Unicode code points beyond U+FFFF) are encoded as two consecutive chars, called a surrogate pair. So cutting between two chars does not guarantee that you are cutting between actual characters.
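A small demonstration of this point; U+1D11E (the musical G clef) is one such code point:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // One code point, stored as two chars (a surrogate pair).
        String s = "\uD834\uDD1E";
        System.out.println(s.length());                      // 2 chars
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
    }
}
```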

If you need to decode/encode or otherwise manipulate your char array in a String-aware manner, you should instead decode and encode your char array or String using the java.io and java.nio.charset tools, which ensure proper character handling.

Varkhan
Rather than the code shown here, specify "UTF-16" as the character encoding and use built-in encoding APIs. Less code for you to implement, test, and maintain, and captures intent more clearly for readers of the code.
erickson
And it is also two orders of magnitude slower, because of the encoding/decoding machinery, which may not be needed in this instance.
Varkhan
It's only encoding, and if it's any slower (which I doubt), it's not a factor of 100. Why do you think the UTF-16 encoding is doing anything significantly different than your code?
erickson
Because I have read Sun's code, and it's using the whole nio machinery, which is significantly slower. After benchmarking, the actual factor is between 30 and 50 (and yes, for my applications, that matters).
Varkhan
Oh, and I forgot to mention... decoders are not thread-safe. But that's another story...
Varkhan
A: 

Hello,

Can you please help me in the following.

I would like to transfer a Java byte array to a socket-based application.

How would I achieve this?

Regards, Anil Kamani

Anil Kamani
You have a **question**. Press the `Ask Question` button at the top right to ask it. Do not post your question as an **answer**. Feel free to include links to topics you found but which didn't help (and elaborate on why not).
BalusC