I have to convert a byte array to a String in Android, but my byte array contains negative values.

If I convert that String back to a byte array, the values I get are different from the original byte array values.

What can I do to get a proper conversion? The code I am using for the conversion is as follows:

// Code to convert byte arr to str:
byte[] by_original = {0,1,-2,3,-4,-5,6};
String str1 = new String(by_original);
System.out.println("str1 >> "+str1);

// Code to convert str to byte arr:
byte[] by_new = str1.getBytes();
for(int i=0;i<by_new.length;i++) 
System.out.println("by1["+i+"] >> "+str1);

I am stuck on this problem.

A: 

A string is a collection of chars (16-bit unsigned). So if you are going to convert negative numbers into a string, they'll be lost in translation.

Toad
-1: This is incorrect. While 'byte' is a signed type in Java, they are treated as unsigned by the library code that does character set encoding and decoding.
Stephen C
A fine example why having an unsigned 8 bit datatype really is a good idea to have in a language. Avoids unnecessary confusion ;^)
Toad
A: 

Try to specify an 8-bit charset in both conversions. ISO-8859-1 for instance.
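A minimal sketch of that suggestion, assuming Java 7+ (or Android API 19+) so that StandardCharsets is available. The class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Hypothetical helper: decode the bytes to a String and encode them back,
    // naming ISO-8859-1 explicitly in both directions.
    static byte[] roundTrip(byte[] input) {
        String s = new String(input, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = {0, 1, -2, 3, -4, -5, 6};
        byte[] copy = roundTrip(original);
        // ISO-8859-1 maps each of the 256 byte values to exactly one char,
        // so the negative values come back unchanged.
        System.out.println(Arrays.equals(original, copy)); // prints "true"
    }
}
```

The important part is that the same charset is named on both sides; relying on the platform default on either side can give different results on different devices.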

Maurice Perry
A: 

Your byte array must have some encoding. The encoding cannot be ASCII if you've got negative values. Once you figure that out, you can convert a set of bytes to a String using:

byte[] bytes = {...};
String str = new String(bytes, "UTF8"); // for UTF8 encoding

There are a bunch of encodings you can use; look at the Charset class in the Sun javadocs.

omerkudat
it will not work with UTF8 though.
Maurice Perry
That was just a sample, I actually don't know what encoding he should use...
omerkudat
+1  A: 

Using new String(byOriginal) and converting back to byte[] using getBytes() doesn't guarantee two byte[] with equal values. This is due to a call to StringCoding.encode(..), which encodes the String using Charset.defaultCharset(). During this encoding, the encoder might replace unknown characters or make other changes. Hence, String.getBytes() might not return an array equal to the one you originally passed to the constructor.

sfussenegger
-1: This is incorrect. The `String(byte[])` constructor does not change its input argument. It creates a new `char[]` from the supplied bytes and embeds that in the `String` object.
Stephen C
Well, I didn't mean that it changes the original array. But on second read, I think I failed to explain what I really meant. Gonna change that ...
sfussenegger
A: 

I just ran this test program and the original byte array appears to be preserved:

import java.io.IOException;

public class Test {
    public static void main(String[] args) throws IOException {

        // Code to convert byte arr to str: 
        byte[] by_original = {0,1,-2,3,-4,-5,6};
        String str1 = new String(by_original, "UTF-8");
        System.out.println("str1 >> " + str1);

        // Code to convert str to byte arr:
        byte[] by_new = str1.getBytes();
        for(int i=0; i<by_new.length; i++) 
            System.out.println("by1[" + i + "] >> " + by_new[i]);
    }
}

I had to change your last System.out.println() a little to output the by_new variable.

Here is the output:

$ javac Test.java 
$ java Test
str1 >> ???
by1[0] >> 0
by1[1] >> 1
by1[2] >> -2
by1[3] >> 3
by1[4] >> -4
by1[5] >> -5
by1[6] >> 6

I'm not sure what you expected the str1 string to hold but at least the values of the byte array are preserved, which was your main goal, right?

Asaph
Actually, the output will depend on the default charset of the platform.
Maurice Perry
@Maurice Perry: Ok, I added UTF-8 encoding to the String constructor to address your concern. Should return the same thing on all platforms now.
Asaph
+1  A: 

The root problem is (I think) that you are unwittingly using a character set for which:

 bytes != encode(decode(bytes))

in some cases. UTF-8 is an example of such a character set. Specifically, certain sequences of bytes are not valid encodings in UTF-8. If the UTF-8 decoder encounters one of these sequences, it is liable to discard the offending bytes or decode them as the Unicode codepoint for "no such character". Naturally, when you then try to encode the characters as bytes the result will be different.

The solution is:

  1. Be explicit about the character encoding you are using; i.e. use a String constructor and String.getBytes method with an explicit charset.
  2. Use the right character set for your byte data ... or alternatively one (such as "Latin-1") where all byte sequences map to valid Unicode characters.
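
To see the bytes != encode(decode(bytes)) case concretely, here is a small sketch using the byte values from the question. It assumes StandardCharsets (Java 7+ / Android API 19+), and the class and method names are only illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8LossDemo {
    // Hypothetical helper: decode as UTF-8, then encode as UTF-8 again.
    static byte[] viaUtf8(byte[] input) {
        String s = new String(input, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = {0, 1, -2, 3, -4, -5, 6};
        byte[] result = viaUtf8(original);
        // -2 is the byte 0xFE, which can never appear in valid UTF-8, so the
        // decoder substitutes a replacement character and the original value
        // cannot be recovered when encoding again.
        System.out.println(Arrays.equals(original, result)); // prints "false"
        System.out.println(Arrays.toString(result));
    }
}
```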
Stephen C
+5  A: 

The "proper conversion" between byte[] and String is to explicitly state the encoding you want to use. If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.

If you really must use a String to hold binary data then the safest way is to use Base64 encoding.
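
A rough sketch of the Base64 approach, assuming java.util.Base64 is available (Java 8+, or Android API 26+). On older Android versions android.util.Base64 would play the same role. The class and method names here are hypothetical:

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // Hypothetical helper: turn arbitrary binary data into ASCII-only text.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Hypothetical helper: recover the exact original bytes from that text.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {0, 1, -2, 3, -4, -5, 6};
        String text = toText(original);
        byte[] restored = fromText(text);
        System.out.println(text);
        System.out.println(Arrays.equals(original, restored)); // prints "true"
    }
}
```

The resulting String is longer than the raw data (roughly four characters per three bytes), but it survives any charset that can represent ASCII.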

Michael Borgwardt