I have a program that handles byte arrays in Java, and now I would like to write one of them to an XML file. However, I am unsure how to convert the byte array into a sensible String to write to the file. Assuming that the bytes were Unicode characters, I tried the following code:

String temp = new String(encodedBytes, "UTF-8");

However, the debugger shows the result as "\ufffd\ufffd ^\ufffd\ufffd-m\ufffd\ufffd\ufffd \ufffd\ufffdIA\ufffd\ufffd". The String should contain a hash in alphanumeric format.

How would I turn the above String into a sensible String for output?

+4  A: 

If your byte array is the output of a password hashing scheme (which it looks like it might be), then I think you will need to Base64-encode it in order to put it into plain text.

The standard procedure, if you have raw bytes that you want to write to a text file, is to use Base64 encoding. The Commons Codec library provides a Base64 encoder/decoder for you to use.
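
As a rough sketch of the idea, here is a round trip using java.util.Base64, which ships with the JDK from Java 8 onward and is used here as a stand-in; Commons Codec offers a comparable org.apache.commons.codec.binary.Base64 class. The class name Base64Sketch, the helper names, and the sample bytes are made up for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Sketch {

    // Turn raw bytes into Base64 text, which contains only characters
    // that are safe to place in an XML element or attribute.
    static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recover the original bytes from the Base64 text.
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Hypothetical hash bytes; a real digest would be longer.
        byte[] encodedBytes = {(byte) 0xE3, (byte) 0xB0, 0x5E, (byte) 0xC4, 0x2D, 0x6D};

        String text = toBase64(encodedBytes);
        System.out.println(text);

        // Decoding gives back exactly the bytes that went in.
        System.out.println(Arrays.equals(encodedBytes, fromBase64(text)));
    }
}
```

Unlike new String(bytes, "UTF-8"), this conversion is lossless for any byte values, so whoever reads the XML can decode the text back to the original hash.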

Hope this helps.

Phill Sacre
That's a *very* good answer.
Donal Fellows
Recommend the asker create an attribute for that element to indicate the encoding (with a default value in the DTD or schema so you don't necessarily have to specify it in the doc).
Bert F
Yep, it's a hash. I'll have a look at the Commons Codec stuff soon. I assume you just download the jars and implement them into your project?
EnderMB
@Ender - yes that's right. There should be a user guide on the site to get you started.
Phill Sacre
+2  A: 

The byte array doesn't look like UTF-8. Note that \ufffd (named REPLACEMENT CHARACTER) is "used to replace an incoming character whose value is unknown or unrepresentable in Unicode."

Addendum: Here's a simple example of how this can happen. When cast to a byte, the code point for ñ is neither valid UTF-8 nor valid US-ASCII, but it is valid ISO-8859-1. In effect, you have to know which encoding the bytes represent before you can decode them into a String.

public class Hello {

    public static void main(String[] args)
            throws java.io.UnsupportedEncodingException {
        String s = "Hola, señor!";
        System.out.println(s);
        // Narrow each code point to a single byte; ñ (U+00F1) becomes -15.
        byte[] b = new byte[s.length()];
        for (int i = 0; i < b.length; i++) {
            int cp = s.codePointAt(i);
            b[i] = (byte) cp;
            System.out.print((byte) cp + " ");
        }
        System.out.println();
        // Decode the same bytes under three different charsets.
        System.out.println(new String(b, "UTF-8"));
        System.out.println(new String(b, "US-ASCII"));
        System.out.println(new String(b, "ISO-8859-1"));
    }
}

Output:

Hola, señor!
72 111 108 97 44 32 115 101 -15 111 114 33 
Hola, se�or!
Hola, se�or!
Hola, señor!
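
If you would rather detect the mismatch than get silent \ufffd replacements, one option is a strict decoder. The following is only a sketch: it uses the JDK's CharsetDecoder with CodingErrorAction.REPORT, and the class name StrictDecodeSketch and the helper isValidUtf8 are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeSketch {

    // Returns true only if the bytes are well-formed UTF-8.
    // REPORT makes the decoder throw instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "Hola, se\u00f1or!";
        // The ISO-8859-1 bytes contain a lone 0xF1, which is not valid UTF-8.
        System.out.println(isValidUtf8(s.getBytes(StandardCharsets.ISO_8859_1)));
        // The UTF-8 bytes of the same text decode cleanly.
        System.out.println(isValidUtf8(s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A check like this only tells you that the bytes are not UTF-8 text; it cannot tell you what they actually are, which is why you still have to know the source of the bytes.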
trashgod