ansaurus

Question

Byte Array to String

Answer 1

A:

Bencoded strings are byte strings. You can attempt to decode a byte string to unicode codepoints in Java with String(byte[] bytes, Charset charset). Decoding with certain encodings such as ISO-8859-1 will always succeed, since any byte maps directly to a codepoint. With many of these encodings (including ISO-8859-1) the process is also reversible.

Tuure Laurinolli 2009-11-02 22:31:40

Yes, that's what I'm doing right now, but bencoded strings contain binary data, not just text, at least in torrents. Building a regular string will corrupt the SHA-1 hashes.

Hamza Yerlikaya 2009-11-02 22:35:34

Er, it shouldn't... as long as the codepoints cover the entire 0-255 byte range, nothing should change in the process.

Amber 2009-11-02 22:37:15

It's a common mistake to believe that ISO-8859-1 does a 1:1-mapping for bytes in the range 0-255. ISO-8859-1 is undefined in the range 128-159, so trying to convert a byte in that range to a character will result in '?' as a best-fit representation of an unknown character.

jarnbjo 2009-11-02 22:51:24

@jarnbjo, ISO 8859-1 (the ISO standard) is the one that leaves some code points undefined; ISO-8859-1 (the IANA charset) defines all 256 of them. The Wikipedia article has more details.

Tuure Laurinolli 2009-11-02 23:46:24

@Hamza Yerlikaya, no it won't. If you encode the String again with ISO-8859-1, the resulting bytes are the same. Or in code: Arrays.equals(bytes, new String(bytes, Charset.forName("ISO-8859-1")).getBytes("ISO-8859-1")) == true for any byte[] bytes.

Tuure Laurinolli 2009-11-02 23:50:13
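
The round-trip claim in the comments above can be checked directly. This is a minimal sketch, not part of the original thread: the class name Latin1RoundTrip and the helper roundTrip are made up for illustration. It decodes every possible byte value to a String and encodes it back, once with ISO-8859-1 and once with US-ASCII for contrast.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper class, named here only for illustration.
class Latin1RoundTrip {
    // Decode the bytes to a String with the named charset, then encode back.
    static byte[] roundTrip(byte[] bytes, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(bytes, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // One byte for every value 0..255.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        // ISO-8859-1 maps every byte to a code point and back.
        System.out.println(Arrays.equals(all, roundTrip(all, "ISO-8859-1"))); // prints "true"
        // US-ASCII cannot represent bytes 128..255, so they are replaced.
        System.out.println(Arrays.equals(all, roundTrip(all, "US-ASCII")));   // prints "false"
    }
}
```

The US-ASCII case is the kind of corruption the SHA-1 concern is about: it happens with charsets that cannot represent every byte value, not with Java's ISO-8859-1.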

Answer 2

A:

If Wikipedia is accurate on Bencode, the format seems straightforward enough. Parse the byte data directly:

// "in" must be an InputStream that supports mark/reset, e.g. a BufferedInputStream
while (true) {
  in.mark(1);
  int n = in.read();
  if (n < 0) {
    // end of input
    break;
  }
  // un-read the type byte so the element parser sees the whole element
  in.reset();
  // Java char literals (UTF-16) have the same values as ASCII for these characters
  if (n == 'd') {
    // parse dictionary
  } else if (n == 'i') {
    // parse int
  } else if (n >= '0' && n <= '9') {
    // parse binary string
  } else if (n == 'l') {
    // parse list
  } else {
    throw new IOException("Invalid input");
  }
}

Store the binary strings in a type that only converts them to ASCII when you do it explicitly, as in this toString call:

import java.nio.charset.Charset;

public class ByteString {
  private final byte[] data;

  public ByteString(byte[] data) { this.data = data.clone(); }
  public byte[] getData() { return data.clone(); }

  @Override public String toString() {
    return new String(data, Charset.forName("US-ASCII"));
  }
}

McDowell 2009-11-02 23:11:21
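
For the "parse binary string" branch, a rough sketch could look like the following. It assumes the usual Bencode form for byte strings, a decimal length, a colon, then that many raw bytes. The class name BencodeStrings and the method readByteString are hypothetical, not from the answer above. The bytes are never turned into a String, so binary data such as SHA-1 hashes is left untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper class, named here only for illustration.
class BencodeStrings {
    // Reads one bencoded byte string ("<decimal length>:<raw bytes>")
    // and returns the raw bytes without any charset conversion.
    static byte[] readByteString(InputStream in) throws IOException {
        int length = 0;
        int digits = 0;
        int c;
        while ((c = in.read()) != ':') {
            // Also rejects end of input (-1) inside the length prefix.
            if (c < '0' || c > '9') {
                throw new IOException("Invalid length prefix");
            }
            // Guard against int overflow before multiplying.
            if (length > (Integer.MAX_VALUE - 9) / 10) {
                throw new IOException("Length too large");
            }
            length = length * 10 + (c - '0');
            digits++;
        }
        if (digits == 0) {
            throw new IOException("Missing length prefix");
        }
        // A real parser would probably cap the length before allocating.
        byte[] data = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(data, off, length - off);
            if (n < 0) {
                throw new EOFException("Truncated byte string");
            }
            off += n;
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] input = {'3', ':', 0x00, (byte) 0x9f, (byte) 0xff};
        byte[] parsed = readByteString(new ByteArrayInputStream(input));
        System.out.println(Arrays.toString(parsed)); // prints "[0, -97, -1]"
    }
}
```

The result could then be wrapped in the ByteString type above, so that conversion to text only happens when toString is called.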
