views:

628

answers:

2

I'm trying to represent the result of an MD5 hash in the shortest possible string. It seems a waste to just turn it into a hex string and let G through Z go to waste.

One idea I have had is getting the MD5 hash of my input as an array of bytes and constructing a BigInt with it. I can then call toString(36), and get the number as a base-36 in a string (-?[0-9a-z]*, the number can be positive or negative). It works for me.

Problem is, I'm not sure that a BigInt can be constructed with any array of bytes, and I can't prove it with testing (at least not in a timely way!). I assume so, because I understand that a BigInt can be of arbitrary size. I can't use this method until I know for sure that it will work for all possible outputs. So, can anyone tell me whether it will work for all inputs (or how to easily convert a byte array so it can be represented in base 36).

Clarification: I have the implementation, I'm asking about the behaviour over the whole domain (i.e. 00000000000000000000000000000000 to FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF)

A: 

Wouldn't Base64 encoding be shorter than Base36? You can find plenty of implementations around.

But, to actually answer the question:

  // Make a big randomly-filled byte array
  val random = scala.util.Random
  val arraySize = 8543
  val bytes: Array[Byte] = new Array[Byte](arraySize) // make some big array
  random.nextBytes(bytes) // fill it randomly

  // Make a BigInt out of it and the corresponding base36 string representation
  val bi: BigInt = new BigInt(new java.math.BigInteger(bytes))
  val strRep: String = bi.toString(36)

  // Make a new BigInt out of the string rep.  Does it match?
  val bi2: BigInt = new BigInt(new java.math.BigInteger(strRep, 36))
  if (bi == bi2) {
      println("yippee!!")
  }

  // Make a new byte array out of the BigInt.  Does it match the original array?
  val bytes2: Array[Byte] = bi2.toByteArray
  if (bytes deepEquals bytes2) {
      println("yippee again!!")
  }
Mitch Blevins
I'm putting this into a file name in a URL which Windows users may download. Having mixed caps in Windows is more trouble than it's worth.Thanks for your answer, but I was asking about being able to construct a BigInt with any byte array. I already have the implementation, I'm asking about it's behaviour over the whole set of byte arrays (i.e. from 00000000000000000000000000000000 to FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF)
Joe
Since you have the implementation, testing a loop or random values and your corner cases should be straightforward. You'll find that if the leading byte is 0x00 and the second byte is not negative, or if the leading byte is 0xff and the second byte is negative, this will fail.
Mitch Blevins
+1  A: 

Building on your feedback above, the following implementation will reliably encode/decode an arbitrary byte array:

package blevins.example

object BigIntEncoder {
  val radix = 36

  implicit def byteArrayToString(ba: Array[Byte]): String = {
    new java.math.BigInteger(addByte(ba)).toString(radix)
  }

  implicit def stringToByteArray(s: String): Array[Byte] = {
    stripByte(new java.math.BigInteger(s, radix).toByteArray)
  }

  def addByte(ba: Array[Byte]): Array[Byte] = {
    val h = new Array[Byte](1)
    h(0) = 0x01
    h ++ ba
  }

  def stripByte(ba: Array[Byte]): Array[Byte] = {
    ba.slice(1,ba.size)
  }

}

Note that we are adding an extra 0x01 byte to the head of the array to avoid any side effects from taking the two-complement of the byte array.

EDIT: The testing involved to prove this out is documented here: http://cleverlytitled.blogspot.com/2009/10/scalacheck.html

Mitch Blevins
Sorry for late accept. Spare time projects are sadly only for spare time!
Joe