Hey friends, I'm trying to port a Java "hash" function to Ruby.

Here's the Java side:

import java.nio.charset.Charset;
import java.security.MessageDigest;

/**
 * @return most significant 8 bytes of the MD5 hash of the string, as a long
 */
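protected long hash(String value) {
  // md5Digest is a field not shown in this excerpt (presumably a MessageDigest for MD5)
  byte[] md5hash = md5Digest.digest(value.getBytes(Charset.forName("UTF8")));
  long hash = 0L;
  for (int i = 0; i < 8; i++) {
    // mask with 0xFFL so each signed byte contributes only its low 8 bits
    hash = (hash << 8) | (md5hash[i] & 0xFFL);
  }
  return hash;
}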

So far, my best guess in Ruby is:

#!/usr/bin/env ruby -wKU
# WRONG - doesn't work properly.

require 'digest/md5'
require 'pp'

md5hash = Digest::MD5.hexdigest("0").unpack("U*")
pp md5hash
hash = 0
0.upto(7) do |i|
  hash = hash << 8 | md5hash[i] & 0x00000000000000FF
end
pp hash

The problem is that this Ruby code doesn't match the Java output.

For reference, the Java code above returns the following long for each of these strings:

"00038c53790ecedfeb2f83102e9115a522475d73" => -2059313900129568948
"0" => -3473083983811222033
"001211e8befc8ac22dd265ecaa77f8c227d0007f" => 3234260774580957018

Thoughts:

  • I'm having problems getting the UTF-8 bytes from the Ruby string
  • In Ruby I'm using hexdigest; I suspect I should be using plain digest instead
  • The Java code takes the MD5 of the UTF-8 bytes, whereas my Ruby code takes the bytes of the MD5 (as hex text)
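
To make the last two points concrete, here is a minimal sketch (standard library only; the variable names are just for illustration) that compares what digest and hexdigest hand back for "0". It only shows the mismatch and is not a fix:

```ruby
require 'digest/md5'

raw = Digest::MD5.digest("0")     # 16 raw bytes, which is what the Java code works on
hex = Digest::MD5.hexdigest("0")  # 32 ASCII characters spelling those bytes in hex

raw_bytes = raw.bytes.first(8)       # the 8 most significant digest bytes
hex_codes = hex.unpack("U*").first(8) # character codes of the first 8 hex digits

puts raw_bytes.inspect  # [207, 205, 32, 132, 149, 213, 101, 239]
puts hex_codes.inspect  # [99, 102, 99, 100, 50, 48, 56, 52]
```

The second array is just the character codes of "cfcd2084", so folding it into an integer cannot reproduce the Java value.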

Any suggestions on how to get the exact same output in Ruby?

+1  A: 
require 'digest/md5'

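class String
  # Most significant 8 bytes of the MD5 digest as a signed 64-bit integer.
  def my_hash
    # 'c' = signed 8-bit, 'C' = unsigned 8-bit, 'n' = unsigned 16-bit big-endian,
    # 'N' = unsigned 32-bit big-endian: 1 + 1 + 2 + 4 = 8 bytes in total.
    # Only the top byte is read as signed, so it alone carries the sign.
    hi1, hi2, mid, lo = Digest::MD5.digest(self).unpack('cCnN')
    hi1 << 56 | hi2 << 48 | mid << 32 | lo
  end
end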

require 'test/unit'
class TestMyHash < Test::Unit::TestCase
  def test_that_my_hash_hashes_the_string_0_to_negative_3473083983811222033
    assert_equal(-3473083983811222033, '0'.my_hash)
  end
end
Jörg W Mittag
No, the test case is correct. A Java `long` is a 64-bit _signed_ integer type, so a value with the most significant bit set is negative. Here, 0xcfcd208495d565ef read as unsigned is 14973660089898329583; as a signed 64-bit value it is 14973660089898329583 - 2^64 = -3473083983811222033, as stated.
Thomas Pornin
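
The arithmetic in the comment above can be checked with a short sketch. Ruby integers are arbitrary precision, so the wrap-around a Java `long` does implicitly has to be done by hand; `to_signed64` is a hypothetical helper name introduced only for this illustration:

```ruby
require 'digest/md5'

# Hypothetical helper: reinterpret an unsigned 64-bit value as a signed one,
# the way a Java long would hold the same bit pattern.
def to_signed64(n)
  n >= 2**63 ? n - 2**64 : n
end

# Fold the 8 most significant digest bytes into one unsigned integer.
unsigned = Digest::MD5.digest("0").bytes.first(8).inject(0) { |acc, b| (acc << 8) | b }
signed   = to_signed64(unsigned)

puts unsigned  # 14973660089898329583
puts signed    # -3473083983811222033
```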
@Thomas Pornin: Yes, you are right. I guess I'm just not used to languages with broken arithmetic.
Jörg W Mittag
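
For what it's worth, a shorter variant seems possible on Ruby 1.9.3 or later, where `unpack` accepts the `q>` directive (signed 64-bit, big-endian). This is only a sketch under that assumption: `java_style_hash` is a made-up name, and the explicit `encode('UTF-8')` assumes the input string carries a valid encoding to begin with:

```ruby
require 'digest/md5'

# Hypothetical helper, not part of the original answer.
# 'q>' reads the first 8 digest bytes as one signed big-endian 64-bit integer,
# which mirrors what the Java loop builds byte by byte.
def java_style_hash(value)
  Digest::MD5.digest(value.encode('UTF-8')).unpack('q>').first
end

puts java_style_hash("0")  # -3473083983811222033
```

It has only been checked here against the "0" reference value from the question.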