views:

52

answers:

2

Hi,

A bloom filter uses a hash function (or many) to generate a value between 0 and m given an input string X. My question is how to you use a hash function to generate a value in this way, for example an MD5 hash is typically represented by a 32 length hex string, how would I use an MD5 hashing algorithm to generate a value between 0 and m where I can specify m? I'm using Java at the moment so an example of to do this with the MessageDigest functionality it offers would be great, though just a generic description of how to do about it would be fine too.

Thanks

A: 

Simplest way would probably be to just convert the hash output (as a byte sequence) to a single binary number and take that modulo m.

Amber
Hi Dav, cheers for the reply could you flesh out an example of "just convert the hash output (as a byte sequence) to a single binary number" thanks :D
dangerstat
+1  A: 

You should first convert the hash output to an unsigned integer, then reduce it modulo m. This looks like this:

MessageDigest md = MessageDigest.getInstance("MD5");
// hash data...
byte[] hashValue = md.digest();
BigInteger n = new BigInteger(1, hashValue);
n = n.mod(m);
// at that point, n has a value between 0 and m-1 (inclusive)

I have assumed that m is a BigInteger instance. If necessary, use BigInteger.valueOf(). Similarly, use n.intValue() or n.longValue() to get the value of n as one of the primitive types of Java.

The modular reduction is somewhat biased, but the bias is very small if m is substantially smaller than 2^128.

Thomas Pornin
Thanks for your answer :)
dangerstat