tags:

views:

514

answers:

5

What's the most efficient way to convert an md5 hash to a unique integer to perform a modulus operation?

+1  A: 

There is more data in a MD5 than will fit in even a 64b integer, so there's no way (without knowing what platform you are using) to get a unique integer. You can get a somewhat unique one by converting the hex version to several integers worth of data then combining them (addition or multiplication). How exactly you would go about that depends on what language you are using though.

Alot of language's will implement either an unpack or sscanf function, which are good places to start looking.

Matthew Scharley
You're assuming that "integer" is limited to 64 bits. What about BigInteger?
Jon Skeet
Of course, but alot of platforms don't have arbitrary length integers. Depends entirely on what platform you are on as to if there is a (native) BigInteger implementation.
Matthew Scharley
Absolutely, but "it depends on your platform" is not the same as "there's no way" :)
Jon Skeet
+2  A: 

You haven't said what platform you're running on, or what the format of this hash is. Presumably it's hex, so you've got 16 bytes of information.

In order to convert that to a unique integer, you basically need a 16-byte (128-bit) integer type. Many platforms don't have such a type available natively, but you could use two long values in C# or Java, or a BigInteger in Java or .NET 4.0.

Conceptually you need to parse the hex string to bytes, and then convert the bytes into an integer (or two). The most efficient way of doing that will entirely depend on which platform you're using.

Jon Skeet
Thank you. How could this be done in php?
ensnare
You'd be looking at sscanf in PHP: http://au.php.net/manual/en/function.sscanf.php
Matthew Scharley
Except that sscanf doesn't seem to like the `%x` format, unless it's just me...
Matthew Scharley
A: 

You'll need to define your own hash function that converts an MD5 string into an integer of the desired width. If you want to interpret the MD5 hash as a plain string, you can try the FNV algorithm. It's pretty quick and fairly evenly distributed.

Chris
+2  A: 

Since the solution language was not specified, Python is used for this example.

import os
import hashlib

array = os.urandom(1 << 20)
md5 = hashlib.md5()
md5.update(array)
digest = md5.hexdigest()
number = int(digest, 16)

print(number % YOUR_NUMBER)
Noctis Skytower
+1  A: 

If all you need is modulus, you don't actually need to convert it to 128-byte integer. You can go digit by digit or byte by byte, like this.

mod=0
for(i=0;i<32;i++)
{
   digit=md5[i]; //I presume you can convert chart to digit yourself.
   mod=(mod*16+digit) % divider;
}
yu_sha