I need to create unique numerical ids for some short strings.

some.domain.com    -> 32423421
another.domain.com -> 23332423
yet.another.com    -> 12131232

Is there a Perl CPAN module that will do something like this?

I've tried using Digest::MD5 but the resulting numbers are too long:

some.domain.com    -> 296800572457176150356613937260800159845
+17  A: 

Just take the first 8 digits of the MD5 hash. This works because MD5 output is uniformly distributed over its hash address space, so any consecutive run of bits (or hex digits) taken from the digest is itself a uniformly distributed hash.

Alternatively, just use some other uniformly distributed hashing mechanism that returns 8 digits. Whatever's easiest for you.
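
A minimal sketch of the truncation idea, assuming "first 8 digits" means the first 8 hex digits of the digest (its top 32 bits), folded into an 8-digit decimal range. The helper name `short_id` is made up for illustration, and the final modulo introduces a small bias:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map a string to an ID of at most 8 decimal digits.
sub short_id {
    my ($string) = @_;
    # The first 8 hex digits of the MD5 digest are its top 32 bits.
    my $top32 = hex(substr(md5_hex($string), 0, 8));
    # Fold into 0 .. 99_999_999; the modulo adds a slight bias.
    return $top32 % 100_000_000;
}

printf "%-18s -> %08d\n", $_, short_id($_)
    for qw(some.domain.com another.domain.com yet.another.com);
```

The same string always maps to the same number, but two different strings can still collide.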

John Feminella
But then the probability of a collision goes up?
git-noob
That's right, but your probability of a collision always goes up when you reduce the address space. You'd have precisely the same problem using a shorter hash no matter how it's created.
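
To put a rough number on that, here is a sketch using the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2N), assuming the IDs are uniformly distributed over N possible values. The function name `collision_probability` is illustrative only:

```perl
use strict;
use warnings;

# Birthday approximation: chance that at least two of $n uniformly
# distributed IDs collide in a space of $space possible values.
sub collision_probability {
    my ($n, $space) = @_;
    return 1 - exp(-$n * ($n - 1) / (2 * $space));
}

# An 8-digit ID space has 10**8 possible values.
printf "%7d strings: %.4f\n", $_, collision_probability($_, 1e8)
    for (1_000, 10_000, 100_000);
```

With 8 decimal digits, 10,000 strings already give roughly a 39% chance of at least one collision, so a lookup table or collision check may be needed if uniqueness really matters.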
John Feminella
+4  A: 

Either Digest::CRC or String::CRC32. The first gives you the option to calculate 8-, 16- and 32-bit checksums, while the second only supports 32-bit.
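
A short sketch using the functional interface of Digest::CRC (a CPAN module, so it is assumed to be installed). The host names are the ones from the question:

```perl
use strict;
use warnings;
use Digest::CRC qw(crc32 crc16 crc8);

# Print the 32-, 16- and 8-bit checksums of each string as integers.
for my $host (qw(some.domain.com another.domain.com yet.another.com)) {
    printf "%-18s crc32=%10u crc16=%5u crc8=%3u\n",
        $host, crc32($host), crc16($host), crc8($host);
}
```

The narrower the checksum, the sooner collisions appear: an 8-bit CRC has only 256 possible values.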

vartec
+3  A: 

Given that the strings look like host names, perhaps you could just resolve them to IP addresses and present each IP as an integer?

Kind of like:

perl -le 'my $ip = gethostbyname("depesz.com"); my $num = unpack("N", $ip); print $num'
1311657670
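
The same idea as a small script, sketched with `inet_aton` from the core Socket module so that a failed lookup can be detected. The helper name `host_to_int` is hypothetical, and this only covers IPv4:

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Hypothetical helper: return the host's IPv4 address as an unsigned
# 32-bit integer, or undef if the name cannot be resolved.
sub host_to_int {
    my ($host) = @_;
    my $packed = inet_aton($host);    # accepts host names and dotted quads
    return defined $packed ? unpack("N", $packed) : undef;
}

for my $host (qw(127.0.0.1 depesz.com)) {
    my $num = host_to_int($host);
    print "$host -> ", (defined $num ? $num : "unresolved"), "\n";
}
```

The result depends on DNS at the time of the lookup, so the number for a given name can change when the host moves to another address.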
depesz
What if they all point to the same IP? There are IPs out there that serve some 10 million host names.
innaM