I'm looking to create a 32-bit hash of some data objects. Since I don't feel like writing my own hash function and md5 is available, my current approach is to use the first 32 bits (i.e. first 8 hex digits) from an md5 hash. Is this acceptable?

In other words, are the first 32 bits of an md5 hash just as "random" as any other substring? Or is there any reason I'd prefer, say, the last 32 bits? Or perhaps XORing the four 32-bit substrings together?

Some preemptive clarifications:

  • These hashes don't need to be cryptographically secure.
  • I'm not concerned with the performance of md5--it is more than fast enough for my needs.
  • These hashes just need to be "random" enough that collisions are rare.
  • In this system, the number of items shouldn't exceed 10,000 (realistically it probably won't reach half that). So in the worst case the probability of encountering any collision at all should be about 1% (assuming a sufficiently "random" hash is found).
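
The "about 1%" figure in the last bullet can be checked with the usual birthday approximation. This is only a rough sketch: it assumes the 32-bit values behave as if drawn uniformly at random, and the function name `collision_probability` is made up for illustration.

```python
import math

def collision_probability(n_items: int, bits: int = 32) -> float:
    """Approximate chance of at least one collision among n_items hashes.

    Assumes each hash is uniformly random over 2**bits values and uses
    the birthday approximation p ~= 1 - exp(-n(n-1) / (2N)).
    """
    space = 2.0 ** bits
    exponent = n_items * (n_items - 1) / (2.0 * space)
    # expm1 keeps precision when the exponent is small
    return -math.expm1(-exponent)

if __name__ == "__main__":
    for n in (5_000, 10_000):
        print(f"{n} items: {collision_probability(n):.2%}")
```

Under those assumptions, 10,000 items give a bit under 1.2% and 5,000 items give roughly 0.3%, which is consistent with the estimate above.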
+6  A: 

In other words, are the first 32 bits of an md5 hash just as "random" as any other substring?

Yes. If the answer were no, MD5 wouldn't be sufficiently secure. (Sure, it has some minor cryptographic weaknesses, but I'm not aware of any statistical ones.)

Jason S
MD5 _isn't_ sufficiently secure as numerous attacks have shown :)
Joey
That statement is only true if qualifications are added. It is not sufficiently secure to make all collision attacks infeasible. It is (so far) sufficiently secure to make preimage attacks infeasible. See http://www.vpnc.org/hash.html
Jason S
Also, not to quibble, but my post didn't say MD5 was sufficiently secure. :-)
Jason S
I know; hence the ":-)"
Joey
ok, got it. just checking. :)
Jason S
+8  A: 

For any good hash function, each output bit should be approximately random and independent of the others. You should therefore be safe using just the first 32 bits of an MD5 hash.
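
For illustration, here is a minimal sketch of the three variants mentioned in the question (first 32 bits, last 32 bits, XOR of the four 32-bit words), using Python's standard `hashlib`. The function names are hypothetical, and reading the words as big-endian is an arbitrary choice made here so that the result matches the first 8 hex digits of the digest.

```python
import hashlib

def md5_first32(data: bytes) -> int:
    """First 32 bits of the MD5 digest (same as the first 8 hex digits)."""
    digest = hashlib.md5(data).digest()  # 16 bytes
    return int.from_bytes(digest[:4], "big")

def md5_last32(data: bytes) -> int:
    """Last 32 bits of the MD5 digest (same as the last 8 hex digits)."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest[-4:], "big")

def md5_xor32(data: bytes) -> int:
    """XOR of the four 32-bit words that make up the 128-bit digest."""
    digest = hashlib.md5(data).digest()
    result = 0
    for offset in range(0, 16, 4):
        result ^= int.from_bytes(digest[offset:offset + 4], "big")
    return result

if __name__ == "__main__":
    sample = b"some data object"
    print(f"{md5_first32(sample):08x}")
    print(f"{md5_last32(sample):08x}")
    print(f"{md5_xor32(sample):08x}")
```

If the digest bits really are statistically uniform, none of the three should collide noticeably more or less often than the others, so the simplest one (the first 8 hex digits) is a reasonable default.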

Alternatively, you could use CRC32, which should be much faster to compute (and the code is only about 20 lines).
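
As a rough sketch of the CRC32 route: in Python the standard `zlib.crc32` already returns a 32-bit value, and a table-free bitwise version is short enough to show next to it. The name `crc32_bitwise` is made up for this example; it assumes the common reflected polynomial 0xEDB88320 that zlib uses.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC-32 (reflected polynomial 0xEDB88320, as used by zlib)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the polynomial when the low bit was set
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

if __name__ == "__main__":
    sample = b"some data object"
    # The library call and the hand-written loop should agree
    print(f"{zlib.crc32(sample):08x}")
    print(f"{crc32_bitwise(sample):08x}")
```

In practice the library call is the one to use; the loop is only there to show how little code the algorithm needs.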

Joey
"I'm not concerned with the performance of md5--it is more than fast enough for my needs."
Kip
Kip: performance or not, CRC32 gives you a 32-bit hash, which is exactly what you want.
dwc