views:

452

answers:

3

Up to what string length is it possible to use MD5 as a hash without having to worry about the possibility of a collision?

This would presumably be calculated by generating an MD5 hash for every possible string in a particular character set, in increasing length, until a hash appears for a second time (a collision). The maximum possible length of a string without a collision would then be one character less than the longest of the colliding pair.

Has this already been tested for MD5, SHA1, etc?

+2  A: 

I don't believe this has been done, though it might be an interesting experiment. It doesn't really make sense for any use of these hash functions that I am aware of, except for PKCS1 signatures. If you're string is smaller than the output block size, then don't bother hashing it at all. Then you are guaranteed uniqueness.

EDIT: ahh, but for passwords you do need this capability.

GregS
+2  A: 

I doubt if there is any useful length where you're not going to have possible collisions. Those algorithms are not really used for that purpose. It's meant to try to be unique for slight changes in the data (like corrupted files) rather than unique over all possible sets of data.

Mike Nelson
+5  A: 
Jason S
+1, on the assumption that the math is correct ... and now time to delete my answer
kdgregory
Ken Fox
@kdgregory: don't delete your answer! it has useful value!
Jason S
I like this explanation. It seems like the first collision's probably going to occur within 8 or 9 bytes and - as others have commented - if the strings are shorter than this then it's probably not worth hashing them anyway.
Alf Eaton