Suppose I have any "long" hash, like a 16-byte MD5 or a 20-byte SHA1. I want to reduce this hash to fit in 4 bytes, for GetHashCode() purposes.

First, I'm perfectly aware that I'll get more collisions. That's totally fine in my case, but I'd still prefer to get as few collisions as possible.

There are several solutions to my problem:

  • I could take the first 4 bytes of the hash.
  • I could take the last 4 bytes of the hash.
  • I could take 4 random bytes of the hash.
  • I could generate a hash of the hash, using the classic prime-number multiplications.
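
For illustration, the last option could look roughly like the sketch below. The class name `HashFolding`, the method name `Fold` and the constants 17 and 31 are assumptions chosen for the example (they are the conventional choices for this pattern, not requirements).

```csharp
using System;

// Hypothetical helper, named for this example only.
static class HashFolding
{
    // Folds every byte of a longer digest into one 32-bit value
    // using the usual multiply-by-a-prime-and-add pattern.
    public static int Fold(byte[] digest)
    {
        unchecked // let the arithmetic wrap around instead of throwing on overflow
        {
            int result = 17;
            foreach (byte b in digest)
            {
                result = result * 31 + b;
            }
            return result;
        }
    }
}
```

Unlike the first three options, this one reads all the bytes of the original hash rather than a 4-byte slice.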

Are there other solutions I didn't think about? And more importantly, which method will give me the fewest collisions? I'm currently assuming they're almost equivalent.

Microsoft defines the public key token of an assembly as the last 8 bytes of the SHA1 hash of its public key, so I'll probably go with that kind of solution, but I'd like to know why.

+8  A: 

Any hash is already a reduction.

Cryptographic hashes are designed so that no part of the data has more influence on any part of the hash than any other. So it doesn't matter which bits of the hash you pick.

Ben Voigt
+1  A: 

If you take 4 random bytes, then two of your SHA1 hashes that are exactly the same can produce different GetHashCode values.

I would just choose the first 4 bytes - SHA1 is designed so that no bytes should be as important as any other set of bytes.
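
A minimal sketch of taking the first 4 bytes, assuming the digest is kept as a `byte[]` inside a key type. The type name `Sha1Key` and its constructor are hypothetical and only there to make the example self-contained.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical key type that wraps a SHA1 digest; the name is illustrative.
sealed class Sha1Key
{
    private readonly byte[] digest; // 20 bytes for SHA1

    public Sha1Key(string text)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            digest = sha1.ComputeHash(Encoding.UTF8.GetBytes(text));
        }
    }

    // Interpret the first 4 bytes of the digest as a 32-bit integer.
    public override int GetHashCode()
    {
        return BitConverter.ToInt32(digest, 0);
    }

    // Equality still compares the whole digest, so a collision in
    // GetHashCode never makes two different keys compare equal.
    public override bool Equals(object obj)
    {
        Sha1Key other = obj as Sha1Key;
        return other != null && digest.SequenceEqual(other.digest);
    }
}
```

Because the offset is fixed, equal digests always give equal hash codes, which is exactly what picking bytes at random on each call would break.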

Callum Rogers
did you mean, "no bytes should be *more* important than any other set"?
Ben Voigt
+5  A: 

Any option except the third one (picking bytes at random) works fine. If you pick the bytes at random, the same input will produce different hash codes each time, which defeats the purpose of the hash code.

Guffa
I was thinking about 'hardcoded' random of course. Thanks for your feedback though.
Julien Lebosquain
@Julien: Aha, a random constant... http://www.xkcd.com/221/ ;)
Guffa
A: 

If you have a reasonable number of hashes, index them (e.g. store them in a database):

1 - 987baf9gfd79b7979debe90085eadf5
2 - 9754gccgfd79s7979abbc90085eadf5
...
takeshin
A: 

Hi Julien,

If your current hash is held as a string, simply call GetHashCode on that string and it will return an int, which is 4 bytes.
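
A rough sketch of that idea, assuming the digest is first turned into a string with `BitConverter.ToString`. The helper name `HexStringHash` is made up for the example.

```csharp
using System;

// Hypothetical helper, named for this example only.
static class HexStringHash
{
    public static int FromDigest(byte[] digest)
    {
        // Dash-separated hex representation, e.g. "A9-99-3E-36".
        string hex = BitConverter.ToString(digest);

        // Delegate the reduction to 32 bits to the framework's string hashing.
        return hex.GetHashCode();
    }
}
```

One caveat: string hash codes are only meant to be used within a single process. On .NET Core and later they are randomized per process, so the value should not be persisted or compared across runs.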

Any use?

Adam