I'm working on a .NET 3.5 project and I need a 32-bit hash value. There don't seem to be any methods in the .NET cryptography classes that return a 32-bit hash (MD5 is 128 bits, SHA1 is 160 bits, etc.). I implemented a CRC32 class, but I find that the existing SHA1 and MD5 hashing functions are much faster.

Would there be any problem (i.e., increased chance of collisions) with me using the SHA1 hashing function and just breaking off the first 32 bits to store as my hash value?

A: 

If you don't intend to use the 32 bits for a cryptographic purpose, you should be OK. Otherwise, I wouldn't rely on the first 32 bits having the same distribution as the whole hash.

Why can't you just use the wider hash that's available?

Mitch Wheat
A: 

CRC32 is probably reasonable for your needs. This has been discussed in this question.

In terms of truncating a hash primitive, a well-known and heavily used application is the SSL/TLS Pseudo Random Function (PRF), which is used to generate keys. It uses HMACs, seeds, and labels to generate as many bytes as you need by hashing several times and then truncating to the number of bytes required.
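To make the "hash several times, then truncate" idea concrete, here is a minimal sketch of the P_hash expansion described in RFC 5246, section 5. The class and method names are made up for illustration, and HMAC-SHA1 is an assumption chosen for the example; the real TLS PRF also mixes in a label and, depending on the protocol version, uses different hash functions.

```csharp
using System;
using System.Security.Cryptography;

static class PrfSketch
{
    // Hypothetical helper. A(0) = seed, A(i) = HMAC(secret, A(i-1)),
    // output = HMAC(secret, A(1) + seed) + HMAC(secret, A(2) + seed) + ...
    public static byte[] PHash(byte[] secret, byte[] seed, int length)
    {
        byte[] output = new byte[length];
        using (HMACSHA1 hmac = new HMACSHA1(secret))
        {
            byte[] a = seed;
            int written = 0;
            while (written < length)
            {
                a = hmac.ComputeHash(a); // A(i)

                // Concatenate A(i) and the seed, then MAC the result.
                byte[] input = new byte[a.Length + seed.Length];
                Buffer.BlockCopy(a, 0, input, 0, a.Length);
                Buffer.BlockCopy(seed, 0, input, a.Length, seed.Length);
                byte[] block = hmac.ComputeHash(input);

                // The final block is truncated to the number of bytes still needed.
                int take = Math.Min(block.Length, length - written);
                Buffer.BlockCopy(block, 0, output, written, take);
                written += take;
            }
        }
        return output;
    }

    static void Main()
    {
        byte[] bytes = PHash(new byte[] { 1, 2, 3 }, new byte[] { 4, 5, 6 }, 4);
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}
```

Asking for 4 bytes simply yields the first 4 bytes of the longer stream, which is the same kind of truncation the question is about.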

As to your specific question though, you could read the output of the hash into Int32s and then XOR them together if you're paranoid:

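using System;
using System.Security.Cryptography;

static class Program
{
    static void Main()
    {
        // HashAlgorithm is IDisposable, so release it when done.
        using (HashAlgorithm sha1 = new SHA1Cng())
        {
            int xorCrc = GetHashedCrc(sha1, new byte[] {0xDE, 0xAD, 0xBE, 0xEF});
        }
    }

    // Folds the whole digest into 32 bits by XORing its 4-byte words together.
    private static int GetHashedCrc(HashAlgorithm algorithm, byte[] bytesToHash)
    {
        byte[] hash = algorithm.ComputeHash(bytesToHash);
        int totalInt32s = hash.Length / sizeof(int);
        int result = 0;
        for (int i = 0; i < totalInt32s; i++)
        {
            // BitConverter uses the host byte order, so results differ across endianness.
            int currentInt = BitConverter.ToInt32(hash, sizeof(int) * i);
            result = result ^ currentInt;
        }

        return result;
    }
}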
Jeff Moser
Bad idea. This just adds complexity and has no benefit at all. If you use SHA1, HMAC, etc., the result is already "random" enough. Truncating the result is just fine. It is the method that NIST, for example, uses to get shorter hashes (e.g. SHA-224 or SHA-384) and shorter HMACs.
Accipitridae
Agreed. I was just looking for a way to use all the bits, but you're right that it doesn't make a security difference and costs extra instructions.
Jeff Moser
+1  A: 

Assuming that a hash function distributes its inputs uniformly over its codomain, it seems logical to assume that any subset of its output bits will be uniformly distributed as well. However, using a "native" 32-bit hash function will probably still be the better choice. Maybe someone more into the matter can provide us with a better reason than just my gut feeling :)

n3rd
+5  A: 

Unless you want the extra features of CRC32 (being a linear code), you should be fine with truncating the output to 32 bits.

Whether truncating the output of a cryptographic hash function weakens its collision resistance is an open research problem (contrived counterexamples exist, if I remember correctly). But NIST (probably with the approval of the NSA) used truncation to derive SHA-224 from SHA-256 anyway (see the Wikipedia article on SHA).
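For what it's worth, truncating SHA1 to 32 bits could look roughly like the sketch below. The class and method names are invented for the example, and taking the leading four bytes in big-endian order is an arbitrary choice; any fixed four bytes of the digest would do equally well.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class TruncatedHash
{
    // Hypothetical helper: returns the first 4 digest bytes as a big-endian uint.
    public static uint Sha1First32(byte[] data)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] digest = sha1.ComputeHash(data); // 20 bytes for SHA-1

            // Assemble the bytes explicitly so the value does not depend on host byte order.
            return ((uint)digest[0] << 24) | ((uint)digest[1] << 16)
                 | ((uint)digest[2] << 8) | (uint)digest[3];
        }
    }

    static void Main()
    {
        byte[] input = Encoding.UTF8.GetBytes("abc");
        Console.WriteLine(Sha1First32(input).ToString("X8")); // prints A9993E36
    }
}
```

The printed value is just the first eight hex digits of the standard SHA-1 test vector for "abc".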

EDIT: CRC32 lets you detect (and possibly correct) single-bit errors, whereas a cryptographic hash function should make it infeasible to find two inputs that have the same hash value.
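For comparison, a table-driven CRC32 can be sketched as follows. This assumes the common reflected polynomial 0xEDB88320 (the variant used by zlib and PNG), which may or may not match the CRC32 class from the question; the class name is invented for the example.

```csharp
using System;
using System.Text;

static class Crc32Sketch
{
    // One precomputed entry per possible byte value.
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        uint[] table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int k = 0; k < 8; k++)
            {
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            }
            table[n] = c;
        }
        return table;
    }

    public static uint Compute(byte[] data)
    {
        uint crc = 0xFFFFFFFFu; // initial value
        foreach (byte b in data)
        {
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        }
        return crc ^ 0xFFFFFFFFu; // final XOR
    }

    static void Main()
    {
        byte[] check = Encoding.ASCII.GetBytes("123456789");
        Console.WriteLine(Compute(check).ToString("X8")); // prints CBF43926
    }
}
```

The "linear code" property shows up as follows: for equal-length inputs a and b, Compute(a XOR b) equals Compute(a) ^ Compute(b) ^ Compute(all-zero input of that length). That structure is useful for error detection, and it is also why a CRC offers no resistance to deliberately constructed collisions.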

Are you aware of the "birthday paradox" (again, see Wikipedia)? With a 32-bit checksum you should expect a collision (i.e., two inputs with the same hash value) once you have about 2^16 inputs, and you may want to hash many more inputs than that. (Rereading your comment, this might not be a problem for you.)
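As a rough worked estimate, assuming the 32-bit values behave like independent, uniformly random outputs, the usual birthday approximation for the probability p(n) of at least one collision among n inputs gives:

```latex
\begin{aligned}
p(n) &\approx 1 - \exp\!\left(-\frac{n^{2}}{2N}\right), \qquad N = 2^{32} \\
p\!\left(2^{16}\right) &\approx 1 - e^{-1/2} \approx 0.39 \\
p(n) = \tfrac{1}{2} \;&\Longrightarrow\; n \approx \sqrt{2 \ln 2 \cdot N} \approx 1.18 \cdot 2^{16} \approx 77{,}000
\end{aligned}
```

So "about 2^16" is the right order of magnitude: at 65,536 inputs a collision is already roughly 39% likely, and it becomes more likely than not at around 77,000 inputs.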

If it's good enough for NIST, it's good enough for me.
raven
A: 

Why don't you just use string.GetHashCode()? It is designed to compute a 32-bit hash value and produce few collisions given real-world data. Of course, it's not secure, but your question doesn't include that as a requirement.

erikkallen