ansaurus

Question

Are the first 32 bits of a 160-bit SHA1 hash an acceptable substitute for a CRC32 hash?

Answer 1

A:

If you don't intend to use the 32 bits for a cryptographic purpose, you should be OK. Otherwise, I wouldn't rely on the first 32 bits having the same distribution as the whole hash.

Why can't you just use the wider hash that's available?

Mitch Wheat 2009-05-27 13:52:34

Answer 2

A:

CRC32 is probably reasonable for your needs. This has been discussed in this question.

In terms of truncating a hash primitive, the only heavily used application of this is the SSL/TLS Pseudo Random Function (PRF), which is used to generate keys. It uses HMACs, seeds, and labels to generate as many bytes as you need by hashing several times and then truncating to the number of bytes you need.

As to your specific question though, you could read the output of the hash into Int32s and then XOR them together if you're paranoid:

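using System;
using System.Security.Cryptography;

static class Program
{
    static void Main()
    {
        // SHA1Cng requires the .NET Framework on Windows; SHA1.Create() also works here.
        using (HashAlgorithm sha1 = new SHA1Cng())
        {
            int xorCrc = GetHashedCrc(sha1, new byte[] {0xDE, 0xAD, 0xBE, 0xEF});
            Console.WriteLine(xorCrc);
        }
    }

    // XOR-folds the whole digest into 32 bits, one Int32 at a time.
    private static int GetHashedCrc(HashAlgorithm algorithm, byte[] bytesToHash)
    {
        byte[] hash = algorithm.ComputeHash(bytesToHash);
        int totalInt32s = hash.Length / sizeof(int);
        int result = 0;
        for (int i = 0; i < totalInt32s; i++)
        {
            int currentInt = BitConverter.ToInt32(hash, sizeof(int) * i);
            result ^= currentInt;
        }

        return result;
    }
}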

Jeff Moser 2009-05-27 13:53:58

Bad idea. This just adds complexity and has no benefit at all. If you use SHA1, HMAC, etc., then the result is already "random" enough. Cutting the result is just fine. It is the method that NIST, for example, proposes to get shorter hashes (e.g. SHA-224 or SHA-384) or shorter HMACs.

Accipitridae 2009-05-29 06:15:52

Agreed. I was just looking for a way to use all the bits, but you're right that it doesn't make a security difference and costs extra instructions.

Jeff Moser 2009-05-29 12:04:10

Answer 3

+1 A:

Given the assumption that a hash function distributes its inputs equally over its codomain, it seems logical to assume that it will also distribute them equally over any subset of its output bits. However, using a "native" 32-bit hash function will probably still be the better choice. Maybe someone more into the matter can provide us with a better reason than just my gut feeling :)

n3rd 2009-05-27 13:55:06

Answer 4

+5 A:

Unless you want the extra features of CRC32 (being a linear code), you should be fine with cutting the output to 32 bits.

Whether cutting the output of a cryptographic hash function hurts its security with respect to collision resistance is an open research problem ("unnatural" constructed examples exist, if I remember correctly). But NIST (probably with the approval of the NSA) used the cutting technique to get SHA-224 from SHA-256 anyway (see the article about SHA on Wikipedia).
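As a minimal sketch of the cutting approach described above: the helper below keeps only the first four bytes of the SHA-1 digest. The names TruncatedHash and Sha1First32 are made up for illustration, and reading the bytes big-endian is an arbitrary choice; any fixed four bytes of the digest would serve equally well.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class TruncatedHash
{
    // Hypothetical helper: returns the first 4 bytes of the SHA-1 digest
    // as an unsigned 32-bit value, read big-endian so the result does not
    // depend on the platform's byte order.
    public static uint Sha1First32(byte[] data)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(data);
            return ((uint)hash[0] << 24)
                 | ((uint)hash[1] << 16)
                 | ((uint)hash[2] << 8)
                 | hash[3];
        }
    }

    static void Main()
    {
        // SHA-1("abc") starts with a9 99 3e 36, so this prints "a9993e36".
        uint h = Sha1First32(Encoding.ASCII.GetBytes("abc"));
        Console.WriteLine(h.ToString("x8"));
    }
}
```

Unlike CRC32, the truncated value carries no guarantee of catching particular error patterns such as short bursts; it only behaves like a uniformly distributed 32-bit value.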

EDIT: CRC32 lets you detect (and maybe correct) single-bit errors, whereas a cryptographic hash function should have the property that you can't find two inputs that have the same hash value.

Are you aware of the "birthday paradox" (again, see Wikipedia)? With a 32-bit checksum you can expect to get a collision (i.e., two inputs with the same hash value) once you have about 2^16 inputs, and you want to hash many more inputs. (Rereading your comment, this might not be a problem for you.)
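As a rough check of the 2^16 figure, here is the standard birthday approximation, assuming the n inputs are hashed uniformly into N = 2^32 possible values:

```latex
\begin{align*}
P(\text{collision}) &\approx 1 - e^{-n^2/(2N)}, \qquad N = 2^{32} \\
n = 2^{16}: \quad P &\approx 1 - e^{-1/2} \approx 0.39 \\
P = 0.5 \iff n &\approx \sqrt{2 \ln 2 \cdot N} \approx 1.18 \cdot 2^{16} \approx 77{,}000
\end{align*}
```

So a collision is already fairly likely at around 65,000 inputs and more likely than not at around 77,000.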

2009-05-27 14:09:45

If it's good enough for NIST, it's good enough for me.

raven 2009-09-04 13:29:27

Answer 5

A:

Why don't you just use string.GetHashCode()? It is designed to compute a 32-bit hash value and produce few collisions given real-world data. Of course, it's not secure, but your question doesn't include that as a requirement.

erikkallen 2009-09-04 13:52:14
