How do I generate a hashcode from a byte array in C#?

In C#, I need to create a hash of an image to ensure it is unique in storage.

I can easily convert it to a byte array, but I am unsure how to proceed from there.

Are there any classes in the .NET framework that can assist me, or is anyone aware of some efficient algorithms to create such a unique hash?

+7  A: 

There are plenty of hash providers in .NET that create cryptographic hashes, which satisfies your condition that they be unique (for most purposes, collision-proof). They are all extremely fast, and the hashing definitely won't be the bottleneck in your app unless you're doing it a trillion times over.

Personally I like SHA1:

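using System;
using System.Security.Cryptography;

string hash;
// byteArray holds the image bytes; the provider is IDisposable, hence the using block.
using (SHA1CryptoServiceProvider sha1 = new SHA1CryptoServiceProvider())
{
    // Base64 turns the 20-byte SHA-1 digest into a string that is always valid.
    hash = Convert.ToBase64String(sha1.ComputeHash(byteArray));
}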

Even when people say one method might be slower than another, it's all in relative terms. A program dealing with images definitely won't notice the microsecond process of generating a hashsum.

And regarding collisions, for most purposes this is also irrelevant. Even "obsolete" methods like MD5 are still highly useful in most situations. I'd only recommend against using it when the security of your system relies on preventing collisions.
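As a rough sketch of how such a hash might be used to spot duplicate images: the ImageStore class and its TryAdd method below are hypothetical names (not part of .NET), and an in-memory HashSet stands in for whatever storage you actually use. SHA1.Create() is used here in place of constructing the provider directly.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical helper: remembers the SHA-1 hash of every image it has seen.
public class ImageStore
{
    private readonly HashSet<string> _seen = new HashSet<string>();

    // Returns true if the image is new, false if an image with the
    // same hash was already added.
    public bool TryAdd(byte[] imageBytes)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            string hash = Convert.ToBase64String(sha1.ComputeHash(imageBytes));
            // HashSet<T>.Add returns false when the value is already present.
            return _seen.Add(hash);
        }
    }
}
```

Adding the same bytes twice should return true the first time and false the second. If a false positive would be costly, compare the actual bytes whenever the hashes match.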

Rex M
+1  A: 

You can use any of the standard hashing algorithms, but hashing can't technically guarantee uniqueness. A hash is designed to be a relatively fast and/or small token that lets you see whether one piece of data is likely the same as another. It's fully possible for entirely different sets of data to produce the same hash, though producing such collisions algorithmically is very hard.

All of that aside, for checking likely identity, MD5 is fairly fast. SHA is more reliable (MD5 has been broken, so it shouldn't be used for security), but it's also slower.

Adam Robinson
+2  A: 

Creating a new instance of SHA1CryptoServiceProvider every time you need to compute a hash is NOT fast at all. Reusing the same instance is pretty fast.
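A minimal sketch of reusing one instance, assuming a hypothetical ReusableHasher wrapper (the class name and its Hash method are made up for illustration). Instance members of HashAlgorithm are not guaranteed to be thread-safe, so the sketch serializes access with a lock. How much time this saves depends on the runtime, so measure before relying on it.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper that keeps one SHA-1 instance alive across many calls.
public sealed class ReusableHasher : IDisposable
{
    private readonly SHA1 _sha1 = SHA1.Create();
    private readonly object _lock = new object();

    public string Hash(byte[] data)
    {
        // HashAlgorithm instances are not thread-safe, so serialize access.
        lock (_lock)
        {
            // ComputeHash resets the algorithm's state, so the instance
            // can be used again for the next call.
            return Convert.ToBase64String(_sha1.ComputeHash(data));
        }
    }

    public void Dispose()
    {
        _sha1.Dispose();
    }
}
```

Create the hasher once, call Hash for each image, and dispose it when the application no longer needs it.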

Still, I'd rather use one of the many CRC algorithms than a cryptographic hash. Hash functions designed for cryptography don't work too well at very small hash sizes (32 bits), which is what you want for your GetHashCode() override (assuming that's what you're after).

Check this link out for one example of computing CRC in C#: http://sanity-free.org/134/standard_crc_16_in_csharp.html

P.S. The reason you want your hash to be small (16 or 32 bits) is so you can compare them FAST (that was the whole point of having hashes, remember?). Having a hash represented by a 256-bit value encoded as a string is pretty insane in terms of performance.
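For illustration, here is a sketch of a 32-bit hash used in a GetHashCode() override. It uses the common table-driven CRC-32 (reflected polynomial 0xEDB88320), not the CRC-16 from the linked page, and the ImageData wrapper class is a hypothetical name. It assumes the byte array is not modified after it is wrapped.

```csharp
using System;

// Hypothetical wrapper: a byte[] whose hash code is a CRC-32 of its contents.
public sealed class ImageData : IEquatable<ImageData>
{
    private static readonly uint[] Table = BuildTable();
    private readonly byte[] _bytes;
    private readonly int _hash;

    public ImageData(byte[] bytes)
    {
        _bytes = bytes;
        // Computed once; assumes the array is not mutated afterwards.
        _hash = unchecked((int)Crc32(bytes));
    }

    // Builds the lookup table for the reflected CRC-32 polynomial 0xEDB88320.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    public static uint Crc32(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    public override int GetHashCode() { return _hash; }

    // Equal hash codes only suggest equality; the bytes decide.
    public bool Equals(ImageData other)
    {
        if (ReferenceEquals(other, null) || other._bytes.Length != _bytes.Length)
            return false;
        for (int i = 0; i < _bytes.Length; i++)
            if (_bytes[i] != other._bytes[i]) return false;
        return true;
    }

    public override bool Equals(object obj) { return Equals(obj as ImageData); }
}
```

With Equals and GetHashCode both overridden, instances can serve as keys in a Dictionary or members of a HashSet. The 32-bit code narrows the candidates quickly and the byte comparison settles any collision.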

zvolkov
+3  A: 

The part of Rex M's answer about using SHA1 to generate a hash is a good one (MD5 is also a popular option). zvolkov's suggestion about not constantly creating new crypto providers is also a good one (as is the suggestion about using CRC if speed is more important than virtually-guaranteed uniqueness).

However, do not use Encoding.UTF8.GetString() to convert a byte[] into a string (unless of course you know from context that it is valid UTF-8). For one, it will not preserve invalid sequences such as unpaired surrogates (by default they are replaced with U+FFFD), so different byte arrays can end up as the same string. A method guaranteed to always give you a valid string from a byte[] is Convert.ToBase64String().
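To illustrate the difference, here is a small sketch that checks whether a byte array survives each conversion. The RoundTripDemo class and its method names are made up for this example.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // True if Base64 encoding followed by decoding returns the original bytes.
    public static bool Base64RoundTrips(byte[] data)
    {
        byte[] back = Convert.FromBase64String(Convert.ToBase64String(data));
        return BytesEqual(data, back);
    }

    // True if UTF-8 decoding followed by re-encoding returns the original bytes.
    public static bool Utf8RoundTrips(byte[] data)
    {
        byte[] back = Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));
        return BytesEqual(data, back);
    }

    private static bool BytesEqual(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```

For a hash such as new byte[] { 0xFF, 0xFE, 0x00 }, Base64RoundTrips returns true while Utf8RoundTrips returns false, because 0xFF and 0xFE are never valid in UTF-8 and are replaced on decoding.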

Jonathan
Thanks, you're quite right there. In fact that's what I always do, but I threw down that sample off the top of my head and did the first byte[]-to-string that came to mind.
Rex M
Thanks for the heads up Jonathan, thanks for the edit Rex
johnc