tags:

views:

1493

answers:

9

The .NET framework ships with 6 different hashing algorithms:

  • MD5: 16 bytes (Time to hash 500MB: 1462 ms)
  • SHA1: 20 bytes (1644 ms)
  • SHA256: 32 bytes (5618 ms)
  • SHA384: 48 bytes (3839 ms)
  • SHA512: 64 bytes (3820 ms)
  • RIPEMD: 20 bytes (7066 ms)

Each of these functions performs differently; MD5 being the fastest and RIPEMD being the slowest.

MD5 has the advantage that it fits in the built-in Guid type. Which makes it really easy to use for identification.

MD5 however is vulnerable to collision attacks, SHA1 is also vulnerable but to a lesser degree.

Under what conditions should I use which hashing algorithm?

Particular questions I'm really curious to see answered are:

  • Is MD5 not to be trusted? Under normal situations when you use the MD5 algorithm with no malicious intent and no third party has any malicious intent would you expect ANY collisions (meaning two arbitrary byte[] producing the same hash)

  • How much better is RIPEMD than SHA1? (if its any better) its 5 times slower to compute but the hash size is the same as SHA1.

  • What are the odds of getting non-malicious collisions when hashing file-names (or other short strings)? (Eg. 2 random file-names with same MD5 hash) (with MD5 / SHA1 / SHA2xx) In general what are the odds for non-malicious collisions?

This is the benchmark I used:

    static void TimeAction(string description, int iterations, Action func) {
        var watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < iterations; i++) {
            func();
        }
        watch.Stop();
        Console.Write(description);
        Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
    }

    static byte[] GetRandomBytes(int count) {
        var bytes = new byte[count];
        (new Random()).NextBytes(bytes);
        return bytes;
    }


    static void Main(string[] args) {

        var md5 = new MD5CryptoServiceProvider();
        var sha1 = new SHA1CryptoServiceProvider();
        var sha256 = new SHA256CryptoServiceProvider();
        var sha384 = new SHA384CryptoServiceProvider();
        var sha512 = new SHA512CryptoServiceProvider();
        var ripemd160 = new RIPEMD160Managed();

        var source = GetRandomBytes(1000 * 1024);

        var algorithms = new Dictionary<string,HashAlgorithm>();
        algorithms["md5"] = md5;
        algorithms["sha1"] = sha1;
        algorithms["sha256"] = sha256;
        algorithms["sha384"] = sha384;
        algorithms["sha512"] = sha512;
        algorithms["ripemd160"] = ripemd160;

        foreach (var pair in algorithms) {
            Console.WriteLine("Hash Length for {0} is {1}", 
                pair.Key, 
                pair.Value.ComputeHash(source).Length);
        }

        foreach (var pair in algorithms) {
            TimeAction(pair.Key + " calculation", 500, () =>
            {
                pair.Value.ComputeHash(source);
            });
        }

        Console.ReadKey();
    }
+2  A: 

Which one you use really depends on what you are using it for. If you just want to make sure that files don't get corrupted in transit and aren't that concerned about security, go for fast and small. If you need digital signatures for multi-billion dollar federal bailout agreements and need to make sure they aren't forged, go for hard to spoof and slow.

tvanfosson
Lots of times when discussing solutions to problem I mention I use MD5 for quick identity (hashing a string), they say "but md5 is broken ... dont use it, use sha1" ... I dont really subscribe to this was wondering if anything is so fundamentally broken with some of the weaker hashs that they should be avoided ... eg real works cases where normal data produces collisions
Sam Saffron
Seeing as MD5 worked fine for millions of people for fifteen years, I suspect that it's OK for you if hash security isn't crucial.
mquander
@sambo MD5 works just fine for almost any case except when the actual security/integrity of your system *depends* on preventing collisions.
Rex M
@tvanfosson, see my answer
Sam Saffron
+2  A: 

I would like to chime in (before md5 gets torn apart) that I do still use md5 extensively despite its overwhelming brokenness for a lot of crypto.

As long as you don't care to protect against collisions (you are still safe to use md5 in an hmac as well) and you do want the speed (sometimes you want a slower hash) then you can still use md5 confidently.

Mike Boers
@Mike I'm with you on that, that was kind of what I was digging for with this question, is something about the weaker hash functions so fundamentally broken that they should never be used.
Sam Saffron
Further to this, if the data or required security of the data has a lifespan shorter than the crack period (a few minutes these days iirc) MD5 is absolutely fine. Situationally useful but still useful is the point.
annakata
A: 

I am not an expert at this sort of thing, but I keep up with the security community and a lot of people there consider the md5 hash broken. I would say that which one to use depends on how sensitive the data is and the specific application. You might be able to get away with a slightly less secure hash as long as the key is good and strong.

aloishis89
hash functions typically don't use keys
e5
+1  A: 

I don't really understand the question. You measured all the algorithms and researched how secure they are, so you already know under what conditions to use each. Use a fast one where performance is important and a secure one where reliability is important.

mquander
Im hoping for things like, "dont ever use algorithm XYZ cause its broken", "always use the most secure one cause the performance difference is negligible" eg. if RIPEMD is more secure than SHA1 and CPU cycles are not concern should it not always be preferred ?
Sam Saffron
You measured exactly what the performance difference is, although it might vary slightly on different data. I don't think I've ever written anything where CPU cycles were literally of no concern at all, but if they're of no concern to you, then I suppose you should not use a hash algorithm at all in this instance and just compare two files bit-for-bit instead. (Unless you're hashing sensitive data.)
mquander
Seriously, though, you know the performance difference, and you understand that some are more secure than others when it comes to manufacturing artificial collisions, so I don't think there's anything else to mention.
mquander
Updated my question with the 2 subquestions I really want answered
Sam Saffron
@mquander, see my answer ...
Sam Saffron
A: 

Here are my suggestions for you:

  1. You should probably forget MD5 if you anticipate attacks. There are many rainbow tables for them online, and corporations like the RIAA have been known to be able to produce sequences with equivalent hashes.
  2. Use a salt if you can. Including the message length in the message can make it very difficult to make a useful hash collision.
  3. As a general rule of thumb, more bits means less collisions (by pigeonhole principle) and slower, and maybe more secure (unless you are a math genius who can find vulnerabilities).

See here for a paper detailing an algorithm to create md5 collisions in 31 seconds with a desktop Intel P4 computer.

http://eprint.iacr.org/2006/105

Unknown
+2  A: 

In MD5's defense, there is no known way to produce a file with an arbitrary MD5 hash. The original author must plan in advance to have a working collision. Thus if the receiver trusts the sender, MD5 is fine. MD5 is broken if the signer is malicious, but it is not known to be vulnerable to man-in-the-middle attacks.

rlbond
+24  A: 

In cryptography, hash functions provide three separate functions.

  1. Collision resistance: How hard is it for someone to find two messages (any two messages) that hash the same.
  2. Preimage Resistance: Given a hash, how hard is it to find another message that hashes the same? Also known as a one way hash function.
  3. Second preimage resistance: Given a message, find another message that hashes the same.

These properties are related but independent. For example, collision resistance implies second preimage resistance, but not the other way around. For any given application, you will have different requirements, needing one or more of these properties. A hash function for securing passwords on a server will usually only require preimage resistance, while message digests require all three.

It has been shown that MD5 is not collision resistant, however, that does not preclude its use in applications that do not require collision resistance. Indeed, MD5 is often still used in applications where the smaller key size and speed are beneficial. That said, due to its flaws, researchers recommend the use of other hash functions in new scenarios.

SHA1 has a flaw that allows collisions to be found in theoretically far less than the 2^80 steps a secure hash function of its length would require. The attack is continually being revised and currently can be done in ~2^63 steps - just barely within the current realm of computability. For this reason NIST is phasing out the use of SHA1, stating that the SHA2 family should be used after 2010.

SHA2 is a new family of hash functions created following SHA1. Currently there are no known attacks against SHA2 functions. SHA256, 384 and 512 are all part of the SHA2 family, just using different key lengths.

RIPEMD I can't comment too much on, except to note that it isn't as commonly used as the SHA families, and so has not been scrutinized as closely by cryptographic researchers. For that reason alone I would recommend the use of SHA functions over it. In the implementation you are using it seems quite slow as well, which makes it less useful.

In conclusion, there is no one best function - it all depends on what you need it for. Be mindful of the flaws with each and you will be best able to choose the right hash function for your scenario.

Eric Burnett
+1 thank you for the thoughtful answer
Sam Saffron
+8  A: 

In order of weakest to strongest I would say:

  1. RIPEMD BROKEN, Should never be used eprint.iacr.org/2004/199.pdf
  2. MD-5 BROKEN, Should never be used, can be broken in 2 minutes with a laptop
  3. SHA-1 BROKEN, Should never be used, is broken in principal, attacks are getting better by the week
  4. SHA-2 WEAK, Will probably be broken in the next few years. A few weaknesses have been found. Note that generally the higher key size, the harder the hash function is to break. While key size = strength is not always true, it is mostly true. So SHA-256 is probably weaker than SHA-512.
  5. Skein NO KNOWN WEAKNESSES, is a candidate for SHA-3. It is fairly new and thus untested. It has been implemented in a bunch of languages.
  6. MD6 NO KNOWN WEAKNESSES, is another a candidate for SHA-3. Probably stronger than Skien, but slower on single core machines. Like Skien it is untested. Some security minded developers are using it, in mission critical roles.

Personally I'd use MD6, because one can never been to paranoid. If speed is a real concern I'd look at Skein, or SHA-256.

e5
A little black and white but +1 for informative
annakata
I wouldn't put Skein and MD6 that high in the list; there is a reason that the SHA-3 competition won't be finished till the end of 2012. It takes a long time and a lot of eyes to be convinced that a hash function is actually likely to be secure, and neither of these functions have been around long enough for that yet.
Eric Burnett
I agree with your sentiments, but I think the community is in a strange position. All the hash functions in use are dangerously close to being broken (maybe, maybe, not SHA2 256-512) and yet we have to wait to 2012 to choose a replacement.choose your poison:weak/broken or untested (most NIST candidates have not been public for more than 6 months)?Tough choice.
e5
+13  A: 

All hash functions are "broken"

The pigeonhole principle says that try as hard as you will you can not fit more than 2 pigeons in 2 holes (unless you cut the pigeons up). Similarly you can not fit 2^128 + 1 numbers in 2^128 slots. All hash functions result in a hash of finite size, this means that you can always find a collision if you search through "finite size" + 1 sequences. It's just not feasible to do so. Not for MD5 and not for Skein.

MD5/SHA1/Sha2xx have no chance collisions

All the hash functions have collisions, its a fact of life. Coming across these collisions by accident is the equivalent of winning the intergalactic lottery. That is to say, no one wins the intergalactic lottery, its just not the way the lottery works. You will not come across an accidental MD5/SHA1/SHA2XXX hash, EVER. Every word in every dictionary, in every language, hashes to a different value. Every path name, on every machine in the entire planet has a different MD5/SHA1/SHA2XXX hash. How do I know that, you may ask. Well, as I said before, no one wins the intergalactic lottery, ever.

But ... MD5 is broken

Sometimes the fact that its broken does not matter.

As it stands there are no known pre-image or second pre-image attacks on MD5.

So what is so broken about MD5, you may ask? It is possible for a third party to generate 2 messages, one of which is EVIL and another of which is GOOD that both hash to the same value. (Collision attack)

Nonetheless, the current RSA recommendation is not to use MD5 if you need pre-image resistance. People tend to err on the side of caution when it comes to security algorithms.

So what hash function should I use in .NET?

  • Use MD5 if you need the speed/size and don't care about birthday attacks or pre-image attacks.

Repeat this after me, there are no chance MD5 collisions, malicious collisions can be carefully engineered. Even though there are no known pre-image attacks to date on MD5 the line from the security experts is that MD5 should not be used where you need to defend against pre-image attacks. SAME goes for SHA1.

Keep in mind, not all algorithms need to defend against pre-image or collision attacks. Take the trivial case of a first pass search for duplicate files on your HD.

  • Use SHA2XX based function if you want a cryptographically secure hash function.

No one ever found any SHA512 collision. EVER. They have tried really hard. For that matter no one ever found any SHA256 or 384 collision ever. .

  • Don't use SHA1 or RIPEMD unless its for an interoperability scenario.

RIPMED has not received the same amount of scrutiny that SHAX and MD5 has received. Both SHA1 and RIPEMD are volunarable to birthday attacks. They are both slower than MD5 on .NET and come in the awkward 20 byte size. Its pointless to use these functions, forget about them.

SHA1 collision attacks are down to 2^52, its not going to be too long until SHA1 collisions are out in the wild.

For up to date information about the various hash functions have a look at the hash function zoo.

But wait there is more

Having a fast hash function can be a curse. For example: a very common usage for hash functions is password storage. Essentially, you calculate hash of a password combined with a known random string (to impede rainbow attacks) and store that hash in the database.

The problem is, that if an attacker gets a dump of the database, he can, quite effectively guess passwords using brute-force. Every combination he tries only takes a fraction of millisecond, and he can try out hundreds of thousands of passwords a second.

To work around this issue, the bcrypt algorithm can be used, it is designed to be slow so the attacker will be heavily slowed down if attacking a system using bcrypt. Recently scrypt has made some headline and is considered by some to be more effective than bcrypt but I do not know of a .Net implementation.

Sam Saffron
While both MD5 and SHA-1 have been weakened, MD5 is much weaker than SHA-1, while only slightly faster. Actual MD5 collisions have been found and used for real-world exploits (forging CA certificates), but as far as I know no actual SHA-1 collisions have been found (though the number of operations has been reduced considerably from brute-force). And given how much weaker MD5 is, I would not be surprised if second preimage attacks appeared sooner for MD5 than for SHA-1. Thus, I think you should use SHA-1 if you need speed and not collision resistance, and otherwise use one of the SHA-2 family.
Brian Campbell
@Brian its fairly clear that within the next few years people will be able to run collision attacks on SHA1, this effectively will make SHA1 as useful as MD5, The CA certificate thing is a collision attack, similarly in a few years people will be able to run the same attack on SHA1 CA certificates. The attack depends on a malicious party creating an EVIL and a GOOD certificate. There are no known primage attacks on MD5 and the fact that there are collision attacks does not make pre-image attacks more or less likely.
Sam Saffron
It's far less about which hash you use for passwords, than it is what is hashed. If your salt is known then your database is immediately vulnerable to a dictionary attack; if your salt is procedural, and your filesystem is compromised, you are (again) vulnerable; if your salt is omitted you are again compromised. The security at issue is, no matter what, WHAT is hashed. Certificates, I won't address because I've not dealt with them in as a programmer (IE, creating, understanding, etc).
The Wicked Flea