To be precise, there are three "attacks" that can be performed on a hash function h:
Preimages: given an "output" y (a sequence of bits of the right size, e.g. 128 bits for MD5), find a message m (an arbitrary sequence of bits) such that h(m) = y.
Second preimage: given a message m, find another message m', distinct from m, such that h(m) = h(m').
Collision: find two messages m and m', distinct from each other, such that h(m) = h(m').
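These three goals can be made concrete with a toy sketch. The hash below is SHA-256 deliberately truncated to 16 bits (an assumption made purely so the brute-force search finishes in a fraction of a second, not a real-world target); the preimage attack is then just trying messages until one hashes to the target:

```python
import hashlib

# Toy 16-bit hash: first 2 bytes of SHA-256. Deliberately weak so the
# brute-force search below terminates quickly.
def h(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()[:2]

def find_preimage(y: bytes) -> bytes:
    """Preimage attack: given an output y, find some m with h(m) = y."""
    i = 0
    while True:
        m = i.to_bytes(8, "big")
        if h(m) == y:
            return m
        i += 1

y = h(b"hello")
m = find_preimage(y)
assert h(m) == y  # m need not be b"hello", it only needs to hash to y
```

A second-preimage attack is the same loop, except the target y is computed as h(m) for a message m handed to the attacker, and the search must return a message different from m.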
A good hash function is a function which resists those three attacks as well as is theoretically possible. What is that best? It depends on the hash output size n.
Consider a theoretically perfect hash function, consisting of a black box; in the box are a gnome, some dice, and a big book. When asked for a hash value, for a given input message, the gnome uses the dice to generate a random output, which he gives back. But he also writes the message and the output in his big book. If he is asked for the hash of the same message again, then he will return the same value as previously. This setup is what theoreticians call a random oracle. The random oracle never contradicts itself, but you cannot know anything about what it will answer to a given question (i.e. input message) until that precise question has been asked of him.
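The gnome-and-book setup translates almost literally into code. This is a minimal sketch (the class name and 128-bit default output are illustrative choices): a fresh uniformly random output for every new message, memoized so that repeated queries stay consistent — the memoization dict plays the role of the big book:

```python
import secrets

class RandomOracle:
    """A random oracle: uniformly random output per new message,
    but the same message always gets the same answer."""

    def __init__(self, out_bytes: int = 16):  # 16 bytes = 128-bit output
        self.out_bytes = out_bytes
        self.book = {}  # the gnome's big book: message -> output

    def query(self, m: bytes) -> bytes:
        if m not in self.book:
            # roll the dice: a fresh uniformly random output
            self.book[m] = secrets.token_bytes(self.out_bytes)
        return self.book[m]

oracle = RandomOracle()
assert oracle.query(b"abc") == oracle.query(b"abc")  # never contradicts itself
```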
If the random oracle outputs strings of length n bits, then one can find preimages and second preimages in time O(2^n), and collisions in time O(2^(n/2)). The time is expressed here in terms of requests to the random oracle. The resistance to preimages is easy to understand: the attacker simply tries messages until, by pure luck, the oracle outputs the expected value; he cannot do better, since even the gnome does not know anything about what he will get for any new input. Due to the uniformly random nature of the oracle, the expected number of trials is 2^n. Resistance to second preimages follows along the same lines. For collisions, this is related to what is known as the birthday paradox.
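The 2^(n/2) figure can be seen experimentally with the birthday approach: remember every output seen so far, and stop at the first repeat. Against the same toy 16-bit hash as before (an assumed truncation of SHA-256, chosen only for speed), a collision typically shows up after a few hundred trials rather than the tens of thousands a preimage search needs:

```python
import hashlib

# Toy 16-bit hash: first 2 bytes of SHA-256.
def h(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()[:2]

def find_collision():
    """Birthday attack: store outputs seen so far; a repeat is a collision.
    Expected number of trials is about 2^(n/2) = 2^8 for n = 16."""
    seen = {}  # output -> message that produced it
    i = 0
    while True:
        m = i.to_bytes(8, "big")
        d = h(m)
        if d in seen:
            return seen[d], m
        seen[d] = m
        i += 1

m1, m2 = find_collision()
assert m1 != m2 and h(m1) == h(m2)
```

The memory cost of the table can be traded away with cycle-finding techniques, but the ~2^(n/2) query count is the essential point.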
A hash function is then deemed secure if its resistances appear to match those of the random oracle. It cannot do better; we require that it does not fare worse either.
A hash function is then said to be "broken" when someone comes up with a method for finding preimages, second preimages or collisions with a success rate higher than the generic methods which work on a random oracle. SHA-1 is said to be broken because a method for finding collisions, with cost about 2^61, was found (SHA-1 has a 160-bit output, and thus should resist collisions up to a work factor of about 2^80). Note that the full method was not run, i.e. we do not have an actual collision. 2^61 is still very high, on the fringes of the doable (it would require several thousands of machines, running for several months or years). But 2^61 is half a million times smaller than 2^80, so that is a break.
Sometimes, "breaks" are found for which the attack is ludicrously expensive (e.g. an attack with cost 2^112 against a theoretical resistance of 2^168). We then talk about an "academic" break: faster than the expected resistance, but still fully undoable with earth-based technology.
Note that not all usages of hash functions rely on resistance to collisions. Actually, most usages rely on resistance to preimages or second preimages. For those attacks, SHA-1 is currently as good as new, with its 2^160 resistance.
Conversely, even a perfect hash function with a 128-bit output should be considered weak, at least with respect to collision resistance: the generic attack, with a work factor of 2^64, is within the realm of the technologically feasible. This is why newer hash standards use larger outputs (e.g. SHA-2 begins at 224 bits of output). A large enough output defeats generic attacks.
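A quick back-of-the-envelope calculation shows why 2^64 sits at the edge of feasibility while larger exponents do not. Assuming, for illustration only, a rate of 10^9 hash evaluations per second per machine:

```python
# Rough feasibility arithmetic. The 10^9 hashes/second/machine rate is an
# assumed round figure for illustration, not a measured benchmark.
rate = 10**9                 # hash evaluations per second per machine
year = 365 * 24 * 3600       # seconds in a year

machine_years_64 = 2**64 / (rate * year)    # cost of a 2^64 generic attack
machine_years_128 = 2**128 / (rate * year)  # cost of a 2^128 generic attack

print(f"2^64  work: about {machine_years_64:.0f} machine-years")
print(f"2^128 work: about {machine_years_128:.2e} machine-years")
```

At that assumed rate, 2^64 is a few hundred machine-years — expensive, but plainly within reach of a large organization — whereas 2^128 is on the order of 10^22 machine-years, far beyond anything buildable.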