sha1sum duplicate files .

tags:

hashing
unix

views:

answers:

sha1sum duplicate files .

Can sha1sum return the same results for two files that are different ? I am asking this both from a theoretical and practical point of view.

+3 A:

Yes. The point of a hash is to avoid meaningful collisions, so that a small change to a file results in a large change to the hash value, so it is hard for an attacker to generate a collision.

Think about it: a SHA-1 hash is 160 bits. It is therefore unable to represent all the possible states of any file larger that 160 bits!

David Grant 2010-05-27 14:26:43

+2 A:

Mathematically, all hash functions have collisions -- that is, two inputs can return the same hash. This is true for any function that has an input of N bits and an output of M bits, where M<N.

A cryptographically sound hash algorithm, however, produces a hash that is so unpredictable, it would take millions of years to guess your way to an input that produces a particular hash. Some hash algorithms have known weaknesses that make this easier; when this happens the hash is considered 'broken' and everybody is supposed to switch to a new, better algorithm.

Though I don't follow crypto news that closely, SHA-1 is considered broken, so in theory if you can find the right tools, you can generate two files with the same SHA1 output.

Eric Seppanen 2010-05-27 14:29:06

Ok, so if SHA-1 is broken, what would you recommend instead?

FrustratedWithFormsDesigner 2010-05-27 14:29:50

I don't have an opinion. A quick web search suggests that the SHA-2 family (SHA-224, SHA-256, SHA-384, and SHA-512) are what people are migrating to.

Eric Seppanen 2010-05-27 14:33:42

@Frustrated The SHA-3 contestants that have not yet been eliminated at least have no known attacks against them as of now, which is better than you can say about SHA-1: http://ehash.iaik.tugraz.at/wiki/The_SHA-3_Zoo Or if you are not in a hurry, wait a bit and pick the winner.

Pascal Cuoq 2010-05-27 14:35:05

It is always a good idea to support multiple algorithms at once. It makes any possible migration a lot easier and faster.

hudolejev 2010-05-27 19:55:02

+1 A:

Wikipedia on SHA-1 collisions: link. Some real numbers.

hudolejev 2010-05-27 14:31:46

+2 A:

SHA-1 produces a hash that's only 160 bits long. If the files are longer than 160 bits, the hash can no longer represent all possible values of the files, so if you were to try all possible values in the files, some collisions would be inevitable.

There is a known attack for creating collisions with SHA-1, though it is quite expensive to carry it out. This attack, however, lets the attacker find two values that both produce the same result. It does not let the attacker create a file that produces the same hash as a file I've already created.

As far as alternatives go, SHA-256 is not just SHA-1 "widened" to produce a 256-bit result instead of 160 bits -- it's considerably more complex internally (uses six separate functions per round compared to three for SHA-1) which has made cryptanalysis considerably slower and more difficult. Some progress has been made, but most of it is purely theoretical.

For example, the best attack of which I'm aware only works against an intentionally weakened version of SHA-256 (42 rounds instead of the standard 64 rounds). Even at that, it's purely theoretical -- it produces the work to produce a preimage from the 2²⁵⁶ that's theoretically expected, to 2^251.7 -- which is still far too much to ever carry out.

Strangely, there appears to be considerably less progress in collision attacks against SHA-256. The best attacks I know of only work against around 20 rounds or so (working attacks against 19 rounds, some progress toward attacks up to 23 rounds or so).

Jerry Coffin 2010-05-27 15:05:48

ansaurus

tags:

views:

answers:

sha1sum duplicate files .

related questions