Can sha1sum return the same results for two files that are different ? I am asking this both from a theoretical and practical point of view.
Yes. The point of a hash is to avoid meaningful collisions, so that a small change to a file results in a large change to the hash value, so it is hard for an attacker to generate a collision.
Think about it: a SHA-1 hash is 160 bits. It is therefore unable to represent all the possible states of any file larger that 160 bits!
Mathematically, all hash functions have collisions -- that is, two inputs can return the same hash. This is true for any function that has an input of N
bits and an output of M
bits, where M<N
.
A cryptographically sound hash algorithm, however, produces a hash that is so unpredictable, it would take millions of years to guess your way to an input that produces a particular hash. Some hash algorithms have known weaknesses that make this easier; when this happens the hash is considered 'broken' and everybody is supposed to switch to a new, better algorithm.
Though I don't follow crypto news that closely, SHA-1 is considered broken, so in theory if you can find the right tools, you can generate two files with the same SHA1 output.
SHA-1 produces a hash that's only 160 bits long. If the files are longer than 160 bits, the hash can no longer represent all possible values of the files, so if you were to try all possible values in the files, some collisions would be inevitable.
There is a known attack for creating collisions with SHA-1, though it is quite expensive to carry it out. This attack, however, lets the attacker find two values that both produce the same result. It does not let the attacker create a file that produces the same hash as a file I've already created.
As far as alternatives go, SHA-256 is not just SHA-1 "widened" to produce a 256-bit result instead of 160 bits -- it's considerably more complex internally (uses six separate functions per round compared to three for SHA-1) which has made cryptanalysis considerably slower and more difficult. Some progress has been made, but most of it is purely theoretical.
For example, the best attack of which I'm aware only works against an intentionally weakened version of SHA-256 (42 rounds instead of the standard 64 rounds). Even at that, it's purely theoretical -- it produces the work to produce a preimage from the 2256 that's theoretically expected, to 2251.7 -- which is still far too much to ever carry out.
Strangely, there appears to be considerably less progress in collision attacks against SHA-256. The best attacks I know of only work against around 20 rounds or so (working attacks against 19 rounds, some progress toward attacks up to 23 rounds or so).