views:

623

answers:

6

If someone is purposely trying to modify two files to have the same hash, what are ways to stop them? Can md5 and sha1 prevent the majority case?

I was thinking of writing my own and I figure even if I don't do a good job if the user doesn't know my hash he may not be able to fool mine.

What's the best way to prevent this?

+9  A: 

MD5 is generally considered insecure if hash collisions are a major concern. SHA1 is likewise no longer considered acceptable by the US government. There is a competition under way to find a replacement hash algorithm, but the recommendation at the moment is to use the SHA2 family - SHA-256, SHA-384 or SHA-512.

You can try to create your own hash - it would probably not be as good as MD5, and 'security through obscurity' is likewise not advisable.

If you want security, hash with multiple hash algorithms. Being able to simultaneously create files that have hash collisions using a number of algorithms is excessively improbable. [And, in the light of comments, let me make it clear: I mean publish both the SHA-256 and the Whirlpool values for the file - not combining hash algorithms to create a single value, but using separate algorithms to create separate values. Generally, a corrupted file will fail to match any of the algorithms; if, perchance, someone has managed to create a collision value using one algorithm, the chance of also producing a second collision in one of the other algorithms is negligible.]

The Public TimeStamp uses an array of algorithms. See, for example, sqlcmd-86.00.tgz for an illustration.

Jonathan Leffler
MD5 is totally broken. It's junk and should never be used in new systems. Combining multiple hash functions is a good idea ONLY IF you need a very long (>512 bit) hash. Otherwise, a single long hash will be stronger: SHA2-384 has a lower collision chance than, say, SHA1 and RIPEMD-160 concatenated.
kquinn
The second part about security is not true - combining multiple hash algorithms is inadvisable without having an in-depth mathematical understanding of how they interact. Holes in one can be amplified by the other.
Rex M
What I mean, and what is illustrated by the Public TimeStamp web site, is publish simultaneously the SHA-256, RIPEMD-160, SHA1 and maybe Whirlpool hashes; 4 independent hashes of the same file. Even if one hash is defeated, the chances of the same file also defeating the others is negligible.
Jonathan Leffler
Yeah, that site publishes a very long hash (4x160 = 640 bits, longer than SHA512), so it does provide good security. But if you think about it for a while (a long while, for me...), you can see that even for *completely different* algorithms, if their total length is short, they are together weak.
kquinn
Hashing with multiple algorithms does not improve the security. Imagine the attacker knows that md5(a) == md5(b). Then sha1(md5(a)) == sha1(ms5(b))
BlueRaja - Danny Pflughoeft
@BlueRaja: The suggestion I was making is that if md5(a) == md5(b), it is extremely improbable that sha1(a) == sha1(b). By quoting both the MD5 and SHA1 checksums of the file, you are making it even less probable that a single replacement file (with different content from the original) will simultaneously generate the same checksum using two or more different algorithms.
Jonathan Leffler
@Jonathan: Ah, I see what you're saying - yes, you are correct, sorry.
BlueRaja - Danny Pflughoeft
+2  A: 

Why are you trying to create your own hash algorithm? What's wrong with SHA1HMAC?

Yes, there are repeats for hashes.
Any hash that is shorter than the plaintext is necessarily less information. That means there will be some repeats. The key for hashes is that the repeats are hard to reverse-engineer.

Consider CRC32 - commonly used as a hash. It's a 32-bit quantity. Because there are more than 2^32 messages in the universe, then there will be repeats with CRC32. The same idea applies to other hashes.

Cheeso
+1  A: 

This is called a "hash collision", and the best way to avoid it is to use a strong hash function. MD5 is relatively easy to artificially build colliding files, as seen here. Similarly, it's known there is a relatively efficient method for computing colliding SH1 files, although in this case "relatively efficient" still takes hunreds of hours of compute time.

Generally, MD5 and SHA1 are still expensive to crack, but not impossible. If you're really worried about it, use a stronger hash function, like SHA256.

Writing your own isn't actually a good idea unless you're a pretty expert cryptographer. most of the simple ideas have been tried and there are well-known attacks against them.

If you really want to learn more about it, have a look at Schneier's Applied Cryptography.

Charlie Martin
Be aware that Schneier has expressed reservations about the value of that book to lay people. The quote was some about a lot of bad cryptography code being written by people who had read Applied Cryptography. But it is a good read.
dmckee
It's a great read if you're interested in the gory details. Schneier's reservations mostly centered around the fact that the math behind cryptography is far and away the most secure part of modern crypto, and that book doesn't address human concerns, which are where the real problems lie.
Bill the Lizard
@Bill: Yeah. Nice summary.
dmckee
@Bill, absolutely right, and I've seen some of it. It's still the best code-oriented crypto book I know of.
Charlie Martin
+1  A: 

I don't think coming up with your own hash algorithm is a good choice.

Another good option is used Salted MD5. For example, the input to your MD5 hash function is appended with string "acidzom!@#" before passing to MD5 function.

There is also a good reading at Slashdot.

m3rLinEz
I know this answer is old, but it was up-voted so I want to add a quick warning. First, MD5 is broken and salting it doesn't fix the break. Salting has its uses, particularly in thwarting rainbow tables, but it doesn't help here because both sides will need the salt. The other detail is that salt has to be prepended, not appended, else it does nothing to distinguish between a pair of plaintexts that are designed to collide. I recommend using something in the SHA2 family, as others have suggested.
Steven Sudit
+2  A: 

If the user doesn't know your hashing algorithm he also can't verify your signature on a document that you actually signed.

The best option is to use public-key one-way hashing algorithms that generate the longest hash. SHA-256 creates a 256-bit hash, so a forger would have to try 2255 different documents (on average) before they created one that matched a given document, which is pretty secure. If that's still not secure enough for you, there's SHA-512.

Also, I think it's worth mentioning that a good low-tech way to protect yourself against forged digitally-signed documents is to simply keep a copy of anything you sign. That way, if it comes down to a dispute, you can show that the original document you signed was altered.

Bill the Lizard
Birthday Paradox? 2^128, I think... Still, an awful lot...
Jonathan Leffler
You don't divide the exponent by 2, you reduce it by one, which is why it's 255 instead of 256.
Bill the Lizard
Watch out for the difference between matching a given target and trying to generate a colliding pair. Bill's tlaking about the former, Jonathan about the latter.
dmckee
Also, the birthday paradox only applies when you're trying to find any two matches, not when you're trying to forge a particular document.
Bill the Lizard
+1  A: 

There is a hierarchy of difficulty (for an attacker) here. It is easier to find two files with the same hash than to generate one to match a given hash, and easier to do the later if you don't have to respect form/content/lengths restrictions.

Thus, if it is possible to use a well defined document structure and lengths, you can make an attackers life a bit harder no matter what underling hash you use.

dmckee