I'll come at this a different way. A lot of companies use MD5 hashes to "sign" their files, or to assert that a file with a given hash is distinct from every other file. They base entire systems on this, especially for file deduplication (single-instance storage).
Now given the fact that you can have different files with the same hash (see these examples), what possible faith can you put into a system that asserts that no two files exist with the same MD5 hash?
Edit to answer comments:
Let's assume a few things, to take it out of the mathematical realm and place it in the context of the original question, "Why is the use of MD5 bad in practice?"
Say your company is involved in litigation, and the opposing party demands any and all documents relating to "X". You go and buy some software that crawls all your storage locations and catalogs the billions of files, emails, and attachments, generating an MD5 hash for each. You then exclude all "duplicate" files based on the MD5 hash, and produce the rest of the relevant documents to opposing counsel.
Now say the opposing counsel is a bit of an "enthusiastic" litigator, and wants to cast doubt on whether your company actually met its obligations, specifically on the trustworthiness of MD5 as a deduplication mechanism. The opposing party is going for your company's throat, wanting the judge to impose some hefty sanctions, or even a summary judgment.
So if you were to go in front of a court in a litigation setting, where your company was facing such sanctions, your defense would be: yes, using MD5 is fine, because:
You need to distinguish among the cases that
- (a) hash collisions can happen (albeit with extremely small probability),
- (b) two files can be intentionally constructed to cause a collision (this is a "collision attack", it's possible with MD5),
- (c) an arbitrary file can be intentionally constructed to cause a collision with another file (preimage attack, not known for MD5)...
- (d) that a hash collision w/o intentional construction of files is likely to happen (which is not true... you'd need approx 2^64 different files to have a likely collision in a 128-bit hash.)
To which the litigator would likely respond:
(a1) Is it possible that two different files can have the same MD5 hash?
(your answer would have to be yes)
(b1) Do you know if there are any examples of two different files that have the same MD5 hash?
(again, your answer would have to be yes)
At this point, you have lost support in the eyes of most judges. It is now up to your legal team to steer the course back onto the "MD5 is fine" track. I'd rather not be in that position in the first place. At least with SHA-256 or other longer hashes, for which no collision has been published, you can answer "No" to (b1). And thus, the whole point of the question: "Why is the use of MD5 bad in practice?"
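To make the deduplication scenario concrete, here is a minimal sketch of how a tool might group duplicates by SHA-256 instead of MD5, and then confirm each candidate group byte for byte so that the result does not rest on the hash alone. The function names `sha256_of` and `find_duplicates` are made up for illustration; this is not how any particular e-discovery product works, just one way it could be done with the Python standard library.

```python
import filecmp
import hashlib
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group files that are byte-for-byte identical.

    The digest is only used to find candidates; every candidate is then
    compared against a confirmed group member before being called a duplicate.
    """
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[sha256_of(p)].append(p)

    groups = []
    for candidates in by_digest.values():
        confirmed = []  # lists of files verified identical to each other
        for p in candidates:
            for group in confirmed:
                # shallow=False forces a comparison of file contents
                if filecmp.cmp(group[0], p, shallow=False):
                    group.append(p)
                    break
            else:
                confirmed.append([p])
        groups.extend(g for g in confirmed if len(g) > 1)
    return groups
```

With the byte comparison in place, the answer to "could two different files have been treated as duplicates?" becomes "no", whichever hash is used to find the candidates. The cost is one extra read of each candidate file.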