Basically it's because the output of MD5 contains less information than the input. This is what distinguishes a hash algorithm from an encryption algorithm.
Here's a simple example: imagine an algorithm to compute the hash of a 10-digit number. The algorithm is "return the last 2 digits." If I take the hash of 8023798734, I get 34, but if all you had is the 34, you would have no way to tell what the original number is because the hashing algorithm discarded 8 digits worth of information. It's similar with MD5, except that the hash is computed via a complex procedure instead of just chopping off part of the data.
EDIT: in response to the edit in the question, that's actually an interesting question. For one thing, the probability of a collision (finding two inputs that produce the same output) is basically 1 divided by the number of possible hash outputs. Collisions are an undesirable feature of hashes in the sense that you want to make them as unlikely as possible, so one way to get a better hash algorithm is to use a hash with a longer output. In the digits example above, taking the last 4 digits instead of the last 2 digits reduces the probability of a collision to 1 in 10000 instead of 1 in 100, so it's more likely that all the 10-digit numbers in whatever set you have will have different hash values.
There's also the issue of cryptographic security. When you want to use a hash to make sure that some data is not tampered with, it's desirable that whoever's doing the tampering can't predict what inputs will produce a given output. If they could, they would be able to alter the input data in such a way that the output (the hash) remains the same. Going back to the digits example again, let's say I'm going to email you the number 1879483129 and it is critically important that this number gets to you unaltered. I might call you up and tell you the hash of the number, which would be 29, and that way if something happened to the data in transit, you'd know. (Well not in this case because this is a useless hash algorithm, but you get the gist ;-) However, since the "last 2 digits" algorithm is not cryptographically secure, a nefarious hacker could change the number en route to, say, 5555555529 and you wouldn't know the difference.
It's recently been shown that MD5 is not cryptographically secure, or at least not as much as we might like. That means that it is possible to find different inputs which correspond to any given output. It's still a fine algorithm for protecting against random bit flips and the like, but if there's a chance someone might want to intentionally corrupt your data, you should really use something more secure, like SHA-256.