This is a combinatorics question with some theory in hashing algorithms required.
Let's say the input can be any random sequence of bytes 30 kB to 5 MB of size (I guess that makes quite a few combinations of input values :))
What is the probability that the first 4 bytes (or first n bytes) of a MD5 hash computed from the byte sequence will be the same for distinct files?
In case this can not be computed specifically for MD5 hash, then what is the probability that any hash function that generates uniformly distributed m-byte hashes will compute hash with collision on first n bytes for given range of inputs?