We use MD5 at my work for exactly what you're considering, and it works great. We only need to detect duplicate uploads on a per-customer basis, which reduces our exposure to the birthday problem, but MD5 would still be sufficient for us if we had to detect duplicates across all uploads rather than per customer. The standard birthday bound applies here: there are n(n - 1)/2 pairs among n samples, and for a uniformly distributed b-bit hash each pair collides with probability 2^-b, so the probability p of at least one collision is bounded by:
p <= n (n - 1) / (2 * 2 ^ b)
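The bound is easy to evaluate exactly. Here is a minimal Python sketch (collision_bound is a hypothetical helper name, not from any library) that uses Fraction so the n(n - 1) term isn't rounded before the division, evaluated for n = 10^9 and b = 128:

```python
from fractions import Fraction


def collision_bound(n: int, b: int) -> Fraction:
    """Birthday bound: P(any collision) among n uniform b-bit hashes <= n(n-1) / (2 * 2^b)."""
    # n(n-1)/2 distinct pairs, each colliding with probability 2^-b (union bound).
    return Fraction(n * (n - 1), 2 * 2**b)


# MD5 produces 128-bit digests.
p = collision_bound(10**9, 128)
print(f"{float(p):.3e}")  # → 1.469e-21
```

The bound grows roughly as n^2, so doubling the number of uploads roughly quadruples it.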
A few years back I ran this calculation for n = 10^9 and b = 128 (MD5's digest size) and came up with p <= 1.469E-21. To put that in perspective, 10^9 files is one upload per second for about 32 years. So we don't bother comparing file contents when the hashes match: if MD5 says two uploads are the same, they're the same.
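A minimal sketch of the per-customer scheme described above, assuming uploads arrive as bytes and an in-memory dict stands in for whatever store is actually used. The UploadIndex class and its methods are hypothetical; only hashlib.md5 is a real API. Like the approach above, it trusts matching digests and skips any byte-by-byte comparison:

```python
import hashlib
from collections import defaultdict
from typing import Dict, Optional


class UploadIndex:
    """Hypothetical per-customer duplicate detector keyed on MD5 digests."""

    def __init__(self) -> None:
        # customer_id -> {md5 hex digest -> id of the first upload with that digest}
        self._seen: Dict[str, Dict[str, str]] = defaultdict(dict)

    @staticmethod
    def digest(data: bytes) -> str:
        # Hex-encoded 128-bit MD5 digest of the upload's contents.
        return hashlib.md5(data).hexdigest()

    def register(self, customer_id: str, upload_id: str, data: bytes) -> Optional[str]:
        """Record an upload; return the id of an earlier identical upload, or None if new."""
        key = self.digest(data)
        existing = self._seen[customer_id].get(key)
        if existing is not None:
            # Matching digests are treated as identical files, with no content comparison.
            return existing
        self._seen[customer_id][key] = upload_id
        return None


index = UploadIndex()
print(index.register("acme", "u1", b"hello"))   # → None (new upload)
print(index.register("acme", "u2", b"hello"))   # → u1 (duplicate of u1)
print(index.register("other", "u3", b"hello"))  # → None (different customer)
```

Because the digests are scoped per customer, identical files uploaded by different customers are not reported as duplicates, matching the per-customer requirement.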