As part of a Java-based web app, I'm going to be accepting uploaded .xls & .csv (and possibly other types of) files. Each file will be uniquely renamed with a combination of parameters and a timestamp.
I'd like to be able to identify any duplicate files. By duplicate I mean the exact same file contents, regardless of the name. Ideally, I'd like to detect duplicates as quickly as possible after the upload, so that the server can include this info in the response (assuming the processing time doesn't grow with file size enough to cause a noticeable lag).
I've read about running MD5 on the files and storing the result as a unique key, etc., but I have a suspicion that there's a much better way. (Is there a better way?)
Any advice on how best to approach this is appreciated.
Thanks.
UPDATE: I have nothing at all against using MD5. I've used it a few times in the past with Perl (Digest::MD5). I thought that in the Java world, another (better) solution might have emerged. But it looks like I was mistaken.
Thank you all for the answers and comments. I'm feeling pretty good about using MD5 now.
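For reference, here's a minimal sketch of what I have in mind, using the standard java.security.MessageDigest API to hash an uploaded file's bytes (the file path in main is just a placeholder):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileHasher {

    // Computes the MD5 hash of a file's contents as a lowercase hex string.
    // The hash depends only on the bytes, so two uploads with different names
    // but identical contents produce the same value.
    public static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder path for an uploaded file; the hex string would be stored
        // as a unique key and checked against previously uploaded files.
        System.out.println(md5Hex(Paths.get("uploads/report-20240101.xls")));
    }
}
```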