A lot of files will be stored in the DB, and I need file hashes to uniquely identify each file and verify that it has not been changed. (This will be used as part of a Windows personal firewall.)

+1  A: 

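Guaranteeing this is, of course, not possible in general: any fixed-size hash maps arbitrarily many possible files onto a finite set of values, so two different files can share the same hash. Many people still use hashing for this purpose, and MD5 is a popular algorithm that gives you a 128-bit "signature" for the file, with a high probability of changing when the contents of the file change.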
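To make the "signature" idea concrete, here is a minimal sketch using Python's standard `hashlib` module. The helper name `md5_hex` and the sample byte strings are made up for illustration; the point is only that the digest is 128 bits long and that changing a single byte gives a different value.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hex string (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()

# Illustrative inputs only: the two strings differ in their last byte.
original = b"example file contents"
modified = b"example file contentS"

digest_a = md5_hex(original)
digest_b = md5_hex(modified)

# MD5 produces 16 bytes, i.e. 128 bits, shown here as 32 hex characters.
digest_bits = hashlib.md5().digest_size * 8

print(digest_bits)           # size of the signature in bits
print(digest_a)
print(digest_b)
print(digest_a != digest_b)  # whether the one-byte change altered the digest
```

Storing the hex string next to the file record in the DB and recomputing it later is then enough to detect most changes, subject to the collision caveat above.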

In the general case, you need to read every bit of the file to include it in the hash, so performance will probably be I/O-limited. Hashing is a sequential sweep over all data in the file, updating the state of whatever hash algorithm you use with each new byte. On a modern CPU, updating the hash state will be quicker than reading the data from disk. This rather old analysis shows around 45 MB/s on a 90 MHz Pentium CPU.
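The sequential sweep can be sketched roughly as follows, again with Python's `hashlib`. The function name `hash_file`, the 1 MiB chunk size, and the `algorithm` parameter are assumptions for illustration, not something the question specifies. Reading in fixed-size chunks keeps memory use constant no matter how large the file is.

```python
import hashlib

def hash_file(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file by reading it sequentially in chunks (hypothetical helper).

    `algorithm` is any name accepted by hashlib.new(), e.g. "md5" or "sha256".
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # empty bytes object means end of file
                break
            h.update(chunk)        # fold this chunk into the running hash state
    return h.hexdigest()
```

Calling `hash_file(some_path)` when the file is first stored, and again when it needs to be checked, gives two strings to compare. Passing `"sha256"` instead of `"md5"` changes only the digest length and the CPU cost per byte; the read pattern is the same either way.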

unwind