I'm working on a solution where I need to associate metadata with files. To keep the right metadata attached to the right file even if, for instance, the file is moved, I need to be able to create a "fingerprint" of sorts that identifies the file.
The obvious solution would be to calculate a hash of the file contents. However, hashing the entire file seems like it would be quite time-consuming, so I was thinking it might be better to calculate the checksum from just a chunk of the file, e.g. the first x bytes.
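A minimal Python sketch of the partial-hash idea. The 64 KiB chunk size, the use of SHA-256, and mixing the file size into the hash are all assumptions for illustration, not a recommendation. Note the built-in limitation: two files of the same length that differ only after the first `chunk_size` bytes get the same fingerprint.

```python
import hashlib
import os

# Hypothetical default; the right value depends on the file types involved.
DEFAULT_CHUNK_SIZE = 64 * 1024


def partial_fingerprint(path: str, chunk_size: int = DEFAULT_CHUNK_SIZE) -> str:
    """Hash the file size plus the first `chunk_size` bytes of the file."""
    size = os.path.getsize(path)
    h = hashlib.sha256()
    # Mixing in the size (an assumption) cheaply separates files that share
    # a common prefix but differ in length.
    h.update(size.to_bytes(8, "little"))
    with open(path, "rb") as f:
        # Only the first chunk is read, so cost is bounded regardless of file size.
        h.update(f.read(chunk_size))
    return h.hexdigest()
```

Usage would be something like `partial_fingerprint("song.mp3")`, storing the returned hex string as the lookup key for the metadata.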
Another problem is that some file types, MP3s for instance, contain metadata headers that might change. The fingerprinting method would therefore have to adapt to the type of file and pick the "chunk" that is best to calculate the checksum on.
So my questions are: Is this a good way to do it, and has anyone else done something similar? How many bytes do you think are needed to calculate the checksum?
Thanks, everyone, for your input.