In other words, I'm looking for an algorithm that generates a unique, reasonable-length filename from a file's binary content. Two files with the same binary content should get the same name. Obviously there are limits to this: presumably you can't have unique, reasonable-length filenames for every member of a large set of large files that differ at only a handful of bit positions. But presumably there is some heuristic that gives a best approximation, for example one that exploits known attributes of typical image files. If I had the name of an algorithm that does this, I could google it and find other approaches as well.

+5  A: 

Use an MD5 hash of the contents of the file.
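
A minimal sketch of that idea in Python, assuming the name should be the hex digest of the file's bytes plus the original extension. The function name `content_filename` and the chunk size are illustrative choices, not something from the question.

```python
import hashlib
from pathlib import Path


def content_filename(path, chunk_size=65536):
    """Return '<md5 hex digest><lowercased extension>' for the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large images are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # The hex digest is always 32 characters, whatever the file size.
    return digest.hexdigest() + Path(path).suffix.lower()
```

Two files with byte-identical content get the same 32-character name; any difference in the bytes, including embedded metadata, gives a different one. MD5 is fine against accidental collisions but not against deliberately crafted ones; if that matters, `hashlib.sha256` can be swapped in at the cost of a longer name.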

great_llama
hashing - right. Thanks
Mark
Actually, it occurred to me that I'm using the FreeImage Library to generate these files from a bitmap, producing either a JPG or PNG file. What are the chances these files are already tagged internally with such a hashed identifier?
Mark
Maybe the output of FreeImage_ZLibCRC32? (I don't know what other metadata in your case might affect this; two "identical" files might not get the same CRC. You'll have to try it...)
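
As a rough illustration of the CRC idea, here is a sketch that uses Python's standard `zlib.crc32` over the file bytes rather than FreeImage itself. That FreeImage_ZLibCRC32 yields the same value is an assumption here, and the helper name `crc32_of_file` is made up for the example.

```python
import zlib


def crc32_of_file(path, chunk_size=65536):
    """Return the CRC-32 of the file's bytes as 8 lowercase hex characters."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Pass the running value back in to continue the checksum.
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")
```

A CRC-32 is only 32 bits, so accidental collisions become likely once a collection reaches tens of thousands of files. It gives shorter names than MD5 but is a weaker guarantee of uniqueness.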
great_llama
There are metadata and tag functions in FreeImage as well - still checking into them.
Mark
+2  A: 

I guess MD5 is worth checking out. Of course, it will give you the same result if the content is the same, but I guess you can increment it until you get a unique one.
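
One way to read "increment it" is sketched below, under the assumption that a counter suffix should be added only when a file with the same hash name already exists but holds different bytes. The function `store_by_content` and the `-1`, `-2` suffix scheme are hypothetical, chosen just for this example.

```python
import hashlib
from pathlib import Path


def store_by_content(data: bytes, directory, suffix=".png"):
    """Write `data` under a name derived from its MD5 and return the path.

    Identical content always maps to the same existing file; a counter is
    appended only if a different file already occupies the hash name.
    """
    directory = Path(directory)
    base = hashlib.md5(data).hexdigest()
    counter = 0
    while True:
        name = base + (f"-{counter}" if counter else "") + suffix
        target = directory / name
        if not target.exists():
            target.write_bytes(data)
            return target
        if target.read_bytes() == data:
            # Same content is already stored: reuse the existing name.
            return target
        counter += 1  # Genuine collision: try the next suffix.
```

With MD5 the counter branch should essentially never run for ordinary image files, so in practice identical content keeps getting the plain hash name.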

m0s
That's what I want it to do - give the same name for the same content.
Mark
Well, then MD5 is exactly what you need. It's not hard to find its source code, and I'm sure you can find tons of small command-line tools that will do MD5 hashing.
m0s
Ask and ye shall receive - this forum is amazing (although hashing of some sort should have occurred to me).
Mark