So, in other words, I need some algorithm that generates a unique, reasonable-length filename from a file's binary content. Two files that have the same binary content should get the same name. Obviously there are limits to this: presumably you can't have unique, reasonable-length filenames for every member of a large set of large files that differ at only a handful of bit positions. But presumably there is some heuristic that best approximates this, for example by exploiting known attributes of typical image files. If I had the name of an algorithm that does this, I could google it and find other approaches as well.
Hashing, right. Thanks.
Mark
2010-04-30 18:32:33
Actually, it occurred to me that I'm using the FreeImage library to generate these files from a bitmap, producing either a JPG or a PNG file. What are the chances these files are already tagged internally with such a hashed identifier?
Mark
2010-04-30 18:35:01
FreeImage Library
Mark
2010-04-30 18:35:20
Maybe the output of FreeImage_ZLibCRC32? (I don't know what other metadata in your case might affect this; two "identical" files might not get the same CRC. You'll have to try it...)
great_llama
2010-04-30 18:51:33
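To make the CRC suggestion concrete, here is a minimal sketch in Python rather than FreeImage's C API. It uses Python's standard `zlib.crc32`, on the assumption that it is the same zlib CRC-32 checksum that FreeImage_ZLibCRC32 exposes; the helper name `crc32_of_file` is hypothetical. Note that it hashes the encoded file bytes, so any metadata written into the file changes the result, which is the caveat raised above.

```python
import zlib


def crc32_of_file(path, chunk_size=65536):
    """Return the CRC-32 of a file's raw bytes as 8 lowercase hex digits.

    Hypothetical helper: it checksums the encoded file (pixels plus any
    embedded metadata), not the decoded bitmap.
    """
    crc = 0
    with open(path, "rb") as f:
        # Read in chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    # Mask to an unsigned 32-bit value for a stable 8-digit representation.
    return format(crc & 0xFFFFFFFF, "08x")
```

A 32-bit checksum only has about four billion possible values, so accidental name clashes become likely once a collection grows into the tens of thousands of files; a longer hash is the safer choice for naming.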
There are metadata and tag functions in FreeImage as well; I'm still checking into them.
Mark
2010-04-30 19:14:11
+2
A:
I guess MD5 is worth checking out. Of course, it will give you the same result whenever the content is the same, but I guess you can increment it until you get a unique one.
m0s
2010-04-30 18:31:41
Well, then MD5 is exactly what you need. It's not hard to find its source code, and I'm sure you can find tons of small command-line tools that will do MD5 hashing.
m0s
2010-04-30 18:42:08
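As a rough illustration of the MD5 idea, here is a minimal sketch in Python using the standard `hashlib` module. The function name `content_filename` and the choice to keep the original extension are assumptions made for the example, not something from the thread.

```python
import hashlib
import os


def content_filename(path, chunk_size=65536):
    """Build a filename from the MD5 of a file's bytes (hypothetical helper).

    Files with identical bytes always map to the same name.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Feed the file to the hash in chunks to keep memory use flat.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Keep the original extension (lowercased) so the file type stays visible.
    ext = os.path.splitext(path)[1].lower()
    return digest.hexdigest() + ext
```

The name is always 32 hex characters plus the extension. MD5 is not resistant to deliberately crafted collisions, so if the files come from untrusted sources, swapping in `hashlib.sha256` is a one-line change at the cost of a 64-character name.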
Ask and ye shall receive. This forum is amazing (although hashing of some sort should have occurred to me).
Mark
2010-04-30 18:45:57