file names based on file content

tags:

algorithm

views:

answers:

+2 Q:


So, in other words, I'm looking for some algorithm to generate a unique, reasonable-length filename based on a file's binary content. Two files that have the same binary content should get the same name. Obviously there would be limits to this, as presumably you couldn't have unique, reasonable-length filenames for every file in a large set of large files that differ only at a handful of bit positions. But presumably there is some heuristic or best approximation that, for example, exploits known attributes of typical image files. If I had the name of an algorithm that does this, I could google it and find other approaches as well.
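The limit anticipated here can be made concrete. A b-bit name can take only 2^b values, so collisions must exist once there are more distinct files than names (the pigeonhole principle). Assuming the hash spreads contents uniformly over those values, the standard birthday approximation gives the chance that n distinct files produce at least one shared name:

```latex
P(\text{collision}) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2^{b+1}}\right) \approx \frac{n^{2}}{2^{b+1}}
```

For MD5, b = 128, so even n = 10^9 files gives P ≈ 10^18 / 2^129 ≈ 1.5 × 10^-21. For a well-mixed hash this depends only on how many distinct files there are, not on how few bits they differ in.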

+5 A:

Use an MD5 hash of the contents of the file.

great_llama 2010-04-30 18:30:29
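A minimal Python sketch of the MD5 approach, assuming the name should be the hex digest of the raw file bytes plus an extension. `content_filename` is a hypothetical helper, not part of any library; it uses only the standard `hashlib` module and reads the file in chunks so large files need not fit in memory.

```python
import hashlib
from pathlib import Path


def content_filename(path, extension=None, chunk_size=1 << 16):
    """Return '<md5 hex digest><extension>' for the file at `path`.

    If `extension` is None, the original file's suffix (e.g. '.png') is kept.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Feed the raw bytes to MD5 one chunk at a time.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    suffix = Path(path).suffix if extension is None else extension
    return md5.hexdigest() + suffix
```

The result is always 32 hex digits plus the extension, so identical bytes always yield identical names. MD5 is fine for spotting accidental duplicates, but MD5 collisions can be constructed deliberately; if the files might come from an adversary, `hashlib.sha256` is a drop-in replacement.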

Hashing, right. Thanks.

Mark 2010-04-30 18:32:33

Actually, it occurred to me that I'm using the FreeImage Library to generate these files from a bitmap, producing either a JPG or PNG file. What are the chances these files are already tagged internally with such a hashed identifier?

Mark 2010-04-30 18:35:01

FreeImage Library

Mark 2010-04-30 18:35:20

Maybe the output of FreeImage_ZLibCRC32? (I don't know what other metadata in your instance might affect this; two "identical" files might not get the same CRC. You'll have to try it.)

great_llama 2010-04-30 18:51:33
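For comparison, a sketch of the CRC-32 idea in Python. It assumes `FreeImage_ZLibCRC32` wraps zlib's standard `crc32`, as its name suggests, so Python's `zlib.crc32` should give the same value for the same bytes; that assumption is worth checking against FreeImage itself. `crc32_filename` is a hypothetical helper.

```python
import zlib


def crc32_filename(path, extension="", chunk_size=1 << 16):
    """Return an 8-hex-digit CRC-32 of the file's bytes plus `extension`."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # zlib.crc32 takes the running value, so chunks can be chained.
            crc = zlib.crc32(chunk, crc)
    # Mask to an unsigned 32-bit value and zero-pad to 8 hex digits.
    return format(crc & 0xFFFFFFFF, "08x") + extension
```

With only 32 bits, the birthday bound gives roughly a 50% chance of at least one collision among about 77,000 distinct files, so CRC-32 names suit only small collections. Either way, hashing the encoded file bytes means two images with identical pixels but different embedded metadata will get different names.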

There are metadata and tag functions in FreeImage as well; I'm still looking into them.

Mark 2010-04-30 19:14:11

+2 A:

I guess MD5 is worth checking out. Of course, it will give you the same result if the content is the same, but I guess you could increment it until you get a unique one.

m0s 2010-04-30 18:31:41

That's what I want it to do - give the same name for the same content.

Mark 2010-04-30 18:38:25

Well, then MD5 is exactly what you need. It's not hard to find its source code, and I'm sure you can find plenty of small command-line tools that do MD5 hashing.

m0s 2010-04-30 18:42:08
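Since the goal is for identical content to get the identical name, the hash name also works as a de-duplication key. A minimal sketch, with `store_by_content` as a hypothetical helper: it copies a file into a store directory under its MD5 name and skips the copy when that name already exists.

```python
import hashlib
import shutil
from pathlib import Path


def store_by_content(src, store_dir):
    """Copy `src` into `store_dir` as '<md5><suffix>'.

    Returns (destination_path, was_new); was_new is False when a file
    with identical content (and suffix) is already in the store.
    """
    src = Path(src)
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    # Whole-file read keeps the sketch short; fine for typical images.
    digest = hashlib.md5(src.read_bytes()).hexdigest()
    dest = store / (digest + src.suffix)
    if dest.exists():
        return dest, False
    shutil.copyfile(src, dest)
    return dest, True
```

Because the original suffix is kept, the same bytes saved as `.jpg` and as `.png` would get two entries; drop `src.suffix` if that matters.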

Ask and ye shall receive; this forum is amazing (although hashing of some sort should have occurred to me).

Mark 2010-04-30 18:45:57

ansaurus
