What is the best way to check if an image is unique using PHP? Say I have a directory of about 30 images (about 500*500 pixels), and someone uploads another picture, what is a good way to check if the uploaded image is not yet in the directory?

Is there some way to create hashes of images that can be easily compared? I could then save the hashes of the images in the directory and compare those to the hash of the uploaded image.

Computing power is not much of an issue; it doesn't have to handle more than a few pictures per minute. Nor does it matter that images with a one-pixel difference will be seen as different images. The system should just be able to filter out images that are exactly the same.

+3  A: 

Run a checksum on the file. If it matches one you already have, then it's probably the exact same image.

Scott Evernden
+6  A: 

Use md5 or sha1 on the image file (in PHP, md5_file() or sha1_file()).

hsz
+1  A: 

Quick answer, but I recommend this approach:

  • Use an MD5 sum to hash the images (PHP's md5_file() does this).
  • If you're using a database, have the md5sum be a column of a table of picture files, and index the table by that field.
  • Otherwise, keep the hashes in a flat file like this:

    68b329da9893e34099c7d8ad5cb9c940 file2.bmp
    da1e100dc9e7bebb810985e37875de38 file1.jpg
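
The flat-file variant could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names loadHashIndex() and registerIfNew() and the index path are hypothetical, and it assumes the index uses the "hash filename" line format shown above.

```php
<?php
// Hypothetical helpers for a flat-file hash index.
// Assumed index format: one "<md5> <filename>" pair per line.

// Read the index into an array keyed by hash.
function loadHashIndex(string $indexPath): array
{
    $index = [];
    if (!is_file($indexPath)) {
        return $index; // no index yet, so nothing is known
    }
    $lines = file($indexPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    foreach ($lines as $line) {
        // Split on the first space only, so filenames may contain spaces.
        $parts = explode(' ', $line, 2);
        if (count($parts) === 2) {
            $index[$parts[0]] = $parts[1];
        }
    }
    return $index;
}

// Returns true and records the hash if the file is new,
// false if a byte-identical file is already listed.
function registerIfNew(string $uploadPath, string $storedName, string $indexPath): bool
{
    $hash = md5_file($uploadPath);
    if ($hash === false) {
        throw new RuntimeException("Cannot read $uploadPath");
    }
    $index = loadHashIndex($indexPath);
    if (isset($index[$hash])) {
        return false;
    }
    file_put_contents($indexPath, "$hash $storedName\n", FILE_APPEND | LOCK_EX);
    return true;
}
```

With around 30 images, rereading the whole index on every upload is cheap; a database column with an index would serve the same purpose for larger collections.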
    
Joey Adams
+2  A: 

The system should just be able to filter out images that are exactly the same.

In that case you could simply forget that you're talking about images and just treat them as binary files, using hash_file() to create a hash.

Of course, this would also result in different hashes for images that differ only in metadata such as EXIF comments in JPEG images. You'll have to decide whether that's a problem for you.
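
As a rough illustration of the hash_file() approach without a stored index, the following sketch rehashes the directory on each upload. The function name findDuplicate() is hypothetical, SHA-256 is just one possible algorithm choice, and the sketch assumes the uploaded file still sits outside the image directory (for example at its temporary upload path).

```php
<?php
// Hypothetical helper: returns the path of an existing byte-identical file,
// or null when no match is found. $algo can be any name from hash_algos().
function findDuplicate(string $uploadPath, string $dir, string $algo = 'sha256'): ?string
{
    $uploadHash = hash_file($algo, $uploadPath);
    if ($uploadHash === false) {
        throw new RuntimeException("Cannot read $uploadPath");
    }
    $uploadSize = filesize($uploadPath);

    $entries = scandir($dir);
    if ($entries === false) {
        throw new RuntimeException("Cannot list $dir");
    }
    foreach ($entries as $entry) {
        $candidate = $dir . DIRECTORY_SEPARATOR . $entry;
        if (!is_file($candidate)) {
            continue; // skips ".", ".." and subdirectories
        }
        if (filesize($candidate) !== $uploadSize) {
            continue; // different size means different bytes; skip hashing
        }
        if (hash_file($algo, $candidate) === $uploadHash) {
            return $candidate;
        }
    }
    return null;
}
```

Rehashing about 30 files per upload is well within the stated load; storing the hashes becomes worthwhile only once the directory grows.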

Michael Borgwardt
Nice point regarding the metadata. Do you know of any way to automatically drop all the metadata with GD / Exif?
Alix Axel
I wouldn't bother going that route, since there can be all kinds of different metadata formats and there's even wild stuff like JPG images with RAR archives attached to the end, which work as both file formats (image decoders will ignore the stuff after the image data, while archive utilities will look at the end of the file for the archive index). If you go beyond treating the files as byte sequences, go all the way and see if you can hash the actual decoded bitmap data - not sure how to do that in PHP though.
Michael Borgwardt
+1  A: 

Byte-wise comparison of files will fail even when a small detail like a metadata tag (an EXIF field, for example) has changed. To compare the picture contents, you would have to open the image file and create a hash of the actual image pixel data. But even that can be undone by saving, say, a JPEG file twice with a slightly different quality level - the subtle encoding differences will create changes in the pixel colour values.

So if you are really looking to match image contents across formats and qualities, you are opening a huge can of worms :)
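
For what it's worth, hashing the decoded pixel data could look something like the sketch below. It assumes the GD extension is available; pixelHash() is a hypothetical helper name and SHA-256 an arbitrary choice. It only ignores differences in metadata and container encoding: a lossy re-save still changes pixel values and therefore the hash, as described above.

```php
<?php
// Hypothetical helper; requires the GD extension.
// Hashes image dimensions plus every decoded pixel value.
function pixelHash(string $path): string
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $img = imagecreatefromstring($data);
    if ($img === false) {
        throw new RuntimeException("Not a decodable image: $path");
    }
    // For palette images imagecolorat() returns a palette index,
    // so convert to truecolor to get comparable colour values.
    if (!imageistruecolor($img)) {
        imagepalettetotruecolor($img);
    }
    $width = imagesx($img);
    $height = imagesy($img);

    $ctx = hash_init('sha256');
    // Include the dimensions so that e.g. 2x8 and 4x4 images differ.
    hash_update($ctx, $width . 'x' . $height . ';');
    for ($y = 0; $y < $height; $y++) {
        $row = '';
        for ($x = 0; $x < $width; $x++) {
            // Pack each colour value as a 32-bit big-endian integer.
            $row .= pack('N', imagecolorat($img, $x, $y));
        }
        hash_update($ctx, $row);
    }
    return hash_final($ctx);
}
```

Two PNG files written from the same bitmap with different compression levels would differ byte for byte but produce the same pixelHash() value.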

Pekka