As noted by Todd Yandell, MD5 is probably fast enough. If not, you can get something even faster by using a 32-bit or 64-bit CRC for your checksum. The major difference is that anybody can make up a new image with the same CRC; it is very easy to spoof. It is quite hard for someone to spoof an MD5 checksum. A minor difference is that the CRC has many fewer bits, but unless you have a very large number of images, a collision is still unlikely.
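To make the size trade-off concrete, here is a minimal Python sketch that computes both kinds of checksum over the same bytes and estimates the chance of an accidental collision with the usual birthday approximation. The function names are mine, and the estimate assumes checksums behave like uniformly random values, which is only approximately true for a CRC.

```python
import hashlib
import math
import zlib


def crc32_hex(data: bytes) -> str:
    """32-bit CRC as 8 hex digits: fast, but trivial to forge."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"


def md5_hex(data: bytes) -> str:
    """128-bit MD5 digest as 32 hex digits."""
    return hashlib.md5(data).hexdigest()


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday approximation: chance that at least two of n_items
    share a checksum, assuming uniformly random `bits`-bit values."""
    pairs = n_items * (n_items - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2.0 ** bits)


if __name__ == "__main__":
    sample = b"stand-in for decoded pixel data"
    print("crc32:", crc32_hex(sample))
    print("md5:  ", md5_hex(sample))
    for n in (1_000, 10_000, 100_000):
        print(n, "images, 32-bit CRC:", collision_probability(n, 32))
```

Under that approximation, 10,000 images with a 32-bit CRC already have a little over a 1% chance of some accidental collision, so "a very large number" arrives sooner than you might expect; a 64-bit CRC pushes it far out again.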
exiftool claims to be able to extract the binary image from a JPEG file, so that you can compute your checksum without decompressing, but I can't figure out from the man page how to do it.
I did some experiments on a laptop with an Intel Core 2 Duo L7100 CPU: an 8MP JPEG takes about 1 second to decompress to PPM format, then another 1 second to checksum. Checksum times were not dramatically different using md5sum, sum, and sha1sum. So your best bet might be to find a way to extract the binary data without decompressing it.
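One way to do that without any external tool is to walk the JPEG segment markers yourself and hash everything except the metadata segments. The sketch below is my own assumption about how this could work, not what exiftool does: it skips APPn (0xFFE0 to 0xFFEF, where EXIF lives) and COM (0xFFFE) segments, hashes the remaining header segments, and then hashes everything from the first start-of-scan marker to the end of the file. The function name is hypothetical, and unusual files (for example, metadata appended after the image data) are not handled.

```python
import hashlib


def jpeg_image_digest(data: bytes) -> str:
    """MD5 of a JPEG's compressed image data, ignoring APPn/COM metadata."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    digest = hashlib.md5()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte before a marker; step over it.
            pos += 1
            continue
        if marker == 0xDA:
            # Start of scan: hash the scan header and all compressed
            # data that follows, through the end of the file.
            digest.update(data[pos:])
            return digest.hexdigest()
        # Every other segment seen here carries a 2-byte big-endian
        # length that counts the length field itself.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if length < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            # Quantization tables, Huffman tables, frame header, etc.
            digest.update(data[pos:pos + 2 + length])
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(jpeg_image_digest(f.read()))
```

Because nothing is decoded, this runs at roughly the speed of reading the file. The catch is that it identifies the compressed stream, not the pixels: re-saving the same picture with different quality settings gives a different checksum.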
I also note that your checksum is going to be almost as good even if it uses far fewer pixels. Compare these two:
djpeg -scale 1/8 big.jpg | /usr/bin/sha1sum # 0.70s
djpeg big.jpg | /usr/bin/sha1sum # 2.15s
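
If you want to call this from a script rather than the shell, a minimal Python wrapper might look like the following. It assumes djpeg is on your PATH; the function names and the default scale are mine.

```python
import hashlib
import subprocess


def sha1_of_stdout(cmd):
    """Run cmd and return the SHA-1 hex digest of everything it writes
    to standard output. Raises CalledProcessError if the command fails."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return hashlib.sha1(proc.stdout).hexdigest()


def thumbnail_checksum(path, scale="1/8"):
    """Checksum of the image decoded at reduced size, equivalent to
    `djpeg -scale 1/8 big.jpg | sha1sum` (assumes djpeg is installed)."""
    return sha1_of_stdout(["djpeg", "-scale", scale, path])


if __name__ == "__main__":
    import sys
    print(thumbnail_checksum(sys.argv[1]))
```

One caveat: the checksum is over decoded pixels, so it is only comparable between runs that use the same decoder and the same scale. I would not expect different djpeg builds to be guaranteed to produce bit-identical output.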