I'm putting together a script to find and remove duplicates in a large library of images. At the moment I'm doing a two-pass filter: first I find files of the same size, then I compute a SHA-256 over a 10240-byte piece of each of those files to get a fingerprint for the files that share a size (code here).
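Roughly, the approach looks like the sketch below. This is a simplified illustration rather than the linked script: the function names `fingerprint` and `find_candidate_duplicates` are made up for this example, and it assumes the 10240-byte piece is read from the start of each file.

```python
import hashlib
import os
from collections import defaultdict

CHUNK_SIZE = 10240  # bytes hashed per file, as described above


def fingerprint(path, chunk_size=CHUNK_SIZE):
    """Return the SHA-256 hex digest of the first chunk_size bytes of a file.

    Assumption: the chunk is taken from the start of the file.
    """
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(chunk_size)).hexdigest()


def find_candidate_duplicates(root):
    """Return lists of paths that share both file size and partial hash."""
    # Pass 1: group every file under root by its size in bytes.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: within each size group, group again by partial SHA-256.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[fingerprint(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Since only part of each file is hashed, the groups returned are candidates rather than confirmed duplicates; two files could still differ after the hashed piece.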
It works well, but I'm guessing there are probably checksums built into the JPEG format that I could use instead of computing the SHA-256.
Does anyone know whether JPEG files contain checksums, or other components that could act as checksums or fingerprints? If so, is there an efficient way to access them?