I'm trying to build a script to go through my original, high-res photos and replace the old, low-res ones I uploaded to Flickr before I had a pro account.

For many of them I can just use Exif info such as date taken to determine a match. But some are really old, and either the original file didn't have Exif info, or it got clobbered by whatever stupid resizing software I used at the time.

So, unable to rely on metadata, I'm forced to resort to the content itself. The problem is that the originals are in different resolutions than the ones on Flickr (which is the whole point of this endeavour). So is there a way for me to compare them with some sort of fuzzy similarity measure that would allow me to set a threshold for requiring human input or not?

I guess knowing one image is a resized version of the other can yield better results than general similarity. A solution in any language will do, but Ruby would be a plus :)

+6  A: 

Interesting problem, btw :)

Slow-ish solution - excellent chance of success

Use a scale-invariant feature detector to find corresponding features in both images. If the features are matched with a high score at similar locations, then you have your match.

I'd recommend SIFT, which generates a scale- and rotation-invariant 128-integer descriptor for each feature found in an image. SURF (available in OpenCV) is another (faster) feature point detector.

You can match features across two images by brute force (compare each descriptor to every descriptor in the other image), which is O(n^2) but pretty fast (especially in the VL SIFT implementation). But if you need to compare the features in one image to those in several images (which you might have to), you should build a tree of the features and query it with the other image's features. K-D trees are useful, and OpenCV has a nice implementation.
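To make the brute-force matching step concrete, here is a minimal sketch in plain Python. It assumes the descriptors have already been extracted by some SIFT or SURF implementation and are available as equal-length sequences of numbers. The function names and the `max_distance` threshold are hypothetical, chosen only for illustration, and the "fraction matched" score is one possible way to turn the matches into a number you can threshold.

```python
import math

def match_descriptors(desc_a, desc_b, max_distance):
    """Brute-force matching: for every descriptor in desc_a, find its
    nearest neighbour in desc_b by Euclidean distance (O(n * m))."""
    matches = []
    for i, a in enumerate(desc_a):
        best_j, best_d = None, math.inf
        for j, b in enumerate(desc_b):
            d = math.dist(a, b)
            if d < best_d:
                best_j, best_d = j, d
        # Keep the pair only if the nearest neighbour is close enough.
        # max_distance is an assumed, hand-tuned threshold.
        if best_j is not None and best_d <= max_distance:
            matches.append((i, best_j, best_d))
    return matches

def match_fraction(desc_a, desc_b, max_distance):
    """Fraction of descriptors in desc_a that found a close match in desc_b."""
    if not desc_a:
        return 0.0
    return len(match_descriptors(desc_a, desc_b, max_distance)) / len(desc_a)
```

A high `match_fraction` suggests the two images show the same picture; a middling value could be the cue to ask for human input. For many images, the inner loop is the part a K-D tree would replace.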

Fast solution - might work

Downsample your high-res image to the low-res dimensions and use a similarity measure like SAD (sum of absolute differences, where the score is the sum of the absolute differences between blocks of, say, 3x3 pixels around each pixel in both images) to determine a match.
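A rough sketch of the downsample-and-compare idea, in plain Python. It assumes both images are already decoded to grayscale and stored as lists of rows of integers (0 to 255); loading the files is left to whatever image library you use. It is a simplification of the block-based SAD described above: the absolute differences are summed over the whole image and divided by the pixel count so that one threshold works for any size. All function names are hypothetical.

```python
def downsample(img, out_w, out_h):
    """Box-average a grayscale image (list of rows) down to out_w x out_h."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            # Average all source pixels that fall into this output pixel.
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def mean_abs_diff(a, b):
    """Sum of absolute differences between two same-sized images,
    divided by the number of pixels."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def resized_match_score(high_res, low_res):
    """0.0 means identical after downsampling; larger means less similar."""
    small = downsample(high_res, len(low_res[0]), len(low_res))
    return mean_abs_diff(small, low_res)
```

Since the old resizing software probably used a different filter and JPEG compression, a true match will rarely score exactly 0.0, so the cutoff has to be tuned on a few known pairs.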

Jacob
+1 for the fast solution.
Soviut
how would you fit the "fast solution" on a relational database?
knoopx
+1  A: 

I'd recommend scripting a solution off of ImageMagick. The following (from the documentation on comparing images with IM) would output a comparative value that you can use.

convert image1 image2 \
        -compose difference -composite -colorspace gray miff:- |\
  identify -verbose - |\
    sed -n '/^.*Mean: */{s//scale=2;/;s/(.*)//;s/$/*100\/32768/;p;q;}' | bc
Mike Buckbee
+1  A: 

Compute the normalized color histogram of both images and compare them using some method (histogram intersection, for example - see the link above). Note that the histograms must be normalized because the images have different resolutions. If the histograms are very dissimilar, the images are not the same picture. But if they are similar, you have one of these two cases: (i) they are the same picture or (ii) they are different pictures with similar global color distributions.

For case (ii), split the images into rectangular tiles and repeat the process, comparing corresponding tiles. This accounts for the local properties of the image. Rank the results and pick the best match.
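The histogram approach could be sketched like this in plain Python. The sketch assumes each image is already decoded into a list of rows of (r, g, b) tuples with values from 0 to 255. The bin count, the tile grid size, and averaging the per-tile scores are illustrative assumptions, and all function names are hypothetical.

```python
def normalized_histogram(pixels, bins=8):
    """Color histogram over bins**3 cells, scaled to sum to 1 so that
    images of different resolutions are comparable.
    Assumes bins divides 256 evenly."""
    step = 256 // bins
    hist = [0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
        count += 1
    return [h / count for h in hist] if count else hist

def histogram_intersection(h1, h2):
    """1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def tiles(img, n):
    """Yield the pixels of each cell of an n x n grid, row by row."""
    h, w = len(img), len(img[0])
    for ty in range(n):
        for tx in range(n):
            yield [img[y][x]
                   for y in range(ty * h // n, (ty + 1) * h // n)
                   for x in range(tx * w // n, (tx + 1) * w // n)]

def tiled_similarity(img_a, img_b, n=2, bins=8):
    """Average histogram intersection over corresponding tiles."""
    scores = [histogram_intersection(normalized_histogram(ta, bins),
                                     normalized_histogram(tb, bins))
              for ta, tb in zip(tiles(img_a, n), tiles(img_b, n))]
    return sum(scores) / len(scores)
```

Calling `tiled_similarity` with `n=1` gives the global comparison; a larger `n` separates pictures that share overall colors but arrange them differently.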

TH