Image Comparison

+1 A:

Your question opens a can of worms in terms of complexity.

If you want to check whether two images are exactly the same file, compute an MD5 hash of each file (after removing any metadata that could distort the result) and compare the two hashes.
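As a minimal sketch of the hashing approach, assuming Python and its standard hashlib module: the helper names file_md5 and same_file_content are made up for illustration, and this version hashes the raw bytes without stripping any metadata. Reading in chunks means only one small piece of one file is in memory at a time.

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_file_content(path_a, path_b):
    """True if both files have the same MD5 digest (metadata is NOT stripped)."""
    return file_md5(path_a) == file_md5(path_b)
```

Each digest can be computed separately (even on different machines) and only the short hex strings need to be compared.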

If you want to check whether they look the same, then it's a completely different story altogether. "Look the same" is meant in a very loose sense (e.g. they are exactly the same image but stored in two different file formats). For this, you need advanced algorithms, which will give you a probability that two images are the same. Not being an expert in the field, I would try the following "invented out of my head" algorithm:

take an arbitrary set of pixel points from the image.
for each pixel "grow" a polygon out of the surrounding pixels which are near in color (according to HSV colorspace)
do the same for the other image
for each polygon of one image, check the geometric similarity with all the polygons in the other image, and pick the highest value. Divide this value by the area of the polygon (to normalize).
create a vector out of the highest values obtained
the higher the norm of this vector, the higher the chance that the two images are the same.

This algorithm should be insensitive to color drift and image rotation. Maybe also to scaling (you normalize against the area). But I repeat: I'm not an expert, there are probably much better approaches, and this one could make kittens cry.
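To make the "grow" step a bit more concrete, here is a rough sketch of that one step only, assuming Python and an image given as a list of rows of (r, g, b) tuples in the 0-255 range. The function name grow_region and the hue_tolerance parameter are hypothetical, and as a simplification it compares only the hue component of HSV, ignoring saturation and value. It returns a set of pixel coordinates rather than a polygon; the polygon matching steps are not covered.

```python
import colorsys
from collections import deque

def grow_region(rgb, seed, hue_tolerance=0.05):
    """Flood-fill outward from seed, keeping 4-connected pixels with a similar hue."""
    height, width = len(rgb), len(rgb[0])

    def hue(row, col):
        red, green, blue = rgb[row][col]
        return colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)[0]

    seed_hue = hue(*seed)
    region = {seed}
    queue = deque([seed])
    while queue:
        row, col = queue.popleft()
        for nrow, ncol in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
            if 0 <= nrow < height and 0 <= ncol < width and (nrow, ncol) not in region:
                diff = abs(hue(nrow, ncol) - seed_hue)
                diff = min(diff, 1 - diff)  # hue is circular, so wrap around
                if diff <= hue_tolerance:
                    region.add((nrow, ncol))
                    queue.append((nrow, ncol))
    return region
```

Because only hue is compared, black, white and gray pixels (which all have hue 0) would be lumped together with red; a fuller version would need to take saturation and value into account as well.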

Stefano Borini 2009-08-22 09:25:06

What would be the point of MD5? Why not compare them byte by byte?

Michael Borgwardt 2009-08-22 11:23:01

Because to compare them byte by byte you need both of them in memory, or mmapped. If you compute the MD5 you can work on one at a time, or in parallel on two different machines. If you have large TIFF images (such as from astronomical data or bioinformatics microarrays), it's a better strategy.

Stefano Borini 2009-08-22 12:13:34

+1 A:

If the images you are trying to compare have distinctive characteristics that you want to tell apart, then PCA (principal component analysis) is an excellent way to go. The file format is really irrelevant; you need to load the image into the program as an array of numbers and do the analysis on that.

Alex 2009-08-22 09:44:27

A:

If you want to determine if 2 images are the same perceptually, I believe the best way to do it is using an Image Hashing algorithm. You'd compute the hash of both images and you'd be able to use the hashes to get a confidence rating of how much they match.

One that I've had some success with is pHash, though I don't know how easy it would be to use with Visual C. Searching for "Geometric Hashing" or "Image Hashing" might be helpful.
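To give a feel for how image hashing works, here is a sketch of an "average hash", which is a much simpler relative of pHash and not pHash itself. It assumes Python and a grayscale image given as a list of rows of brightness values at least hash_size pixels in each dimension; the names average_hash, hamming_distance and similarity are made up for this example.

```python
def average_hash(gray, hash_size=8):
    """Shrink to hash_size x hash_size by block averaging, then set one bit per
    cell depending on whether the cell is brighter than the overall mean."""
    height, width = len(gray), len(gray[0])
    cells = []
    for i in range(hash_size):
        r0, r1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            c0, c1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits

def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")

def similarity(hash_a, hash_b, hash_size=8):
    """Rough confidence in [0, 1]: 1.0 means every bit of the hashes matches."""
    return 1 - hamming_distance(hash_a, hash_b) / (hash_size * hash_size)
```

Two images whose hashes differ in only a few bits probably look alike; where to draw the line is something you would have to tune for your own images.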

Falaina 2009-08-22 10:20:55

A:

Testing for strict identity is simple: Just compare every pixel in source image A to the corresponding pixel value in image B. If all pixels are identical, the images are identical.
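A minimal sketch of that strict test, assuming Python and images already decoded into lists of rows of pixel values (the function name images_identical is made up for this example):

```python
def images_identical(pixels_a, pixels_b):
    """True only if both images have the same dimensions and every pixel matches."""
    if len(pixels_a) != len(pixels_b):
        return False
    for row_a, row_b in zip(pixels_a, pixels_b):
        if len(row_a) != len(row_b):
            return False
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a != pixel_b:
                return False  # stop at the first mismatch
    return True
```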

But I guess you don't want this kind of strict identity. You probably want images to be "identical" even if certain transformations have been applied to image B. Examples of these transformations might be:

changing image brightness globally (for every pixel)
changing image brightness locally (for every pixel in a certain area)
changing image saturation globally or locally
gamma correction
applying some kind of filter to the image (e.g. blurring, sharpening)
changing the size of the image
rotation

For example, printing an image and scanning it again would probably include all of the above.

In a nutshell, you have to decide which transformations you want to treat as "identical" and then find image measures that are invariant to those transformations. (Alternatively, you could try to revert the transformations, but that's not possible if the transformation removes information from the image, e.g. blurring or clipping the image.)
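As one example of such an invariant measure, here is a sketch of normalized correlation, which ignores a global brightness offset and a global contrast gain but nothing else on the list. It assumes Python and two grayscale images of the same size given as lists of rows; the function names are made up for this example.

```python
import math

def normalize(gray):
    """Flatten the image and rescale it to zero mean and unit standard deviation."""
    values = [v for row in gray for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]  # a flat image carries no structure
    return [(v - mean) / std for v in values]

def normalized_correlation(gray_a, gray_b):
    """Close to 1.0 when the images differ only by global brightness/contrast."""
    a, b = normalize(gray_a), normalize(gray_b)
    return sum(x * y for x, y in zip(a, b)) / len(a)
```

Invariance to resizing, rotation or local changes needs different measures; this only illustrates the idea of picking a measure that cancels out one chosen transformation.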

nikie 2009-08-22 10:31:37

+1 A:

I did something similar to detect movement in an MJPEG stream and record images only when movement occurs.

For each decoded image, I compared it to the previous one using the following method.

Resize the image to effectively thumbnail size (I resized fairly hi-res images down by a factor of ten)
Compare the brightness of each pixel to the previous image and flag if it is much lighter or darker (threshold value 1)
Once you've done that for each pixel, you can use the count of different pixels to determine whether the image is the same or different (threshold value 2)

Then it was just a matter of tuning the two threshold values.
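The steps above can be sketched roughly as follows, assuming Python and grayscale frames given as lists of rows of brightness values. The function names and the default threshold values are placeholders for illustration, not the values from the original setup, and the downscaling here is plain block averaging.

```python
def shrink(gray, factor):
    """Downscale by averaging each factor x factor block into one pixel."""
    height, width = len(gray) // factor, len(gray[0]) // factor
    small = []
    for i in range(height):
        row = []
        for j in range(width):
            block = [gray[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small

def count_changed_pixels(prev, curr, pixel_threshold):
    """Count pixels whose brightness moved by more than threshold value 1."""
    return sum(1
               for row_p, row_c in zip(prev, curr)
               for p, c in zip(row_p, row_c)
               if abs(p - c) > pixel_threshold)

def frames_differ(prev, curr, factor=10, pixel_threshold=30, count_threshold=5):
    """True if more than threshold value 2 thumbnail pixels changed noticeably."""
    changed = count_changed_pixels(shrink(prev, factor), shrink(curr, factor),
                                   pixel_threshold)
    return changed > count_threshold
```

Shrinking first also smooths out much of the pixel-level noise, so small fluctuations are less likely to trip the first threshold.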

I did the comparisons using System.Drawing.Bitmap, but as my source images were JPEGs, there was some artifacting.

It's a nice simple way to compare images for differences if you're going to roll it yourself.

davewasthere 2009-08-22 10:42:24
