Hello everyone, I want to write an algorithm that can take parts of a picture and match them to another picture of the same object.

For example, if I gave the computer a picture of a vase and a picture of a scene with the vase in it, I'd expect it to determine where in the scene the vase is. How would I begin to develop an algorithm like this?

The eventual use for this algorithm is an application that, given a picture of somebody's face, could tell whether that person is in a crowd of people. It would eventually be applied to video streams.

edit: I'm not expecting an actual solution to this problem, as I don't hope to solve it anytime soon. The real question is how you define something like this to a computer so that you can write an algorithm to do it.

Thanks

+2  A: 

A former teacher of mine wrote his doctoral thesis on a similar sort of problem, except his input was a detailed 3D model of something, which he would use to find that object in 2D images. This is a VERY non-trivial problem. There is no single 'answer', and certainly nothing that would fit the Stack Overflow format.

My best answer: gather a ton of money and hire a very experienced programmer.

Best of luck to you.

Chris Lawlor
I had a professor who was doing this too. My favorite was when the computer said a jet was a coffee cup. Silly curve definitions.
Drew
A: 

I think you will find this to be quite a challenge. This is an extremely difficult problem and is one of the many areas of computing that fall under the domain of artificial intelligence (AI). Facial recognition is certainly the most popular variant of this problem and, in spite of what you may read in the media, the claimed successes are not what they are made out to be. I think the closest solutions involve neural nets, and they usually require very clear, carefully selected images.

You could try reading here though. Good luck!

Arnold Spence
+1  A: 

The simple answer is: find a mathematical way to describe faces that can account for angles and partially missing data, then refine and teach it.

Apparently Apple has done something like this. However, it still makes mistakes and has to be taught as it goes.

I expect it will be more about the math than about the programming.
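To make "a mathematical way to describe faces" a bit more concrete, here is a minimal sketch, not a real face recognizer. It assumes each face has already been reduced to a short list of numbers (how you get those numbers is the hard part and is not shown), and it uses None to stand for measurements that are missing, for example because part of the face is hidden. The names masked_cosine and best_match, the gallery values, and the 0.95 threshold are all made up for illustration.

```python
import math

def masked_cosine(a, b):
    """Cosine similarity over the positions where both vectors have data."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0  # nothing to compare, treat as no similarity
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query, gallery, threshold=0.95):
    """Return the name of the most similar known face, or None if none is close enough."""
    name, score = max(
        ((n, masked_cosine(query, v)) for n, v in gallery.items()),
        key=lambda item: item[1],
    )
    return name if score >= threshold else None

# Hypothetical face descriptions: the numbers are invented.
gallery = {
    "alice": [0.42, 0.31, 0.77, 0.12],
    "bob": [0.10, 0.80, 0.20, 0.55],
}
query = [0.40, None, 0.80, 0.10]  # second measurement is missing
match = best_match(query, gallery)
print(match)
```

The threshold is what lets the function say "nobody I know" instead of always returning the nearest face. Picking it is part of the "refine and teach" step.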

sfossen
+1  A: 

The two problems you describe are quite different.

A major part of each is solved by the numerous machine vision libraries available. You may need a combination of techniques to achieve any success at either task.

In the first one, you would need something that generically recognizes objects. I'd probably use a number of algorithms in concert to identify the foreground object in the model image and then do some kind of weighted comparison against the partitioned target image.
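As a toy illustration of the comparison step only, the sketch below slides a small model patch over every window of a target image and scores each window with normalized cross-correlation. This is plain template matching, swapped in here because it is the simplest instance of the idea. It assumes grayscale images stored as lists of lists, and it does not handle changes in scale, rotation, or lighting. The names ncc and find_template and the tiny "scene" and "vase" arrays are invented for the example.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation between two equal-sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # flat region: correlation undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom

def find_template(image, template):
    """Return (best_score, (row, col)) for the best-matching window."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = (-2.0, (0, 0))  # scores lie in [-1, 1], so -2 is below any real score
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(window, template)
            if score > best[0]:
                best = (score, (r, c))
    return best

# Invented data: a 2x2 "vase" pattern placed inside a 4x5 "scene".
scene = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
vase = [[9, 1], [1, 9]]
score, location = find_template(scene, vase)
print(score, location)
```

Machine vision libraries ship faster and more robust versions of this kind of search, so in practice you would call one of those rather than write the loops yourself.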

The second case, examining faces, is a much more difficult problem than the general recognizer above. Faces all look the same, or nearly so, and the things a general recognizer would notice aren't likely to be good for telling faces apart. You need an algorithm already tuned for facial recognition. Fortunately this is a rapidly maturing field, and you can probably do this as well as the first case, but with a different set of functions.

TokenMacGuy