Algorithm for a drawing and painting robot -

Hello

I want to write a piece of software which analyses an image, and then produces an image which captures what a human eye perceives in the original, using a minimal number of bezier path objects of varying colour and opacity.

Unlike the recent Twitter super-compression contest (see: stackoverflow.com/questions/891643/twitter-image-encoding-challenge), my goal is not to create a replica which is faithful to the image, but instead to replicate the human experience of looking at the image.

As an example, if the original image shows a red balloon in the top left corner, and the reproduction has something that looks like a red balloon in the top left corner then I will have achieved my goal, even if the balloon in the reproduction is not quite in the same position and not quite the same size or colour.

When I say "as perceived by a human", I mean this in a very limited sense. I am not attempting to analyse the meaning of an image, and I don't need to know what an image is of; I am only interested in the key visual features a human eye would notice, to the extent that this can be automated by an algorithm which has no capacity to conceptualise what it is actually observing.

Why this unusual criterion of human perception over photographic accuracy?

This software would be used to drive a drawing and painting robot, which will be collaborating with a human artist (see: video.google.com/videosearch?q=mr%20squiggle).

Rather than treating marks made by the human which are not photographically perfect as necessarily being mistakes, the algorithm should seek to incorporate what is already on the canvas into the final image.

So relative brightness, hue, saturation, size and position are much more important than being photographically identical to the original. Maintaining the topology of the features, blocks of colour, gradients, and convex and concave curves will be more important than the exact size, shape and colour of those features.
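One way to make "relative brightness, hue and saturation matter more than exact values" concrete is to compare colours in HSV space rather than raw RGB. A minimal sketch using the standard library's colorsys; the weights are illustrative guesses, not tuned values:

```python
import colorsys

def perceptual_distance(rgb_a, rgb_b):
    """Compare two RGB colours (0.0-1.0 floats) by hue, saturation and
    value rather than by raw channel differences."""
    ha, sa, va = colorsys.rgb_to_hsv(*rgb_a)
    hb, sb, vb = colorsys.rgb_to_hsv(*rgb_b)
    # Hue is circular: the distance between 0.95 and 0.05 is 0.1, not 0.9.
    dh = min(abs(ha - hb), 1.0 - abs(ha - hb))
    return 2.0 * dh + 1.0 * abs(sa - sb) + 1.0 * abs(va - vb)

# A slightly darker red should read as "closer" to red than blue does,
# even though its raw RGB difference may be comparable.
red = (1.0, 0.0, 0.0)
dark_red = (0.8, 0.1, 0.1)
blue = (0.0, 0.0, 1.0)
```

Under this measure the dark red scores closer to red than blue does, which matches the balloon criterion: "not quite the same colour" is fine, a different hue is not.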

Still with me?

My problem is that I'm suffering a little from "when you have a hammer, everything looks like a nail" syndrome. To me it seems the way to do this is to use a genetic algorithm, with something like the comparison of wavelet transforms (see: grail.cs.washington.edu/projects/query/) used by retrievr (see: labs.systemone.at/retrievr/) to select fit solutions.
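The genetic-algorithm idea boils down to three pieces: a genome of shapes, a renderer, and a fitness function. A minimal sketch, with axis-aligned rectangles standing in for bezier paths and a plain pixel difference standing in for the wavelet-transform comparison (both are placeholders, not the real thing):

```python
import random

random.seed(0)
W = H = 8  # tiny greyscale canvas, purely for illustration

def render(genome):
    """Paint each (x, y, w, h, shade) rectangle onto a blank canvas.
    Rectangles are stand-ins for bezier paths with colour and opacity."""
    canvas = [[0.0] * W for _ in range(H)]
    for x, y, w, h, shade in genome:
        for j in range(y, min(y + h, H)):
            for i in range(x, min(x + w, W)):
                canvas[j][i] = shade
    return canvas

def fitness(genome, target):
    """Plain per-pixel difference; a wavelet-transform comparison like
    retrievr's would slot in here instead."""
    canvas = render(genome)
    return -sum(abs(canvas[j][i] - target[j][i])
                for j in range(H) for i in range(W))

def mutate(genome):
    """Tweak one gene of one randomly chosen shape."""
    child = [list(g) for g in genome]
    g = random.choice(child)
    k = random.randrange(5)
    g[k] = random.random() if k == 4 else random.randrange(W)
    return [tuple(g) for g in child]

# Target: a bright square in the top-left corner (the "red balloon").
target = [[1.0 if i < 4 and j < 4 else 0.0 for i in range(W)] for j in range(H)]

best = [(random.randrange(W), random.randrange(H), 3, 3, 0.5)]
for _ in range(400):  # a minimal (1+1) evolution strategy
    child = mutate(best)
    if fitness(child, target) >= fitness(best, target):
        best = child
```

A real run would use a population rather than a single individual, crossover as well as mutation, and a perceptual fitness function; the loop structure stays the same.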

But the main reason I see this as the answer is that these are the techniques I know; there are probably much more elegant solutions using techniques I don't know anything about.

It would be especially interesting to take into account the ways the human vision system analyses an image; perhaps special attention needs to be paid to straight lines, angles, high-contrast borders and large blocks of similar colour.

Do you have any suggestions for things I should read on vision, image algorithms, genetic algorithms or similar projects?

Thank you

Mat

PS. Some of the spelling above may appear wrong to you and your spellcheck. It's just international spelling variations which may differ from the standard in your country: e.g. Australian standard: colour vs American standard: color

+1  A: 

That's quite a big task. You might be interested in image vectorization (also called image tracing), which takes in rasterized images (such as pictures from a camera) and outputs a set of bezier curves that approximate the input. Since good algorithms often output very high quality (read: complex) path sets, you'd also be interested in simplification algorithms, which can help enormously.
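One classic simplification algorithm is Ramer-Douglas-Peucker, which drops points of a polyline that stay within a tolerance of the chord between its endpoints. A minimal sketch on 2-D point lists (vectorizer output would be curves, but the same idea applies to their control polylines):

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: recursively keep only points that deviate
    from the chord between the endpoints by more than epsilon."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1e-12
    # Perpendicular distance from each interior point to the chord.
    dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm for x, y in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[k - 1] > epsilon:
        # Split at the worst offender and simplify both halves.
        return rdp(points[:k + 1], epsilon)[:-1] + rdp(points[k:], epsilon)
    return [points[0], points[-1]]

# A noisy near-horizontal line collapses to its two endpoints,
# while a genuine corner survives.
noisy = [(x, 0.01 * (-1) ** x) for x in range(10)]
corner = [(0, 0), (5, 5), (10, 0)]
```

The tolerance epsilon is where the "human perception" criterion enters: the more a feature matters perceptually, the smaller the tolerance you'd allow it.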

RCIX
Thanks RCIX. I'm intending to spend about three months on this; experience suggests it will be longer than that.
compound eye
+1  A: 

Unfortunately I am not next to my library, or I could recommend a number of books on perceptual psychology.

The first thing you must consider is that the physiology of the human eye is such that when we examine an image or scene, we are only capturing very small bits at a time, as our eyes dart around rapidly. Our mind pieces the different parts together to try and form a whole.

You might start by finding an algorithm for the path of an eyeball as it darts around. Perhaps it is attracted to contrast?
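A crude version of such a gaze algorithm: fixate the point of highest local contrast, suppress it so the eye moves on, and repeat. A sketch, assuming the image is a list of rows of greyscale floats (the contrast measure and the "don't revisit" rule are deliberate simplifications):

```python
def local_contrast(img, x, y):
    """Mean absolute difference between a pixel and its 4-neighbours."""
    h, w = len(img), len(img[0])
    neighbours = [img[j][i]
                  for i, j in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                  if 0 <= i < w and 0 <= j < h]
    return sum(abs(img[y][x] - n) for n in neighbours) / len(neighbours)

def scanpath(img, fixations=3):
    """Greedy gaze model: fixate the highest-contrast point, suppress it
    so the next fixation lands elsewhere, and repeat."""
    h, w = len(img), len(img[0])
    scores = {(x, y): local_contrast(img, x, y)
              for y in range(h) for x in range(w)}
    path = []
    for _ in range(fixations):
        best = max(scores, key=scores.get)
        path.append(best)
        scores[best] = -1.0  # inhibition of return
    return path

# A single bright pixel on a dark field should attract the first fixation.
img = [[0.0] * 5 for _ in range(5)]
img[2][2] = 1.0
```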

Next is that our eyes adjust the "exposure" depending on the context. It's like those high-dynamic-range images, if they were pieced together not from multiple exposures of a whole scene, but from many small images, each balanced on its own but blended into its surroundings to form a high dynamic range.

Now there was a finding in a monkey brain that there is a single neuron that lights up if there's a diagonal line in the upper left of its field of vision. Similar neurons can be found for vertical lines, and horizontal lines in various areas of that monkey's field of vision. The "diagonalness" determines the frequency with which that neuron fires.
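Those orientation-selective neurons behave much like small oriented filters swept across the image. A toy sketch with hand-made 3x3 kernels (the kernels are illustrative, not physiologically derived):

```python
# Each kernel "fires" on lines of one orientation, a crude stand-in for
# the orientation-selective neurons described above.
KERNELS = {
    "horizontal": ((-1, -1, -1), (2, 2, 2), (-1, -1, -1)),
    "vertical":   ((-1, 2, -1), (-1, 2, -1), (-1, 2, -1)),
    "diagonal":   ((2, -1, -1), (-1, 2, -1), (-1, -1, 2)),
}

def response(img, kernel):
    """Total rectified correlation of a 3x3 kernel with the image."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            s = sum(kernel[j][i] * img[y + j - 1][x + i - 1]
                    for j in range(3) for i in range(3))
            total += max(s, 0.0)  # neurons fire; they don't "anti-fire"
    return total

def preferred_orientation(img):
    return max(KERNELS, key=lambda k: response(img, KERNELS[k]))

# A line from top-left to bottom-right should excite the diagonal "neuron"
# far more than the horizontal or vertical ones.
diag = [[1.0 if x == y else 0.0 for x in range(7)] for y in range(7)]
```

The firing-rate analogy maps naturally onto the rectified response: the stronger the match between line and kernel, the larger the total.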

One might speculate that other neurons might be found and mapped to other qualities, such as redness, or texturedness, and other things.

There's something humans can do that I've not seen a computer program ever able to do. It's something called "closure", where a human is able to fill in information about something they are seeing that doesn't actually exist in the image. An example:

          *






*                    *

Is that a triangle? If you knew in advance that it was, then you could probably make a program to connect the dots. But what if it's just dots? How can you know? I wouldn't attempt this one unless I had some really clever way of dealing with it.

There are many other facts about human perception you might be able to use. Good luck, you've not picked a straightforward task.

Breton
You've given me a really good idea: if I could track the artist's gaze on the source image, I could determine the key areas of the image that attract attention (people's faces, etc.) and use this to decide which parts of the image need to be reproduced with greater care. I think I'd leave the triangle as it is, three stars, which suggests a better way of stating my aim: to reproduce an image which retains the features that convey most of the experience of the image. I can't hope to diagram the experience of seeing the image; I'll leave it to the viewer to find the triangle in the three stars.
compound eye
+3  A: 

I cannot answer your question directly, but you should really take a look at artist/programmer (Lisp) Harold Cohen's painting machine, AARON.

Dave Everitt
thank you dave that's really interesting
compound eye
Like many of the pioneering artists working with technology, he started working on this back in the 70s, which I reckon puts him in the history books. If you can manage to see him somewhere, he's willing to talk about the process to other programmers.
Dave Everitt
+2  A: 

There is a model that can be implemented as an algorithm to calculate a saliency map for an image, determining which parts of the image would get the most attention from a human.

The model is called the Itti-Koch model. You can find a starting paper here, and more resources and C++ source code here.
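The core of the Itti-Koch model is a centre-surround operation: a location is salient where it differs strongly from its surroundings. A toy single-scale, single-channel sketch (the real model computes this across multiple scales and feature channels, intensity, colour and orientation, and then combines the maps):

```python
def saliency_map(img, radius=1):
    """Toy centre-surround operator: how much each pixel differs from the
    mean of its local neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            patch = [img[j][i]
                     for j in range(max(0, y - radius), min(h, y + radius + 1))
                     for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = abs(img[y][x] - sum(patch) / len(patch))
    return out

# An isolated bright spot on a uniform field is the most salient location.
img = [[0.1] * 5 for _ in range(5)]
img[1][3] = 0.9
sal = saliency_map(img)
```

For the robot, the saliency peaks would mark the regions worth reproducing with the most care, which fits nicely with the gaze-tracking idea above.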

Janusz
thank you,that is very interesting
compound eye
One nice thing about metafilter is that you can assign multiple correct answers; here I only get one choice. I would have liked to give everyone the green tick, but this is the answer that has given me the most interesting directions to explore. Thanks to all of you.
compound eye
A: 

I think a thing that could help you in this enormous task is human involvement. I mean data: you could have many people sit staring at random dots (like in the previous post) and connect them as they see fit, then harness that data.

data_smith