ansaurus

Question

image recognition: a box and randomly placed text

Answer 1

+1 A:

You can apply any border detection algorithm to detect the box. And since the color of the text is different from the color of the background, you can even use a linear search to find the black pixels of the 'text'. I may be wrong, sorry about that.
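
A minimal sketch of the linear search, assuming the image is already loaded as a list of rows of grayscale values (0 = black, 255 = white). The function names and the threshold of 128 are illustrative choices, not something from the question.

```python
def find_dark_pixels(image, threshold=128):
    """Linear scan: return (row, col) of every pixel darker than threshold.

    `image` is assumed to be a list of rows of grayscale values (0..255).
    """
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value < threshold
    ]


def bounding_box(pixels):
    """Smallest (top, left, bottom, right) rectangle containing all pixels."""
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(rows), min(cols), max(rows), max(cols))
```

The bounding box of the dark pixels only gives a rough location. It does not separate the box outline from the text, so the box's own pixels would have to be masked out first.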

Trickster 2009-10-25 14:47:38

Answer 2

+2 A:

[This is of interest to us.] I am assuming your input is effectively a bitmap - a rectangular matrix of pixels. The first question is whether it is aligned with the axes - if it has been scanned, it probably is not. You may need deskewing algorithms (rather dated, but a useful start: http://www.eecs.berkeley.edu/~fateman/kathey/node11.html).

The classic line detection method is the Hough transform (http://en.wikipedia.org/wiki/Hough%5Ftransform), though our current collaborators do better than this for simple boxes by projecting pixels onto different viewpoints, similar to tomography. Rotate the image and count the density/histogram of points on the projection lines. For simple boxes that gives a clear signal.
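
A rough sketch of the projection idea, assuming the dark pixels have already been extracted as (x, y) coordinates. Instead of rotating the image itself, it projects each point onto an axis at the given angle, which gives the same histogram. The "sharpness" score (sum of squared bin counts) is my own illustrative choice for picking the angle with the clearest peaks, not necessarily what the collaborators use.

```python
import math
from collections import Counter


def projection_profile(points, angle_deg):
    """Histogram of dark-pixel positions projected onto an axis at angle_deg."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return Counter(round(x * cos_t + y * sin_t) for x, y in points)


def sharpness(profile):
    """Higher when the points pile up in a few bins (i.e. along lines)."""
    return sum(count * count for count in profile.values())


def best_angle(points, candidate_angles):
    """Candidate angle whose projection profile has the strongest peaks."""
    return max(
        candidate_angles,
        key=lambda a: sharpness(projection_profile(points, a)),
    )
```

For an axis-aligned box outline, the profile at 0 degrees has two tall bins, one for each vertical side, and a low plateau in between from the top and bottom edges.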

For the text I suspect you either have to have a set of likely fonts or use machine learning. In the latter case you have to devise features and then select a series of images that are classified by humans as text and not-text. Your algorithm (there are many: neural nets, maximum entropy, etc.) is then trained against these.

The quality of the pixel map makes a great deal of difference. Documents from 20 years ago are much harder than bitmaps of documents created through drawing programs and dumped as PDF (of course, if you can interpret the text in the PDF, that helps a good deal).

peter.murray.rust 2009-10-25 14:54:14

My documents are simple... they are gif images, so they are clean.

Dervin Thunk 2009-10-25 15:13:20

@Dervin GIF is simply a transfer format for pixels. They could hold very messy text (e.g. the captchas in SO) or fairly clean text (e.g. the fonts in SO itself). But many images are not clean when analysed in detail, as they may include antialiasing.

peter.murray.rust 2009-10-25 15:32:13

Peter, the image would be closer to this: http://images.freshmeat.net/editorials/r_intro/images/line-graph-1.jpg

Dervin Thunk 2009-10-25 15:48:55

Thanks, Peter. I agree it will never be 100%, so there will always be some manual intervention.

Dervin Thunk 2009-10-25 16:29:19

Answer 3

A:

A very simple algorithm would be to scan left-to-right and top-to-bottom, looking for the three black pixels that make up an upper-left corner of a box (and then continuing to scan for the three pixels that would make up the matching lower-right corner). Once you've identified each box in the image in this way, you could scan the inner portion and assume that any non-white pixels mean there is some text in the box. Of course, this would not differentiate between text and images inside the box, but that would be a much more difficult problem anyway.
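
A minimal sketch of that scan, assuming a clean binary image stored as a list of rows of 0 (white) and 1 (black) with one-pixel-wide box lines. All function names here are made up for illustration.

```python
def is_black(img, r, c):
    """True if (r, c) lies inside the image and is a black pixel."""
    return 0 <= r < len(img) and 0 <= c < len(img[0]) and img[r][c] == 1


def upper_left_corners(img):
    """Pixels that are black and have black neighbours to the right and below."""
    return [
        (r, c)
        for r in range(len(img))
        for c in range(len(img[0]))
        if is_black(img, r, c) and is_black(img, r, c + 1) and is_black(img, r + 1, c)
    ]


def lower_right_corners(img):
    """Pixels that are black and have black neighbours to the left and above."""
    return [
        (r, c)
        for r in range(len(img))
        for c in range(len(img[0]))
        if is_black(img, r, c) and is_black(img, r, c - 1) and is_black(img, r - 1, c)
    ]


def box_has_content(img, top_left, bottom_right):
    """True if any pixel strictly inside the box is non-white."""
    (r0, c0), (r1, c1) = top_left, bottom_right
    return any(
        img[r][c] for r in range(r0 + 1, r1) for c in range(c0 + 1, c1)
    )


rows = [
    "#######",
    "#.....#",
    "#..#..#",
    "#.....#",
    "#######",
]
img = [[1 if ch == "#" else 0 for ch in row] for row in rows]
corners = (upper_left_corners(img)[0], lower_right_corners(img)[0])
print(corners, box_has_content(img, *corners))  # ((0, 0), (4, 6)) True
```

Pairing each upper-left corner with its matching lower-right corner is left out here. With several boxes in one image you would need to follow the edges, or at least check that both corners lie on the same outline.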

MusiGenesis 2009-10-25 15:23:13

Sorry about my naive question, but what happens if your doc has a T at a small y coordinate? Wouldn't that be confused with the upper-left corner?

Dervin Thunk 2009-10-25 15:38:47

You cannot assume there are exactly 3 pixels - it depends on the line width, registration with the rasterisation program, antialiasing and a lot more.

peter.murray.rust 2009-10-25 16:17:34

@Dervin: you could rule out a "T" by checking the pixel to the left, and you could rule out a "+" by checking to the left and above, but all of this assumes a relatively simple image. My algorithms here wouldn't work very well with the sample image you posted below peter's comment. It wouldn't pick up the lower-right corner of the graph's box, it would falsely recognize the upper-left of the "5"s and the sideways "D" in "DJIA" as corners, etc.
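
The extra checks for "T" and "+" could look something like this. It is a sketch under the same simple-image assumption (binary pixels, 0 = white, 1 = black, one-pixel lines), and the function names are hypothetical.

```python
def is_black(img, r, c):
    """True if (r, c) lies inside the image and is a black pixel."""
    return 0 <= r < len(img) and 0 <= c < len(img[0]) and img[r][c] == 1


def is_strict_upper_left_corner(img, r, c):
    """Three-pixel corner test, plus: nothing black to the left or above."""
    looks_like_corner = (
        is_black(img, r, c)
        and is_black(img, r, c + 1)
        and is_black(img, r + 1, c)
    )
    # A "T" junction has a black pixel to the left of the candidate;
    # a "+" junction has black pixels both to the left and above.
    return (
        looks_like_corner
        and not is_black(img, r, c - 1)
        and not is_black(img, r - 1, c)
    )
```

Pixels outside the image count as non-black here, so a box drawn right at the image edge still passes the test.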

MusiGenesis 2009-10-25 18:19:54

@Dervin: by the way, your sample graph in your comment to peter's answer caused me actual physical pain. This answer is why: http://stackoverflow.com/questions/1538235/what-problems-have-you-solved-using-genetic-algorithms-genetic-programming/1538464#1538464

MusiGenesis 2009-10-25 21:23:30
