views: 723

answers: 2
I need to detect the bounding box(es) around portions of text in an image, and while there are quite a number of scholarly articles describing algorithms, I haven't found any implementations.

The specific problem I'm trying to solve is this:

Given an image that may or may not contain text, determine whether it does, and if so, output the bounding rectangle around each area of text (where "area" is defined by the algorithm, and hopefully it errs on the side of smaller areas rather than larger ones).

Eventually, I'd like to turn the text into actual ASCII/Unicode characters, but I think that is only tangentially related to this problem.

There are a number of tools out there that do OCR (Tesseract, GOCR, etc.), but they seem to only work on text that more or less fills the image, with no real "image" content. (E.g., Tesseract generates garbage when run on an image with subtitles.)

An implementation in Java would be ideal, but I'm open to any cross-platform libraries/applications at this point.

Edit: I'm particularly interested in detecting artificial text, such as subtitles, or a HUD, which seems to be a simpler problem than detecting scene text, such as street signs. (Although scene text detection is even better.)

+1  A: 

This is not a simple-answer problem. I do not know of packaged implementations specifically for this purpose (which is not to say that they don't exist).

I think that a low-pass filter is one of the first things the image should be run through, to find larger areas where the color does not change. After that it may require binary thresholding to get the image to pure black and white, followed perhaps by another filter to clear out artifacts. Then an OCR engine would have a fighting chance, or you can write an algorithm that bounds the area with the highest concentration of black pixels.
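
A minimal sketch of the thresholding-and-bounding step, using only java.awt.image.BufferedImage from the JDK; the average-of-RGB grayscale and the threshold parameter are assumptions you would tune, and any smoothing/artifact removal would happen before this:

    import java.awt.Rectangle;
    import java.awt.image.BufferedImage;

    public class TextRegionSketch {
        // Binarize to black/white and return the bounding box of all "dark" pixels.
        // threshold (0-255) is an assumed tuning parameter; smoothing and artifact
        // removal would normally happen before this step.
        static Rectangle boundDarkPixels(BufferedImage img, int threshold) {
            int minX = img.getWidth(), minY = img.getHeight(), maxX = -1, maxY = -1;
            for (int y = 0; y < img.getHeight(); y++) {
                for (int x = 0; x < img.getWidth(); x++) {
                    int rgb = img.getRGB(x, y);
                    int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                    int luma = (r + g + b) / 3;           // crude grayscale
                    if (luma < threshold) {               // treat as "text" pixel
                        if (x < minX) minX = x;
                        if (y < minY) minY = y;
                        if (x > maxX) maxX = x;
                        if (y > maxY) maxY = y;
                    }
                }
            }
            if (maxX < 0) return null;                    // no dark pixels found
            return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
        }
    }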

If you know beforehand the color and/or general position of the text you would have a significantly easier time. If color is known you could filter on that color (or color range). If general position is known (as with subtitles) you can do a rough clip first to a new image, which will significantly reduce processing time.
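
For instance, if the subtitles were known to be roughly white and confined to the bottom quarter of the frame, a rough clip plus color filter might look like the sketch below; the band position and the near-white tolerance are assumptions, not something fixed by the approach itself:

    import java.awt.image.BufferedImage;

    public class SubtitleClipSketch {
        // Crop the assumed subtitle band (bottom quarter of the frame) and keep
        // only pixels close to the assumed subtitle color (near-white), painting
        // everything else black so OCR or the bounding step sees less noise.
        static BufferedImage clipAndFilter(BufferedImage frame) {
            int bandTop = frame.getHeight() * 3 / 4;                  // assumed region
            BufferedImage band = frame.getSubimage(0, bandTop,
                    frame.getWidth(), frame.getHeight() - bandTop);
            BufferedImage out = new BufferedImage(band.getWidth(), band.getHeight(),
                    BufferedImage.TYPE_INT_RGB);
            for (int y = 0; y < band.getHeight(); y++) {
                for (int x = 0; x < band.getWidth(); x++) {
                    int rgb = band.getRGB(x, y);
                    int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                    boolean nearWhite = r > 200 && g > 200 && b > 200; // assumed tolerance
                    out.setRGB(x, y, nearWhite ? 0xFFFFFF : 0x000000);
                }
            }
            return out;
        }
    }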

If you don't know the orientation of the text, it will of course be more complicated.

Demi
A low-pass filter would blur out the text. :)
endolith
A: 

"Learning to detect scene text using a higher-order MRF with belief propagation" by Chang is a pretty good attempt at this. It also attempts to detect non-uniform text.

monksy
The link is dead
Jason D
Just do a search on Google Scholar for it ... the location keeps changing.
monksy