I'd like to highlight a word in an image of a document when the user searches for that word, exactly as Google Books does.

As far as I know, Tesseract and other open source OCR programs don't support this sort of function, so does anyone have any ideas how it might be done?

A: 

Yes, they "support" it. Sort of.

They give you a rectangle that tells you where each recognized word is. Using that, fill the rectangle on the image with the color of your choice using a color blending mode (e.g., keep the luma intact and alter only the chroma). This works well with black-and-white and grayscale images, which most books are, and is sufficient for most colored fonts too (except those on a colored background). For that case, an alternative is to invert the colors instead of highlighting them; many applications do this (Foxit Reader comes to mind).
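
To make the blending idea concrete, here is a minimal sketch in Python using only the standard library. It assumes the image is already in memory as rows of `(r, g, b)` tuples and that the box is `(left, top, width, height)`; the name `highlight_box` and that layout are illustrative choices, not part of Tesseract or any wrapper. It uses the YIQ color space from `colorsys` as one way to separate luma from chroma, and luma is only approximately preserved where the result has to be clamped to the valid RGB range.

```python
import colorsys


def highlight_box(pixels, box, color=(255, 255, 0)):
    """Return a copy of `pixels` with `box` tinted by `color`.

    pixels: list of rows, each row a list of (r, g, b) tuples in 0..255.
    box:    (left, top, width, height) in pixel coordinates (assumed layout).
    Each pixel keeps its own luma (Y) and takes the chroma (I, Q) of `color`.
    """
    left, top, width, height = box
    # Chroma of the highlight color; its luma is discarded.
    _, hi_i, hi_q = colorsys.rgb_to_yiq(*(c / 255 for c in color))

    out = [list(row) for row in pixels]  # copy so the input is not modified
    for y in range(max(top, 0), min(top + height, len(out))):
        row = out[y]
        for x in range(max(left, 0), min(left + width, len(row))):
            luma, _, _ = colorsys.rgb_to_yiq(*(c / 255 for c in row[x]))
            # yiq_to_rgb clamps each channel to 0..1, so extreme lumas
            # come back slightly shifted rather than out of range.
            rgb = colorsys.yiq_to_rgb(luma, hi_i, hi_q)
            row[x] = tuple(round(c * 255) for c in rgb)
    return out
```

With the default yellow, white paper inside the box turns yellowish while black text stays dark, which is why this works well on scanned book pages. In a real application you would do the same thing with your imaging library's blend modes rather than looping over pixels.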

Camilo Martin
Thanks. Perhaps I don't know Tesseract well enough. I thought it just output a text file. Where do I find these rectangles?
Judson
See here: http://www.pixel-technology.com/freeware/tessnet2/ (it's an open-source C# wrapper).
Camilo Martin
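
As a rough illustration of where the rectangles come from without a wrapper: newer Tesseract command-line builds can write a tab-separated report (for example `tesseract page.png out tsv`) with one row per detected element and columns including `left`, `top`, `width`, `height`, and `text`. The sketch below assumes that TSV layout and that the report has already been read into a string; `find_word_boxes` is a hypothetical helper name, and the punctuation stripping is a simplistic matching rule chosen for the example.

```python
import csv
import io


def find_word_boxes(tsv_text, query):
    """Return (left, top, width, height) for every word matching `query`.

    tsv_text: contents of a Tesseract TSV report (assumed column names:
              left, top, width, height, text).
    Matching is case-insensitive and ignores surrounding punctuation.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    needle = query.casefold()
    boxes = []
    for row in reader:
        # Rows for pages, blocks, and lines carry no text; skip them.
        word = (row.get("text") or "").strip()
        if word.strip(".,;:!?\"'()").casefold() == needle:
            boxes.append(
                tuple(int(row[k]) for k in ("left", "top", "width", "height"))
            )
    return boxes
```

Each returned tuple is a rectangle you can fill or invert on the page image to highlight the search hit.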