I don't need to know what the text says, and the images won't involve any distortion like a CAPTCHA. I just want to know whether each image in a batch contains any text.

This is something that will be running on a couple of idle Linux servers, and a cron job will process a large batch of images multiple times a day.

As part of the process, I want to discard any images with text in them. I don't mind some false positives, but I want the miss rate to be as close to zero as possible: images that contain text should almost never slip through.

+2  A: 

Tesseract-OCR is what Google uses for Google Books. Give it a try.
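
As a rough illustration of how this could fit a cron-driven batch, here is a minimal Python sketch. It assumes the `tesseract` command-line tool is installed and on the PATH. The function names and the `min_chars` threshold are hypothetical, picked only for this example. The idea is to run OCR, ignore what the text actually says, and flag the image if more than a handful of letters or digits come back.

```python
import re
import subprocess
from pathlib import Path


def run_tesseract(image_path: Path) -> str:
    """Return whatever text the tesseract CLI recognises in the image."""
    # `tesseract <image> stdout` writes the recognised text to standard output.
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def looks_like_text(ocr_output: str, min_chars: int = 3) -> bool:
    """True if the OCR output holds at least `min_chars` ASCII letters or digits.

    `min_chars` is a hypothetical threshold, not a Tesseract setting. Keeping
    it low flags more images, trading extra false positives for fewer misses.
    """
    alphanumerics = re.findall(r"[A-Za-z0-9]", ocr_output)
    return len(alphanumerics) >= min_chars


def should_discard(image_path: Path) -> bool:
    """Combine the two steps: OCR the image, then apply the threshold."""
    return looks_like_text(run_tesseract(image_path))
```

A batch job could call `should_discard(path)` for each file and move or delete the ones that return `True`. Since false positives are acceptable here, starting with a low `min_chars` and raising it only if too many clean images get discarded seems like a reasonable way to tune it.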

J-16 SDiZ
This seems a little heavy for what I'm looking to do. I may come back to it, though, if I can't find anything lighter. :)
joebert