I'm trying to write software that automates a PC by capturing a screenshot and then running OCR (Optical Character Recognition) on it to find, for example, a particular button to click. I've got the mouse and keyboard control working, but I still need an OCR engine to process the screenshot. What I've found is that Tesseract OCR does not seem to work very well with on-screen text: either the text is too small, or some characters appear to run together, for example K and X. How should I go about this?

P.S. This is for an automated test program.

A: 

Perhaps look at this question on image enhancement prior to OCR. Otherwise, this question is pretty similar to "OCR for .NET".
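To give a rough idea of what enhancing the image before OCR can look like for small screen text, here is a minimal, dependency-free Python sketch. It enlarges a grayscale screenshot by an integer factor and then binarizes it with a threshold chosen by Otsu's method. The function names and the list-of-rows image representation are assumptions made up for this example, not part of Tesseract or any library. A real pipeline would more likely use an imaging library with smoother interpolation, and whether this helps on a given screenshot is something you would have to test.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a grayscale image (list of rows of 0-255 ints) by an integer factor."""
    out = []
    for row in pixels:
        # Repeat every pixel horizontally, then repeat the widened row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def otsu_threshold(pixels):
    """Return the gray level that best separates dark from light pixels (Otsu's method)."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_level, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]
        if weight_bg == 0:
            continue  # no pixels at or below this level yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor).
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(pixels, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if value > threshold else 0 for value in row] for row in pixels]


def prepare_for_ocr(pixels, factor=3):
    """Hypothetical helper: upscale, then binarize with an automatically chosen threshold."""
    enlarged = upscale_nearest(pixels, factor)
    return binarize(enlarged, otsu_threshold(enlarged))
```

The result of `prepare_for_ocr` would then be written out as an image and handed to the OCR engine in place of the raw screenshot. The factor of 3 is only a starting guess.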

If you are feeling really bold you can always whip up a simple Perceptron or Neural Network based approach :-)
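For a sense of scale, a single-layer perceptron that tells two glyphs apart fits in a few lines of Python. The sketch below is only a toy under stated assumptions: the 5x5 bitmaps for K and X are hand-drawn for this example, the names `train_perceptron` and `predict` are made up, and a usable recognizer would need many samples per character, one classifier per class (or a multi-class network), and a step that cuts the screenshot into individual glyphs first.

```python
def train_perceptron(samples, labels, epochs=20, rate=1.0):
    """Train a binary perceptron on flat 0/1 feature vectors; labels are 0 or 1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            activation = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            error = label - predicted
            if error:
                # Classic perceptron rule: nudge weights toward the correct answer.
                weights = [w + rate * error * x for w, x in zip(weights, features)]
                bias += rate * error
    return weights, bias


def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, otherwise 0."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


# Hand-drawn 5x5 glyphs, flattened row by row (illustrative data only).
GLYPH_K = [1, 0, 0, 0, 1,
           1, 0, 0, 1, 0,
           1, 1, 1, 0, 0,
           1, 0, 0, 1, 0,
           1, 0, 0, 0, 1]
GLYPH_X = [1, 0, 0, 0, 1,
           0, 1, 0, 1, 0,
           0, 0, 1, 0, 0,
           0, 1, 0, 1, 0,
           1, 0, 0, 0, 1]

# Label 1 means "K", label 0 means "X".
weights, bias = train_perceptron([GLYPH_K, GLYPH_X], [1, 0])
```

After training, `predict(weights, bias, glyph)` returns 1 for the K bitmap and 0 for the X bitmap. How well it generalizes to real, anti-aliased screen fonts is an open question that only more training data would answer.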

Graphain
+1  A: 

Try using a commercial OCR engine. Open-source OCR is far from mature yet. I know that the ABBYY OCR SDK works pretty well with screen text.

Tomato
A: 

I am not sure if this really fits the bill for you, but some of the better OCR that I have seen in automation is done by Tevron's CitraTest. It includes a library of fonts, and if a font set is not present, they will create a new one based on your submissions. The negatives of this tool would be cost and the usual issues related to variable screen resolution.

shambleh