My software needs to read a fixed-length handwritten number.

While I could use a general-purpose library like Tesseract, I am sure there is something smarter. Tesseract will probably misread some 1s or 7s as I or l, whereas software that expects only digits would not.

Knowing that the field contains only digits (written the American-English way), the algorithm could focus on 10 candidate characters instead of hundreds of symbols.

Any experience OCRing handwritten number-only fields?
What open source library/software did you get the best results with?

+1  A: 

From the FAQ of Tesseract:

How do I recognize only digits?

In 2.03 and above:

Use

TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");

before calling an Init function or put this in a text file called tessdata/configs/digits:

tessedit_char_whitelist 0123456789

and then your command line becomes:

tesseract image.tif outputbase nobatch digits

Warning: Until the old and new config variables get merged, you must have the nobatch parameter too.

But since Tesseract was designed for printed text, not handwriting, I think accuracy might suffer even when it is restricted to digits.
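Whichever engine produces the text, a fixed-length, digits-only field makes it cheap to reject bad reads. Below is a minimal sketch of such a check, assuming the OCR result arrives as a plain string; `cleanDigitField` is a hypothetical helper of mine, not part of Tesseract, and returning an empty string to signal rejection is just one possible convention.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Tesseract API): strips whitespace from raw OCR
// output and accepts it only if what remains is exactly `expectedLength`
// ASCII digits. Returns the cleaned digits, or an empty string on rejection.
std::string cleanDigitField(const std::string& raw, std::size_t expectedLength) {
    std::string digits;
    for (char c : raw) {
        // OCR output usually ends with a newline and may contain stray spaces.
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            continue;
        }
        // Anything outside 0-9 means the read cannot be trusted.
        if (c < '0' || c > '9') {
            return std::string();
        }
        digits.push_back(c);
    }
    // A fixed-length field must match the expected length exactly.
    if (digits.size() != expectedLength) {
        return std::string();
    }
    return digits;
}
```

For example, `cleanDigitField("4 0 2 7\n", 4)` yields `"4027"`, while `"4O27"` (letter O) or a three-digit read is rejected and could be sent to manual review.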

Joey
Thanks for this! But indeed, Tesseract doesn't seem to be designed for handwriting, so it would probably be quite mediocre at it.
Nicolas Raoul
@nic: Maybe you could re-train it. It seems to be possible.
Joey
Seems possible indeed. But when I propose this solution to the client company, they might look at me funny... A proven solution with a community (even a small one) would probably be more credible. I would be surprised if one does not exist already.
Nicolas Raoul
Now that I think about it, handwritten digits should not be that difficult to recognize: unlike Latin letters, they are not joined together into fuzzy words. That makes them much easier to recognize than general handwritten text.
Nicolas Raoul