ansaurus

Question

Limit characters tesseract is looking for

Answer 1

+2 A:

You should probably look into preparing some training files. Have a look at this tool:

bbtesseract

epatel 2010-03-02 13:51:27

Looks nice, but regularly crashes with an unhandled exception error... Is there an alternative?

danilo 2010-03-02 16:19:14

Sorry, not that I can recall. It has been some time since I used it. I used it to scan so-called OCR numbers here in Sweden with the iSight on Macs. I trained it to recognize only the special numbers: http://www.memention.com/mye/

epatel 2010-03-02 23:40:15

I found this page: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
Looks a little complicated, but doable. Thanks for the hint with the training files.

danilo 2010-03-03 14:20:18

Yes, I remember that page. It took a while to get the hang of it, but afterwards it was pretty straightforward. If you figure out a good sequence of steps, why not put them as an update in your question :)

epatel 2010-03-03 16:04:27

Answer 2

+2 A:

This tutorial details the steps required to train Tesseract. I found it very useful.

Buzzy 2010-03-19 09:55:29

Answer 3

+2 A:

Create a config file (e.g. "letters") in the tessdata/configs directory, usually /usr/share/tesseract/tessdata/configs.
Add this line to the config file:

tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz

...or maybe [a-z] works, I don't know :-)
Then call tesseract like this:

tesseract input.tif output nobatch letters

That will limit tesseract to recognizing only the wanted characters.

Blomman 2010-06-06 06:08:44

Sorry for the late answer - this helped. Thank you :) By the way, the regex did not work. It was probably interpreted literally.

danilo 2010-07-11 09:09:04
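Since a pattern like [a-z] is taken literally, the whitelist has to spell out every character. Here is a minimal Python sketch that writes such a config file and builds the command from the answer above. The function names (whitelist_line, write_config, tesseract_command) are made up for illustration, and the config directory is whatever your installation uses; nothing here runs tesseract itself.

```python
import string
from pathlib import Path


def whitelist_line(chars):
    """Return the config line that whitelists exactly the given characters."""
    # The value is a literal list of characters, not a pattern, so every
    # character is written out. Duplicates are dropped, order is kept.
    unique = "".join(dict.fromkeys(chars))
    return f"tessedit_char_whitelist {unique}\n"


def write_config(config_dir, name, chars):
    """Write the config file (e.g. "letters") into config_dir and return its path."""
    path = Path(config_dir) / name
    path.write_text(whitelist_line(chars), encoding="utf-8")
    return path


def tesseract_command(image, output_base, config_name):
    """Build the argument list for the call shown in the answer."""
    return ["tesseract", image, output_base, "nobatch", config_name]


# Lowercase letters, spelled out instead of written as [a-z].
print(whitelist_line(string.ascii_lowercase), end="")
print(" ".join(tesseract_command("input.tif", "output", "letters")))
```

To actually use it, point write_config at your tessdata/configs directory (this usually needs write permission there) and pass the list from tesseract_command to subprocess.run.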

I used tessedit_char_whitelist 0123456789 to fetch numbers from an image, but out of 20 digits only 4 were correct. Any help would be greatly appreciated! Thank you.

SWATI 2010-10-01 10:50:51

SWATI: What kind of image is it? Try cleaning up the source image, for example using ImageMagick.

danilo 2010-10-21 12:27:22
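One possible way to do the cleanup suggested above, sketched in Python under a few assumptions: ImageMagick's convert command is on the PATH (newer ImageMagick versions prefer the name magick), and a plain grayscale conversion plus a 50% black/white threshold suits the image. The threshold value, the function names and the file names are only illustrative; the right cleanup depends on the image.

```python
import subprocess


def cleanup_command(src, dst, threshold_percent=50):
    """Build an ImageMagick call that converts src to grayscale and
    then to pure black and white at the given threshold."""
    return [
        "convert", src,
        "-colorspace", "Gray",
        "-threshold", f"{threshold_percent}%",
        dst,
    ]


def clean_image(src, dst, threshold_percent=50):
    """Run the cleanup; raises CalledProcessError if convert fails."""
    subprocess.run(cleanup_command(src, dst, threshold_percent), check=True)


# Show the command without running it.
print(" ".join(cleanup_command("digits.png", "digits_clean.tif")))
```

The cleaned file can then be passed to tesseract together with the whitelist config.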
