+5  Q: 

OCR for known font

Hi, I'm searching for an OCR library that can be parameterized with a font. I always know the font in advance, and I believe the recognition results would be much better that way.

Does anyone know of one?

+2  A: 

Check out OCRopus. It's open-source and sponsored by Google :) I'm not sure whether it will let you pick a particular font, but it seems to produce good results regardless.

Michael Mior
+1  A: 

Most OCR engines will handle this situation quite well. In fact, OCR engines get less confused when there is only one font to recognise on a page. Strange but true in my experience.

If an OCR engine can read your font in the first place, then I would just use it and not worry about it. There are more effective options for improving recognition.

Many OCR engines allow you to set recognition parameters that help improve results, such as fixed-width or proportional, serif or sans-serif, machine print or hand print. You can also restrict recognition to a subset of characters, such as uppercase only or numeric only, which improves results considerably. For example, if you only have numeric characters, then the 0 (zero) character can never be confused with an 'O', 'o' or 'Ø'. You will find these hints more effective than being able to choose the exact font to OCR.
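To illustrate the character-subset hint, here is a minimal sketch using the Tesseract command line, which has a `tessedit_char_whitelist` config variable for this purpose. The function names and the file name `invoice_total.png` are made up for the example, and the `--psm` spelling assumes Tesseract 4 or later (older releases used `-psm`). How strictly the whitelist is honoured has varied between Tesseract versions, so treat this as a starting point rather than a guaranteed recipe.

```python
import shutil
import subprocess

DIGITS = "0123456789"


def build_tesseract_command(image_path, allowed_chars=DIGITS, page_seg_mode=7):
    """Build the argument list for a Tesseract run limited to allowed_chars.

    Hypothetical helper for illustration; it only assembles the command.
    """
    return [
        "tesseract",
        image_path,
        "stdout",                     # print the text instead of writing a file
        "--psm", str(page_seg_mode),  # 7 = treat the image as a single text line
        "-c", f"tessedit_char_whitelist={allowed_chars}",
    ]


def recognize(image_path, allowed_chars=DIGITS):
    """Run Tesseract with the restricted character set and return its output."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, allowed_chars),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Assumed file name; replace with a real image of a numeric field.
    print(" ".join(build_tesseract_command("invoice_total.png")))
```

With a digits-only whitelist the engine has no 'O' or 'o' to choose from, which is the effect described above. For an uppercase-only field you would pass the letters A to Z as `allowed_chars` instead.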

Other engines allow you to train them on new fonts, and this will help considerably if you have an unusual font.

If your image quality is good and your fonts are clean and of a decent size, then I would recommend Tesseract OCR from Google, or OCRopus as suggested above. Both are free and work well on clean, clear text. If the text is a little difficult, then there are definitely better OCR engines out there, such as ABBYY, Prime Recognition, OmniPage and many others, although they will cost money.

Andrew Cash