I'm interested in simple OCR methods and algorithms. And by simple I mean simple!

Best would be a tutorial/article/documentation without dependencies on 3rd-party libraries, if that's even possible.

I would really like to build up my knowledge from the ground up.

The programming language doesn't matter.

Thanks in advance!

Edit:

An interesting link I found in another question: OCR and Neural Nets in JavaScript

I especially like the Python implementation.

+8  A: 

I think you may still want to use a library to handle the windowing, image loading, display, etc.

OpenCV is useful for learning computer vision. Here is a tutorial for doing basic OCR with OpenCV.

OpenCV is open source so you can study the implementation of the functions as well.

Karl Voigtland
I know about OpenCV, but I wanted to start without an external library to understand the concepts.
daddz
+4  A: 

Methods such as support vector machines (SVMs) and neural networks are commonly used in character recognition algorithms.

You might find the links below useful:

Nick D
+1  A: 

I did a little bit on this in the past. References for this kind of theory are hard to come by, but this book might help. After learning the basic principles from this book, I went on to use a library and tried to re-implement it to do some custom recognition. As Karl said, something like OpenCV is your best bet. If you are going to implement it from the ground up, a C-like language would be good for interfacing with the hardware. Good luck!

Chazadanga
+1  A: 

If you want to build a simple character recognizer, a nearest neighbor algorithm is the easiest to implement. You can use the raw images as the feature space. Next you need some training data. You could generate some by writing a small program that dumps out images of single characters and perhaps applies some image operators to degrade them. Java has all the libraries you would need to do this.
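As a rough illustration of the nearest neighbor idea, here is a minimal sketch in plain Python with no external libraries. The 5x5 glyphs, the pixel-flipping "degrade" step, and all function names are assumptions made up for this example; a real recognizer would use rendered character images and more realistic degradation.

```python
import random

# Hypothetical 5x5 glyphs, drawn by hand for this example ('#' = ink).
GLYPHS = {
    "I": ["..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
    "O": ["#####", "#...#", "#...#", "#...#", "#####"],
}


def to_vector(rows):
    """Flatten a glyph into a list of 0/1 pixels (the raw-image feature vector)."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]


def degrade(vector, flips, rng):
    """Return a copy of the vector with `flips` randomly chosen pixels inverted."""
    noisy = list(vector)
    for i in rng.sample(range(len(noisy)), flips):
        noisy[i] = 1 - noisy[i]
    return noisy


def distance(a, b):
    """Hamming distance: the number of pixels that differ."""
    return sum(x != y for x, y in zip(a, b))


def build_training_set(copies=5, flips=2, seed=0):
    """Generate labeled samples: each clean glyph plus a few degraded copies."""
    rng = random.Random(seed)
    samples = []
    for label, rows in GLYPHS.items():
        clean = to_vector(rows)
        samples.append((clean, label))
        for _ in range(copies):
            samples.append((degrade(clean, flips, rng), label))
    return samples


def classify(vector, samples):
    """Return the label of the closest training sample (1-nearest-neighbor)."""
    return min(samples, key=lambda sample: distance(vector, sample[0]))[1]


samples = build_training_set()
query = to_vector(GLYPHS["L"])
query[12] = 1 - query[12]  # corrupt the center pixel of an 'L'
print(classify(query, samples))  # prints L
```

Comparing raw pixels like this only works when the characters are already segmented, scaled, and aligned to the same grid; that is the price of keeping the feature space this simple.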

If you want to go into this subject in more depth, Pattern Classification is considered a classic text in this field.

Sean McCauliff
+2  A: 

I use template matching to extract characters from a video stream. I need the time stamp in text format, so on each frame I examine each character in the time stamp and compare it to templates stored in memory; the template with the best match is taken as that number. I get 100% accuracy on clean video, with slightly less on 'processed' video.

The templates are the numbers 0 to 9 and are stored in 5 x 7 arrays. The extracted number is then 'moved over' the template and a score is created. This is repeated for each template. The highest score points to the template that matches the number best.
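The scoring described above might look roughly like the following sketch in plain Python. It is not the code used for the video stream: the 5 x 7 font, the 7 x 9 patch size, the pixel-agreement score, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical 5x7 digit font, hand-drawn for this sketch ('#' = ink).
FONT = {
    "0": ".###. #...# #..## #.#.# ##..# #...# .###.",
    "1": "..#.. .##.. ..#.. ..#.. ..#.. ..#.. .###.",
    "2": ".###. #...# ....# ...#. ..#.. .#... #####",
    "3": ".###. #...# ....# ..##. ....# #...# .###.",
    "4": "...#. ..##. .#.#. #..#. ##### ...#. ...#.",
    "5": "##### #.... ####. ....# ....# #...# .###.",
    "6": "..##. .#... #.... ####. #...# #...# .###.",
    "7": "##### ....# ...#. ..#.. .#... .#... .#...",
    "8": ".###. #...# #...# .###. #...# #...# .###.",
    "9": ".###. #...# #...# .#### ....# ...#. .##..",
}
TEMPLATES = {digit: rows.split() for digit, rows in FONT.items()}
W, H = 5, 7  # template width and height in pixels


def score(patch, template, dy, dx):
    """Count the pixels that agree when the template sits at (dy, dx) in the patch."""
    return sum(
        patch[dy + y][dx + x] == template[y][x]
        for y in range(H)
        for x in range(W)
    )


def recognize(patch):
    """Slide every template over the patch; return (best digit, best score)."""
    best_digit, best_score = None, -1
    for digit, template in TEMPLATES.items():
        for dy in range(len(patch) - H + 1):
            for dx in range(len(patch[0]) - W + 1):
                s = score(patch, template, dy, dx)
                if s > best_score:
                    best_digit, best_score = digit, s
    return best_digit, best_score


def embed(digit, top, left, height=9, width=7):
    """Build a test patch: the digit drawn at (top, left) on a blank background."""
    rows = [["."] * width for _ in range(height)]
    for y, line in enumerate(TEMPLATES[digit]):
        for x, ch in enumerate(line):
            rows[top + y][left + x] = ch
    return ["".join(row) for row in rows]


print(recognize(embed("7", top=2, left=1)))  # prints ('7', 35), a perfect match
```

Trying every offset lets the match tolerate a character that is not perfectly centered in the extracted region, at the cost of more comparisons per frame.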

Hope this is of use, if not, dispose of thoughtfully.

A: 

Could anybody please advise on the typical method used to recognize and separate individual characters in connected form (I mean in a word where all the letters are linked together)? Forget about handwriting: supposing the letters are connected using a known font, what is the best method to determine each individual character in a word?

Maysam