What are some popular OCR algorithms? | ansaurus

+11 Q:

What are some popular OCR algorithms?

I've been interested in machine learning and computer vision for a while, so I've decided to attempt to build a simple Optical Character Recognition demo in C#.

I'm looking for a description of some common OCR algorithms and how I would go about implementing them in C#. It's a learning exercise so I'm not looking for an OCR library.

Any information would be appreciated, thanks.

+3 A:

I've been interested in cracking CAPTCHAs (though I haven't had time to start writing anything yet). These are some bookmarks I was planning on starting with:

hypoxide 2009-05-12 01:20:18

+9 A:

OCR is a very broad field that includes things like image normalization (histogram equalization, color removal), feature extraction (textures, line segments, edge detection), and pattern classification / machine learning (neural networks, support vector machines, etc.). You'll probably need to implement at least one technique from each of those stages: normalize the image, extract features, then classify.
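As a rough illustration of the normalization stage, here is a minimal histogram-equalization sketch in plain Python (used for brevity; the loops should port directly to C#). It assumes a grayscale image stored as a flat list of integer gray levels. The function name `equalize_histogram` and the `levels` parameter are made up for this example.

```python
def equalize_histogram(pixels, levels=256):
    """Return a histogram-equalized copy of a flat list of integer gray levels."""
    if not pixels:
        return []

    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution: cdf[v] = number of pixels with value <= v.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Smallest non-zero CDF value, so the darkest level present maps to 0.
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return list(pixels)

    # Spread the cumulative counts over the full output range.
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]
```

For example, `equalize_histogram([10, 10, 20, 30, 30, 40])` stretches a narrow range of dark grays across the full 0 to 255 range.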

It sounds like you want to play around, write some algorithms, and learn about OCR. If that's the case, there's a large body of literature on the subject that you can get at if you have access to academic journals (if not, go to the nearest university and spend a day making photocopies or printing things out).

This is a decent (if dated) survey:

Impedovo, S., Ottaviano, L., Occhinegro, S. "Optical character recognition--a survey." Int. J. Pattern Recog. Artif. Intell., vol. 5, no. 1-2, pp. 1-24, 1991.

IEEE PAMI (Transactions on Pattern Analysis and Machine Intelligence) would also be a good place to start.

Or, a Google Scholar search turns up quite a lot: http://scholar.google.com/scholar?hl=en&lr=&safe=off&client=firefox-a&q=optical+character+recognition&btnG=Search

You might also look in Duda, Hart, and Stork under "Tangent Distance" for a good example of a distance metric that is (was?) used in digit recognition.
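To give a feel for the idea, here is a heavily simplified sketch of a one-sided tangent distance. The assumptions are mine, not the book's: it uses a single tangent vector (small horizontal shifts, approximated with central differences), while the full method uses several transformations (rotation, scaling, and so on) on both images. Images are flat row-major lists, and all function names are hypothetical.

```python
import math

def horizontal_tangent(image, width):
    """Approximate d(image)/dx with central differences (edges clamped)."""
    tangent = []
    for i in range(len(image)):
        col = i % width
        left = image[i - 1] if col > 0 else image[i]
        right = image[i + 1] if col < width - 1 else image[i]
        tangent.append((right - left) / 2.0)
    return tangent

def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def one_sided_tangent_distance(x, y, width):
    """min over alpha of ||x + alpha * t - y||, where t is the shift tangent of x."""
    t = horizontal_tangent(x, width)
    diff = [b - a for a, b in zip(x, y)]
    tt = sum(v * v for v in t)
    if tt == 0.0:
        # Flat image: no tangent direction, so this is plain Euclidean distance.
        alpha = 0.0
    else:
        # Least-squares projection of (y - x) onto the tangent vector.
        alpha = sum(tv * dv for tv, dv in zip(t, diff)) / tt
    residual = [dv - alpha * tv for dv, tv in zip(diff, t)]
    return math.sqrt(sum(r * r for r in residual))
```

Because `alpha = 0` reproduces the Euclidean distance, this value is never larger than the Euclidean one. Two images that differ only by a slight shift come out close together, which is the point of the metric.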

You're going to need some data to play with, so instead of writing the numbers 0-9 4000 times each and scanning them, there's a UCI data set with all the numbers:

http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
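A minimal loader sketch for that data, assuming the comma-separated layout of the dataset's `optdigits.tra` / `optdigits.tes` files: 64 integer features per line (an 8x8 grid with values 0 to 16) followed by the class label. Check the dataset's description file before relying on that layout. The function names here are made up.

```python
def parse_optdigits(lines):
    """Turn comma-separated lines into a list of (features, label) pairs."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = [int(v) for v in line.split(",")]
        if len(values) != 65:
            raise ValueError("expected 64 features plus a label, got %d values" % len(values))
        # First 64 values are the 8x8 image, the last one is the digit class.
        samples.append((values[:64], values[64]))
    return samples

def load_optdigits(path):
    """Read a local copy of the data file, e.g. a downloaded optdigits.tra."""
    with open(path) as handle:
        return parse_optdigits(handle)
```

Each sample's feature list can be treated as a row-major 8x8 image, so `features[row * 8 + col]` gives one pixel.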

For a quick start, try histogram-equalizing the images, then calculate tangent distances and do some clustering, then train a simple pattern classification algorithm on the resulting features.

Enjoy.

Pete 2009-05-12 01:31:20

Like Lance said below, you can simplify my suggested quick start by using K-NN over the tangent-distance metric:

1) Histogram-equalize the images
2) Calculate tangent distances
3) K-NN classification
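The classification step can be sketched as a plain k-nearest-neighbor vote. This is a minimal version, assuming training data as a list of (features, label) pairs. The `distance` parameter is a hypothetical hook: it defaults to Euclidean distance here, and a tangent-distance function could be passed in instead.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_classify(train, query, k=3, distance=euclidean):
    """Label `query` by majority vote among its k nearest training samples."""
    # Brute force: sort every training sample by its distance to the query.
    neighbors = sorted(train, key=lambda sample: distance(sample[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tie, whichever tied label Counter reports first wins.
    return votes.most_common(1)[0][0]
```

Sorting the whole training set for every query is slow, but for a few thousand small digit images it is usually fast enough for a learning exercise.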

Pete 2009-05-12 01:38:45

+4 A:

This looks interesting: Basic OCR in OpenCV, using a K-nearest-neighbor algorithm for classification.

Lance Richardson 2009-05-12 01:32:37

Good call on the K-NN for classification. Simple to implement, simple to understand, and should work well for digits.

Pete 2009-05-12 01:37:37
