+27  Q: 

Free OCR library

Does anyone know of a good free or cheap (under £100/$200) OCR library? It needs to run on Windows and preferably be a .NET library, though a COM interface is fine.

+6  A: 

Maybe not exactly what you are looking for, but this might point you in the right direction.

The code below is taken from the original source and is given only so that readers can find the essence of the solution in one place.
Original author: Martin Welker

    // filename: path of the image (e.g. a TIFF or MDI file) to recognize
    _MODIDocument = new MODI.Document();
    _MODIDocument.Create(filename);

    // add event handler for progress visualisation
    // (attach it before starting OCR so that progress events are received)
    _MODIDocument.OnOCRProgress +=
        new MODI._IDocumentEvents_OnOCRProgressEventHandler(this.ShowProgress);

    // The MODI call for OCR (language, orient image, straighten image)
    _MODIDocument.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

    // show the recognized document in the MODI viewer control
    axMiDocView1.Document = _MODIDocument;

    public void ShowProgress(int progress, ref bool cancel)
    {
        statusBar1.Text = progress.ToString() + "% processed.";
    }

    private void Statistic()
    {
        // iterating through the document's structure doing some statistics.
        string statistic = "";
        for (int i = 0; i < _MODIDocument.Images.Count; i++)
        {
            int numOfCharacters = 0;
            int charactersHeights = 0;
            MODI.Image image = (MODI.Image)_MODIDocument.Images[i];
            MODI.Layout layout = image.Layout;

            // getting the page's words
            for (int j = 0; j < layout.Words.Count; j++)
            {
                MODI.Word word = (MODI.Word)layout.Words[j];

                // getting the word's characters
                for (int k = 0; k < word.Rects.Count; k++)
                {
                    MODI.MiRect rect = (MODI.MiRect)word.Rects[k];
                    charactersHeights += rect.Bottom - rect.Top;
                    numOfCharacters++;
                }
            }

            // skip pages without recognized characters to avoid dividing by zero
            if (numOfCharacters == 0)
                continue;

            float avHeight = (float)charactersHeights / numOfCharacters;
            statistic += "Page " + i + ": Average character height is: " +
                avHeight.ToString("0.00") + " pixels" + "\r\n";
        }
        MessageBox.Show("Document Statistic:\r\n" + statistic);
    }

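    // initialize MODI search
    // (the search text and the matching options are fields set elsewhere in the form;
    //  PageNum, WordIndex, StartAfterIndex and Backward hold the search position and direction)
    MODI.MiDocSearchClass search = new MODI.MiDocSearchClass();
    search.Initialize(
        _MODIDocument,
        _MODIsearchtext,
        ref PageNum,
        ref WordIndex,
        ref StartAfterIndex,
        ref Backward,
        _MODIMatchMinus,
        _MODIMatchFullHalfWidthForm,
        _MODIMatchHiraganaKatakana,
        _MODIIgnoreSpace);

    MODI.IMiSelectableItem SelectableItem = null;

    // the one and only search call
    search.Search(null, ref SelectableItem);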

It uses the Microsoft Office Document Imaging (MODI) library from Office 2003 to provide the OCR functionality for your application (you need to add a reference to MDIVWCTL.DLL).

+8  A: 

As Jon Galloway describes, the Microsoft Office Document Imaging libraries included with Microsoft Office are available on many computers and are easy to automate with .NET.

Jon lists a few others in his article.

Leon Bambrick
Hey, I was gonna post all that stuff! I want some karma points off your answer, Leon! :-)
Jon Galloway
+3  A: 

Google open-sourced an OCR engine called Tesseract. I am not sure how well it works, though; you would have to check it out.

+5  A: 

tessnet is an open-source .NET OCR engine based on Tesseract.

Mauricio Scheffer
Unfortunately, it requires the Visual C++ 2005 runtime, which means yet another bootstrapper for installs.

A:

The best OCR engine is Tesseract. You can see how it works in this online OCR tool.


A:

If you're OK with using an external, web-based API to do the OCR, take a look at

Sample code in .NET (C#) to use this:

Charges are per page, with a free trial available.

Eugene Osovetsky
Seems pretty spammy, especially when the question asks for a *free* OCR library.
In the body of the question it specifically asks for a "free or cheap (under £100/$200) OCR library", implying that commercial products are acceptable. For many projects, the web-based approach can be cheaper than paying for an OCR library license (especially when you take recognition quality into account, and especially if you can stay within the rather generous free trial limits of the API). I truly believe that my answer adds value both for the original poster and for others who may be in a similar situation.
Eugene Osovetsky