+27  Q: 

Free OCR library

Does anyone know of a good free or cheap (under £100/$200) OCR library? It needs to run on Windows and preferably be a .NET library, though a COM interface is fine.

+6  A: 

Maybe not exactly what you are looking for, but this might point you in the right direction.

The code below is an unmodified copy/paste from the source (given only so that readers can find the essence of this solution in one place).
Original Author: Martin Welker

_MODIDocument = new MODI.Document();
_MODIDocument.Create(filename);

// The MODI call for OCR
_MODIDocument.OCR(_MODIParameters.Language,
                  _MODIParameters.WithAutoRotation,
                  _MODIParameters.WithStraightenImage);

// add event handler for progress visualisation
_MODIDocument.OnOCRProgress +=
    new MODI._IDocumentEvents_OnOCRProgressEventHandler(this.ShowProgress);

public void ShowProgress(int progress, ref bool cancel)
{
    statusBar1.Text = progress.ToString() + "% processed.";
}

axMiDocView1.Document = _MODIDocument;

private void Statistic()
{
    // iterating through the document's structure doing some statistics.
    string statistic = "";
    for (int i = 0; i < _MODIDocument.Images.Count; i++)
    {
        int numOfCharacters = 0;
        int charactersHeights = 0;
        MODI.Image image = (MODI.Image)_MODIDocument.Images[i];
        MODI.Layout layout = image.Layout;

        // getting the page's words
        for (int j = 0; j < layout.Words.Count; j++)
        {
            MODI.Word word = (MODI.Word)layout.Words[j];

            // getting the word's characters
            for (int k = 0; k < word.Rects.Count; k++)
            {
                MODI.MiRect rect = (MODI.MiRect)word.Rects[k];
                charactersHeights += rect.Bottom - rect.Top;
                numOfCharacters++;
            }
        }
        float avHeight = (float)charactersHeights / numOfCharacters;
        statistic += "Page " + i + ": Avarage character height is: " +
            avHeight.ToString(" 0.00 ") + " pixel!" + "\r\n";
    }
    MessageBox.Show("Document Statistic:\r\n" + statistic);
}

// initialize MODI search
MODI.MiDocSearchClass search = new MODI.MiDocSearchClass();
search.Initialize(
    _MODIDocument,
    _DialogSearch.Properties.Pattern,
    ref PageNum,
    ref WordIndex,
    ref StartAfterIndex,
    ref Backward,
    MatchMinus,
    MatchFullHalfWidthForm,
    MatchHiraganaKatakana,
    IgnoreSpace);

MODI.IMiSelectableItem SelectableItem = null;

// the one and only search call
search.Search(null, ref SelectableItem);

It uses the Microsoft Office Document Imaging (MODI) library from Office 2003 to provide the OCR functionality for your application (you need to add a reference to MDIVWCTL.DLL).
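One thing to watch in the Statistic() method above: on a page where OCR finds no characters, the average is computed as 0/0, which in float arithmetic gives NaN rather than a number. Here is a minimal sketch of the same calculation with that case guarded. The `CharRect` and `PageStatistics` types are hypothetical stand-ins so the example runs without MODI installed; in real code you would read `Top` and `Bottom` from `MODI.MiRect`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for MODI.MiRect: only the two values used below.
public sealed class CharRect
{
    public CharRect(int top, int bottom)
    {
        Top = top;
        Bottom = bottom;
    }

    public int Top { get; }
    public int Bottom { get; }
}

public static class PageStatistics
{
    // Returns the mean character height in pixels, or null when the page
    // has no recognized characters (avoids the 0/0 = NaN case).
    public static float? AverageCharacterHeight(IEnumerable<CharRect> rects)
    {
        int count = 0;
        int totalHeight = 0;
        foreach (CharRect rect in rects)
        {
            totalHeight += rect.Bottom - rect.Top;
            count++;
        }
        return count == 0 ? (float?)null : (float)totalHeight / count;
    }
}
```

For example, two rectangles with heights 10 and 20 give 15, and a page with no rectangles gives null, which the caller can report as "no text found" instead of printing NaN.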

Sven
+8  A: 

As Jon Galloway describes, the Microsoft Office Document Imaging libraries included with Microsoft Office are available on many computers and are easy to automate with .NET.
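To give an idea of what that automation can look like, here is a rough sketch that uses late binding, so no interop reference is needed at compile time. It assumes Windows, C# 4 or later (for `dynamic`), and an Office installation that registers MODI. The `ModiOcr` class and `RecognizeFirstPage` method are names made up for this example, and the constant 9 is the value of `MODI.MiLANGUAGES.miLANG_ENGLISH` as I understand the type library. Treat it as a starting point, not tested production code.

```csharp
using System;

public static class ModiOcr
{
    // Assumed value of MODI.MiLANGUAGES.miLANG_ENGLISH.
    private const int MiLangEnglish = 9;

    // Returns the recognized text of the first page,
    // or null if MODI is not registered on this machine.
    public static string RecognizeFirstPage(string imagePath)
    {
        Type modiType = Type.GetTypeFromProgID("MODI.Document");
        if (modiType == null)
        {
            return null;
        }

        dynamic document = Activator.CreateInstance(modiType);
        try
        {
            document.Create(imagePath);
            // Arguments: language, auto-rotate, straighten image.
            document.OCR(MiLangEnglish, true, true);
            return (string)document.Images[0].Layout.Text;
        }
        finally
        {
            // Close without saving changes back to the image file.
            document.Close(false);
        }
    }
}
```

Usage would be something like `string text = ModiOcr.RecognizeFirstPage(@"C:\scans\page1.tif");` where the path is only a placeholder.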

Jon lists a few others in his article.

Leon Bambrick
Hey, I was gonna post all that stuff! I want some karma points off your answer, Leon! :-)
Jon Galloway
+3  A: 

Google open-sourced an OCR engine called Tesseract. I am not sure how easy it is to check out, though...

gyurisc
+5  A: 

tessnet (http://www.pixel-technology.com/freeware/tessnet2/) is an open-source .NET OCR engine based on Tesseract.
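For reference, basic usage looks roughly like the sketch below. It is written from memory of the tessnet2 sample, so check the calls against the version you download. It assumes a project reference to the tessnet2 assembly and to System.Drawing, plus the English language data files in a `tessdata` folder; the `TessnetExample` class, the image path, and the data path are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

public static class TessnetExample
{
    public static void Main()
    {
        // Placeholder path: any scanned image that Bitmap can load.
        using (Bitmap image = new Bitmap(@"C:\scans\page1.tif"))
        {
            tessnet2.Tesseract ocr = new tessnet2.Tesseract();

            // Placeholder path to the folder holding the language data.
            ocr.Init(@"C:\ocr\tessdata", "eng", false);

            // Rectangle.Empty means "recognize the whole image".
            List<tessnet2.Word> words = ocr.DoOCR(image, Rectangle.Empty);
            foreach (tessnet2.Word word in words)
            {
                Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
            }
        }
    }
}
```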

Mauricio Scheffer
Unfortunately, it requires the Visual C++ 2005 Runtime, which means yet another bootstrapper for installs.
chaiguy
A: 

The best OCR engine is Tesseract. You can check how it works in this online OCR tool.

Alex
A: 

If you're OK with using an external, web-based API to do the OCR, take a look at http://www.wisetrend.com/wisetrend_ocr_cloud.shtml

Sample code in .NET (C#) to use this: http://snipt.org/lOgh/

Charges are per page, with a free trial available.

Eugene Osovetsky
Seems pretty spammy, especially when the question asks for a *free* OCR library.
Kobi
In the body of the question it specifically asks for a "free or cheap (under £100/$200) OCR library", implying that commercial products are acceptable. For many projects, the web-based approach can be cheaper than paying for an OCR library license (especially when you take recognition quality into account, and especially if you can stay within the rather-generous free trial limits of the API). I truly believe that my answer adds value both for the original poster and to others who may be in a similar situation.
Eugene Osovetsky