ansaurus

Question

Free OCR library

Answer 1

+6 A:

Maybe not exactly what you are looking for, but this might point you in the right direction.

The code below is an unmodified copy/paste from the source, included only so readers can find the essence of this solution in one place.
Original Author: Martin Welker

_MODIDocument = new MODI.Document(); 
_MODIDocument.Create(filename);

// The MODI call for OCR 
_MODIDocument.OCR(_MODIParameters.Language, 
                  _MODIParameters.WithAutoRotation, 
                  _MODIParameters.WithStraightenImage);


// add event handler for progress visualisation
_MODIDocument.OnOCRProgress += 
  new MODI._IDocumentEvents_OnOCRProgressEventHandler(this.ShowProgress);
public void ShowProgress(int progress, ref bool cancel)
{
    statusBar1.Text = progress.ToString() + "% processed.";
}
axMiDocView1.Document = _MODIDocument;

private void Statistic()
{    
    // iterating through the document's structure doing some statistics.
    string statistic = "";
    for (int i = 0 ; i < _MODIDocument.Images.Count; i++)
    {
        int numOfCharacters = 0;
        int charactersHeights = 0;
        MODI.Image image = (MODI.Image)_MODIDocument.Images[i];
        MODI.Layout layout = image.Layout;
        // getting the page's words

        for (int j= 0; j< layout.Words.Count; j++)
        {
            MODI.Word word = (MODI.Word) layout.Words[j];
            // getting the word's characters

            for (int k = 0; k < word.Rects.Count; k++)
            {
                MODI.MiRect rect = (MODI.MiRect) word.Rects[k];
                charactersHeights  += rect.Bottom-rect.Top;
                numOfCharacters++;                        
            }
        }
        float avHeight = (float )charactersHeights/numOfCharacters;
        statistic += "Page "+i+ ": Avarage character height is: "+
                         avHeight.ToString(" 0.00 ") +" pixel!"+ "\r\n";
    }
    MessageBox.Show("Document Statistic:\r\n"+statistic);
}

// initialize MODI search
MODI.MiDocSearchClass search = new MODI.MiDocSearchClass();
search.Initialize(
    _MODIDocument,
    _DialogSearch.Properties.Pattern,
    ref PageNum,
    ref WordIndex,
    ref StartAfterIndex,
    ref Backward,
    MatchMinus,
    MatchFullHalfWidthForm,
    MatchHiraganaKatakana,
    IgnoreSpace);

MODI.IMiSelectableItem SelectableItem = null;

// the one and only search call
search.Search(null,ref SelectableItem);

It uses the Microsoft Office Document Imaging (MODI) library from Office 2003 to provide the OCR functionality for your application (you need to add a reference to MDIVWCTL.DLL).

Sven 2008-08-04 10:02:59

Answer 2

+8 A:

As Jon Galloway describes, the Microsoft Office Document Imaging libraries included with Microsoft Office are available on many computers and are easy to automate with .NET.

Jon lists a few others in his article.

Leon Bambrick 2008-08-04 10:06:13

Hey, I was gonna post all that stuff! I want some karma points off your answer, Leon! :-)

Jon Galloway 2008-09-12 19:38:29

Answer 3

+3 A:

Google open-sourced an OCR engine called Tesseract. I am not sure how to check it out, though...

gyurisc 2008-08-24 15:12:39

Answer 4

+5 A:

tessnet2 (http://www.pixel-technology.com/freeware/tessnet2/) is an open-source .NET OCR engine based on Tesseract.
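For readers who want a feel for what calling it looks like, here is a minimal sketch modeled on the library's published quick-start. It is not a definitive implementation: the image path, the tessdata folder, and the "eng" language code are assumptions, and the project needs a reference to the tessnet2 assembly plus System.Drawing.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

class TessnetSketch
{
    static void Main()
    {
        // Hypothetical input image; replace with your own scan.
        using (Bitmap image = new Bitmap(@"C:\temp\scan.tif"))
        {
            tessnet2.Tesseract ocr = new tessnet2.Tesseract();

            // Assumed folder holding the Tesseract language data files,
            // the language code, and numeric-only mode switched off.
            ocr.Init(@"C:\temp\tessdata", "eng", false);

            // Rectangle.Empty asks the engine to process the whole image.
            List<tessnet2.Word> words = ocr.DoOCR(image, Rectangle.Empty);

            // Print each recognized word with its confidence value.
            foreach (tessnet2.Word word in words)
            {
                Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
            }
        }
    }
}
```

The language data files are distributed separately from the assembly, so the folder passed to Init has to exist and contain the files for the chosen language.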

Mauricio Scheffer 2008-09-24 14:13:32

Unfortunately, it requires the Visual C++ 2005 Runtime, which means yet another bootstrapper for installs.

chaiguy 2009-01-24 19:52:03

Answer 5

A:

The best OCR engine is Tesseract. You can check how it works in this online OCR tool.

Alex 2009-11-15 09:12:01

Answer 6

A:

If you're ok with using an external, web-based API to do the OCR, take a look at http://www.wisetrend.com/wisetrend_ocr_cloud.shtml

Sample code in .NET (C#) to use this: http://snipt.org/lOgh/

Charges are per page, with a free trial available.

Eugene Osovetsky 2010-10-24 19:22:11

Seems pretty spammy, especially when the question asks for a *free* OCR library.

Kobi 2010-10-24 19:28:43

In the body of the question it specifically asks for a "free or cheap (under £100/$200) OCR library", implying that commercial products are acceptable. For many projects, the web-based approach can be cheaper than paying for an OCR library license (especially when you take recognition quality into account, and especially if you can stay within the rather-generous free trial limits of the API). I truly believe that my answer adds value both for the original poster and to others who may be in a similar situation.

Eugene Osovetsky 2010-10-24 19:41:13
