Hi all,

I'm using the tessnet2 wrapper around the Tesseract 2.04 source on Windows XP, built for x86.

The main function of the TesseractTest project contains:

        Bitmap bmp = new Bitmap(@"C:\temp\New Folder\dotnet\eurotext.tif");
        tessnet2.Tesseract ocr = new tessnet2.Tesseract();
        // ocr.SetVariable("tessedit_char_whitelist", "0123456789");
        ocr.Init(@"C:\temp\tessdata", "eng", false);
        // List<tessnet2.Word> r1 = ocr.DoOCR(bmp, new Rectangle(792, 247, 130, 54));
        List<tessnet2.Word> r1 = ocr.DoOCR(bmp, Rectangle.Empty);
        int lc = tessnet2.Tesseract.LineCount(r1);

When I run the program, it crashes on the following line inside ocr.Init:

int result = m_myTessBaseAPIInstance->InitWithLanguage((char *)_tessdata.ToPointer(), NULL, (char *)_lang.ToPointer(), NULL, numericMode, 0, NULL);

Does anyone have an idea what is going wrong?

Any help is appreciated!

+1  A: 

Project + Properties, Debug tab, scroll down, tick the "Enable unmanaged code debugging" checkbox. Now you can set a breakpoint and debug it.


If your IDE doesn't support mixed mode debugging, you can attach a debugger using the technique outlined in this post.

Hans Passant
@nobugz: That's a very important thing to mention and often overlooked. +1 from me.. :)
tommieb75
I don't have that option in Visual Studio 2008 Standard Edition...
Jack
Did you scroll down? I updated my post with another way to do it.
Hans Passant
+1  A: 

Make sure your tessdata folder (C:\temp\tessdata) contains the English language data files. The files are: eng.DangAmbigs, eng.freq-dawg, eng.inttemp, eng.normproto, eng.pffmtable, eng.unicharset, eng.user-words, eng.word-dawg. You can get them from the Tesseract downloads page; the archive to download is tesseract-2.00.eng.tar.gz.
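
A quick way to confirm the files are in place before calling Init is to check for each one. This is only a minimal sketch: the class and method names (TessdataCheck, MissingFiles) are hypothetical helpers, not part of tessnet2, and the file list is simply the eight names given above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TessdataCheck
{
    // Suffixes of the language data files named in the answer above.
    static readonly string[] Suffixes =
    {
        "DangAmbigs", "freq-dawg", "inttemp", "normproto",
        "pffmtable", "unicharset", "user-words", "word-dawg"
    };

    // Hypothetical helper: returns the expected file names that are
    // absent from tessdataDir for the given language code.
    public static List<string> MissingFiles(string tessdataDir, string lang)
    {
        var missing = new List<string>();
        foreach (string suffix in Suffixes)
        {
            string name = lang + "." + suffix;
            if (!File.Exists(Path.Combine(tessdataDir, name)))
                missing.Add(name);
        }
        return missing;
    }

    static void Main()
    {
        // Path taken from the question; adjust it to your own setup.
        foreach (string name in MissingFiles(@"C:\temp\tessdata", "eng"))
            Console.WriteLine("Missing: " + name);
    }
}
```

If this prints nothing, all eight files are present. It does not verify that the files are intact or match the Tesseract version in use.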

mcdon
A: 

For those attempting to use the Tessnet2 assembly for the Tesseract OCR engine in C# and who are running into the problem of the Tesseract.Init() method causing your app to crash - I found one possible cause.

First, I'm assuming you have the files as follows:

bin\Debug\MyDotNetApp.exe
bin\Debug\tessdata\eng.DangAmbigs
bin\Debug\tessdata\eng.freq-dawg
bin\Debug\tessdata\eng.inttemp
bin\Debug\tessdata\eng.pffmtable
bin\Debug\tessdata\eng.unicharset
bin\Debug\tessdata\eng.user-words
bin\Debug\tessdata\eng.word-dawg

And are using this for the initialization:

using (var ocr = new tessnet2.Tesseract())
{
    ocr.Init(null, "eng", false);
    ...
}

In theory that should work. For me it did work - but then it didn't all of a sudden... even though I didn't change anything that would affect it.

For me the fix was to search through the registry (using regedit) and remove all references to tesseract. There were some suspicious entries that I think may have been created when I ran the Tesseract 3.00 installer (tesseract-ocr-setup-3.00.exe).

When I deleted those entries and rebooted (I had tried rebooting before removing the reg entries, FYI), everything worked again.

Were the registry entries causing the problem? Who knows. But it did fix my problem.
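
If you would rather look before deleting anything, the search can be done read-only from code. The following is a rough sketch, not what I actually ran: RegistryScan, FindKeys and the depth limit of 4 are hypothetical choices, it only matches key names (not value names or data), and I don't know which keys the installer really creates.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Win32;

static class RegistryScan
{
    // Hypothetical helper: collects the full names of subkeys whose name
    // contains needle (case-insensitive), descending at most depth levels.
    // Read-only: nothing is modified or deleted.
    public static void FindKeys(RegistryKey root, string needle,
                                List<string> hits, int depth)
    {
        if (root == null || depth < 0) return;
        foreach (string name in root.GetSubKeyNames())
        {
            try
            {
                using (RegistryKey sub = root.OpenSubKey(name))
                {
                    if (sub == null) continue;
                    if (name.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0)
                        hits.Add(sub.Name);
                    FindKeys(sub, needle, hits, depth - 1);
                }
            }
            catch (System.Security.SecurityException) { /* no access, skip */ }
            catch (UnauthorizedAccessException) { /* no access, skip */ }
            catch (System.IO.IOException) { /* key vanished, skip */ }
        }
    }

    static void Main()
    {
        var hits = new List<string>();
        using (RegistryKey user = Registry.CurrentUser.OpenSubKey("Software"))
            FindKeys(user, "tesseract", hits, 4);
        using (RegistryKey machine = Registry.LocalMachine.OpenSubKey("SOFTWARE"))
            FindKeys(machine, "tesseract", hits, 4);
        foreach (string key in hits)
            Console.WriteLine(key);
    }
}
```

Anything it prints is only a candidate to inspect in regedit; export a key before removing it.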

dkr88