There is this great open-source OCR component that Google has been developing: http://code.google.com/p/tesseract-ocr/

A new version (version 3) was released at the beginning of October 2010.

This new version no longer has a working C wrapper, so somebody in the Delphi community will have to get it working from inside Delphi. I'm trying to do it myself because I badly need it and nobody else seems to be in a hurry, but I have no idea what I'm doing when it comes to DLLs and converting C to Delphi. That's where I could use your help.

The clues I have picked up so far are that I need Dependency Walker and that I somehow have to prevent 'name mangling' (no idea what that means). The actual DLL API functions are in the C files, and presumably the function names you see in Dependency Walker will match the functions in the API file.

Here's everything you'll need to help: a folder containing tessdll.dll, with leptonlib.dll alongside it (it just needs to be there). You'll also need a subfolder called 'tessdata', and inside that subfolder go your 'language data files' - [check the downloads page on the site]

Here's the Windows installer just so you can see the DLL in action: [check the downloads page on the site]

To get this working for Delphi, you'll have your executable in the same folder as the DLL. You'll then need to know what to call in the DLL, and for that you can look in the C Source files: [check the source files on the downloads page on the site]

Thanks for any assistance.

+6  A: 

At first glance this could be difficult. Since the API is apparently encapsulated in a C++ class, the only clean way to do it would be:

Implement a wrapper DLL in C that exposes a flattened interface of the class so that you can write a Delphi unit to use it.

The principle is outlined here:

http://rvelthuis.de/articles/articles-cppobjs.html

Directly using the C++ API would require some clever assembler hacking. It's not only the name mangling that is a problem here, but also the calling convention of the C++ compiler that was used to create the DLL (which is Visual Studio 2008 Express).

So somebody has to write a DLL with a C API using Visual C++ 2008 Express first.

Some clarification concerning your comments:

When you want to use an external library in your application you need to know what symbols you need to import.

A normal symbol would be 'SetDllDirectory' in kernel32.dll. No problem to import that in Delphi, but C++ normally uses a more contrived way to name its symbols. An example would be '_ZN·9wikipedia·7article·6format·E' (taken from this article: http://en.wikipedia.org/wiki/Name_mangling)

While it is possible to import a mangled symbol, that's only a minor part of the problem.

You can tell the C++ compiler not to mangle names by wrapping the declarations in an extern "C" { ... } block.
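To make the difference a bit more concrete, here is a small sketch of why mangling exists and what extern "C" changes. The function names (area, square_area, rect_area, check_areas) are made up for illustration, and the mangled names in the comment are only what GCC typically produces; other compilers use different schemes.

```cpp
#include <cassert>

// C++ linkage: both overloads share the source-level name "area", so the
// compiler has to encode the parameter types into each symbol name
// (GCC typically emits something like _Z4areai and _Z4areaii here).
int area(int side) { return side * side; }
int area(int width, int height) { return width * height; }

// C linkage: each symbol keeps its plain name, which is also why
// overloading is not possible inside an extern "C" block.
extern "C" {
int square_area(int side) { return area(side); }
int rect_area(int width, int height) { return area(width, height); }
}

// Hypothetical self-check, only here to exercise the functions above.
int check_areas() {
    assert(square_area(3) == 9);
    assert(rect_area(2, 5) == 10);
    return square_area(3) + rect_area(2, 5);
}
```

A Delphi import can refer to square_area and rect_area by name, whereas the two area overloads would have to be imported under their compiler-specific mangled names.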

There are still at least two additional problems:

  • You do not have a method to determine the size of a C++ object instance from Delphi
  • All methods of a C++ object take a hidden this argument (like Self in Delphi)
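Both points can be illustrated with a tiny made-up class (Counter and the helper functions below are hypothetical, not part of Tesseract). This is only a sketch of the idea, not of how any particular compiler passes the pointer at the machine level:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class standing in for a library's C++ API.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}
    int Next() { return ++value_; }  // conceptually: Next(Counter* this)
private:
    int value_;
};

// A free function that takes the instance explicitly, which is roughly
// what the hidden "this" argument amounts to.
int Counter_Next(Counter* self) { return self->Next(); }

// Only the C++ compiler knows the object layout, so the size has to be
// reported from the C++ side; Delphi cannot work it out on its own.
std::size_t Counter_SizeOf() { return sizeof(Counter); }

// Hypothetical demo: calls the method both ways on the same instance.
int demo_hidden_this() {
    Counter c(10);
    int a = c.Next();          // 11
    int b = Counter_Next(&c);  // 12
    assert(Counter_SizeOf() >= sizeof(int));
    return a * 100 + b;
}
```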

These problems can be circumvented by writing a wrapper as explained in Rudy's article.

You have to write a simple C++ DLL that exports a normal C API (without mangling and with normal C functions). In pseudo-code it looks like this:

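extern "C" {

// Create an instance and hand it out as an opaque pointer.
void* MakeAnInstanceOfDesiredClass(void)
{
    return new DesiredClass();
}

// Cast back before deleting: deleting a void* is undefined behavior.
void DestroyInstanceOfDesiredClass(void* instance)
{
    delete static_cast<DesiredClass*>(instance);
}

// Forward the call to the real method on the instance.
int SomeMethodOfDesiredClass(void* instance)
{
    return static_cast<DesiredClass*>(instance)->SomeMethod();
}

}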
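One more thing such a wrapper probably has to take care of: a C++ exception must not escape through the C API, because the Delphi caller has no way to handle it. Below is a fuller sketch along the same lines, with a stand-in DesiredClass and invented function names (DesiredClass_Create and so on), that turns exceptions into error codes. None of this is Tesseract's actual API.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical C++ class standing in for the real library class.
class DesiredClass {
public:
    int SomeMethod(int input) {
        if (input < 0) throw std::invalid_argument("negative input");
        return input * 2;
    }
};

extern "C" {

// Returns a null pointer instead of throwing if construction fails.
void* DesiredClass_Create(void) {
    try { return new DesiredClass(); }
    catch (...) { return nullptr; }
}

void DesiredClass_Destroy(void* instance) {
    delete static_cast<DesiredClass*>(instance);
}

// Returns 0 on success and -1 on failure; the result is written to *out.
int DesiredClass_SomeMethod(void* instance, int input, int* out) {
    if (!instance || !out) return -1;
    try {
        *out = static_cast<DesiredClass*>(instance)->SomeMethod(input);
        return 0;
    } catch (...) {
        return -1;  // never let a C++ exception reach the caller
    }
}

}

// Client code restricted to the flat API, as a Delphi unit would be.
int demo_flat_api() {
    void* handle = DesiredClass_Create();
    assert(handle != nullptr);
    int result = 0;
    int ok = DesiredClass_SomeMethod(handle, 21, &result);
    int failed = DesiredClass_SomeMethod(handle, -1, &result);
    DesiredClass_Destroy(handle);
    assert(ok == 0 && failed == -1);
    return result;
}
```

On the Delphi side the matching imports would presumably be declared as external functions with the cdecl calling convention, assuming the wrapper DLL is built with the compiler's default convention.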

I would give it a try, but my internet connection is quite slow and I don't have Visual Studio here, sorry.

Jens Mühlenhoff
Thank you for that answer, it does shed some light. However, I'm still rather confused. I've downloaded Visual C++ 2010 because I'm desperate to get version 3, so I might as well try to do it myself, but I'm completely clueless about this. For instance, is the source code in C or C++? Can I compile both languages with Visual C++ 2010? There's a "Windows DLL" version of the DLL that you can download - surely that's all Delphi needs? ... So from what you're saying, I have no chance of getting this right myself as a humble Delphi programmer.
Richard Woolf
Also, what is 'name mangling' ?
Richard Woolf
DLLs were never designed for exporting object methods. Name mangling works around this by using a specially formatted function name. The details of the mangling technique vary from one language to another (and even from one compiler vendor to another). Delphi uses name mangling in its bpl format (which is a DLL under the hood).
codeelegance
Mangling is used in static libraries as well.
codeelegance
+2  A: 

Actually, after taking a closer look at the documentation, there might be a subset of functions that still form a C API and are thus accessible directly from Delphi:

BOOL APIENTRY  DllMain (HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved) 
TESSDLL_API void __cdecl  TessDllRelease () 
TESSDLL_API void *__cdecl  TessDllInit (const char *lang) 
TESSDLL_API int __cdecl  TessDllBeginPageBPP (uinT32 xsize, uinT32 ysize, unsigned char *buf, uinT8 bpp) 
TESSDLL_API int __cdecl  TessDllBeginPageLangBPP (uinT32 xsize, uinT32 ysize, unsigned char *buf, const char *lang, uinT8 bpp) 
TESSDLL_API int __cdecl  TessDllBeginPageUprightBPP (uinT32 xsize, uinT32 ysize, unsigned char *buf, const char *lang, uinT8 bpp) 
TESSDLL_API int __cdecl  TessDllBeginPage (uinT32 xsize, uinT32 ysize, unsigned char *buf) 
TESSDLL_API int __cdecl  TessDllBeginPageLang (uinT32 xsize, uinT32 ysize, unsigned char *buf, const char *lang) 
TESSDLL_API int __cdecl  TessDllBeginPageUpright (uinT32 xsize, uinT32 ysize, unsigned char *buf, const char *lang) 
TESSDLL_API void __cdecl  TessDllEndPage (void) 
TESSDLL_API ETEXT_DESC *__cdecl  TessDllRecognize_a_Block (uinT32 left, uinT32 right, uinT32 top, uinT32 bottom) 
TESSDLL_API ETEXT_DESC *__cdecl  TessDllRecognize_all_Words (void) 
TESSDLL_API void __cdecl  ReleaseRecognize () 
TESSDLL_API void *__cdecl  InitRecognize () 
TESSDLL_API int __cdecl  CreateRecognize (uinT32 xsize, uinT32 ysize, unsigned char *buf) 
TESSDLL_API ETEXT_DESC *__cdecl  reconize_a_word (uinT32 left, uinT32 right, uinT32 top, uinT32 bottom) 

I don't know if these functions are enough, but they are directly accessible.

Jens Mühlenhoff
Actually, I believe those functions belong to the wrapper for the previous version, 2.04, and are deprecated in the new version 3.0. There is an open issue on the site's issues page saying this, and that a new C wrapper must be written for version 3.0.
Richard Woolf