Hi, I am using PowerBuilder (PB) 10.5.2 and EZTwain 3.30.0.28, XDefs 1.36b1 by Dosadi for scanning.

I am also using TOCR 3.0 for OCR.

In one function we use the following, among other things:

...

Long ll_acquire

(as_path_filename is a function argument)

...

...

TWAIN_SetAutoOCR(1)

ll_acquire = TWAIN_AcquireMultipageFile(0, as_path_filename) 

The problem is that the scanned PDF page contains both Latin (English) and Greek words. Searching finds the English words quite accurately, but the Greek words are not found at all.

Do you think this has to do with the TOCR software? I just want to be able to search for Greek words as well.

Thanks in advance

+1  A: 

The failure is most likely in the OCR step, which is not converting the Greek words into searchable text. It looks like you are using EZTwain for the OCR portion, and EZTwain uses TOCR as its actual OCR engine. You may want to look at the docs for that software and see whether they mention any settings that can be changed for multilingual use.
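One way to confirm that the OCR step is where the Greek is lost is to copy the text layer out of a scanned PDF and check whether it contains any Greek letters at all. A minimal sketch in Python, assuming the text has already been extracted by some other tool; the function name `greek_letter_ratio` is hypothetical and is not part of EZTwain or TOCR:

```python
import unicodedata


def greek_letter_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters in `text` that are Greek.

    A ratio near 0.0 for a page known to contain Greek suggests the OCR
    engine did not recognize the Greek script.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # Unicode names of Greek letters contain the word "GREEK",
    # e.g. "GREEK SMALL LETTER ALPHA".
    greek = [ch for ch in letters if "GREEK" in unicodedata.name(ch, "")]
    return len(greek) / len(letters)
```

If the ratio is zero for a page that visibly contains Greek, the text layer has no Greek characters and no search tool will find them, whatever the scanning settings were.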

Dougman
+1  A: 

According to its website, TOCR recognizes English, French, Italian, German, Dutch, Swedish, Finnish, Norwegian, Danish, Spanish and Portuguese. Greek is not on that list, so you'll need software that can handle mixed Greek and English text. ABBYY FineReader Professional lists support for English and Greek, along with dozens of other languages.

Hugh Brackett
By the way, there's an online, pay-per-page API powered by the ABBYY engine, with multilingual support: http://www.wisetrend.com/wisetrend_ocr_cloud.shtml
Eugene Osovetsky