Is there any existing commercial or academic software that can do the following?

  • overlay results from multiple OCR packages (Abbyy FineReader, Adobe Acrobat Professional, ReadIris, etc.)
  • provide fully automated improvements based on accumulated knowledge from multiple sources
  • allow the use of additional external tools set up at runtime (dictionaries, batch web / local corpus look-ups, etc.)

Note: I already have in-house solutions for visualizing results from single sources, so if no such software is available, I would not mind developing my own : ) Inquiries about cooperation would then also be most welcome!

+1  A: 

Hi there.

The idea of voting between several OCR engines is not new. The problem is that it does not really work. Voting would probably help if the engines were simple classifiers that were orthogonal by nature: you could then combine their votes and improve the results. But they are all very complicated pieces of software that use quite similar sets of well-known approaches with small variations. They probably combine those approaches in different ways, and some implementations are better and some are worse.

Experience shows that when you combine several OCR technologies, the best decision rule is to rely on the results of the most accurate one and simply ignore the others. From my experience, ABBYY OCR is definitely the most accurate of the ones you mentioned.

As far as I know, the only reason to use voting is to cross-check "suspicious" characters and send them to manual verification when 100% accuracy is a requirement. With this approach you increase the number of characters to verify, but reduce the chance of missing a wrong character.
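To illustrate the cross-check idea, here is a minimal sketch in Python. It assumes you already have the plain-text output of two engines for the same page; the function name `suspicious_spans` and the sample strings are made up for illustration and do not come from any OCR product. It aligns the two strings with the standard library's `difflib.SequenceMatcher` and reports every region where they disagree, so that only those regions go to manual verification.

```python
from difflib import SequenceMatcher


def suspicious_spans(primary: str, secondary: str) -> list[tuple[int, int, str, str]]:
    """Return (start, end, primary_text, secondary_text) for every region
    where the two OCR outputs disagree. Indices refer to `primary`."""
    # autojunk=False keeps difflib from treating frequent characters as junk,
    # which matters for long pages of text.
    matcher = SequenceMatcher(None, primary, secondary, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((i1, i2, primary[i1:i2], secondary[j1:j2]))
    return spans


# Hypothetical outputs of two engines for the same line of text.
trusted_text = "Invoice total: 1,250.00 EUR"
other_text = "Invoice tota1: 1,250.00 EUR"

for start, end, a, b in suspicious_spans(trusted_text, other_text):
    print(f"chars {start}-{end}: trusted={a!r} other={b!r} -> verify manually")
```

Note that the trusted engine's text is kept as the result; the second engine is used only to decide where a human should look.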

Best regards, Andrey

Tomato
@Andrey: "From my experience, ABBYY OCR is definitely the most accurate of the ones you mentioned." Are there any that I have not mentioned that are more accurate?
Cetin Sert
I would say OmniPage is not bad: close to ABBYY in accuracy and a little bit faster. But if accuracy is the priority, I would definitely choose ABBYY.
Tomato
A: 

There are two options that I have worked with previously and would recommend.

  1. PrimeOCR. http://www.primerecognition.com/

It is a commercial offering that uses multiple OCR engines and voting to determine the best result. It is machine print only. Last time I used it they had 6 engines. Contact Alex Dahl.

I have used it in a major project scanning 20,000+ pages per day.

  2. RecoStar from OpenText.

RecoStar uses voting and can handle both handprint and machine print.
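For readers unfamiliar with the term, voting in its simplest form can be sketched as below. This is only an illustration and not how PrimeOCR or RecoStar work internally, which I cannot show here. It assumes the engine outputs are already split into words and aligned one-to-one (real outputs usually need an alignment step first), and the function name `vote_words` is hypothetical. Ties go to the engine listed first, so put the most trusted engine first.

```python
from collections import Counter


def vote_words(outputs: list[list[str]]) -> list[str]:
    """Pick, for each word position, the word most engines agree on.

    `outputs` holds one word list per engine, most trusted engine first.
    All lists are assumed to be pre-aligned and of equal length.
    """
    if not outputs:
        return []
    length = len(outputs[0])
    if any(len(words) != length for words in outputs):
        raise ValueError("engine outputs must be aligned to the same length")

    result = []
    for candidates in zip(*outputs):
        counts = Counter(candidates)
        best = max(counts.values())
        # On a tie, keep the candidate from the earliest (most trusted) engine.
        winner = next(word for word in candidates if counts[word] == best)
        result.append(winner)
    return result


# Hypothetical word lists from three engines for the same line.
engines = [
    ["the", "qu1ck", "fox"],
    ["the", "quick", "fox"],
    ["tbe", "quick", "f0x"],
]
print(" ".join(vote_words(engines)))
```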

Andrew Cash