I need to invoke Tesseract OCR (it's an open source C++ library that does optical character recognition) from a Java application server. Right now it's easy enough to run the executable using Runtime.exec(). The basic logic would be:
- Save the image that is currently held in memory to a file (a .tif).
- Pass the image file name to the tesseract command-line program.
- Read the output text file from Java using FileReader.
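For concreteness, here is a minimal sketch of those three steps. It uses ProcessBuilder rather than Runtime.exec() directly, and it assumes a `tesseract` binary on the PATH that is invoked as `tesseract <image> <outputbase>` and writes `<outputbase>.txt`. The class name, method names, and temp file names are placeholders I made up for illustration, and it needs Java 9 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class TesseractCli {

    // Builds the argument list: tesseract <image> <outputBase>.
    // Tesseract appends ".txt" to the output base on its own.
    static List<String> buildCommand(Path image, Path outputBase) {
        return List.of("tesseract", image.toString(), outputBase.toString());
    }

    // Writes the in-memory TIFF to a temp file, runs tesseract on it,
    // and returns the recognized text. Assumes the output is UTF-8.
    static String recognize(byte[] tiffBytes) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("ocr");
        Path image = dir.resolve("page.tif");
        Path outputBase = dir.resolve("page");
        Path textFile = dir.resolve("page.txt");
        try {
            // Step 1: save the image held in memory to a .tif file.
            Files.write(image, tiffBytes);

            // Step 2: pass the file name to the command-line program.
            // Console output is discarded so the child cannot block on a full pipe.
            Process process = new ProcessBuilder(buildCommand(image, outputBase))
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("tesseract exited with code " + exitCode);
            }

            // Step 3: read the text file that tesseract produced.
            return new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
        } finally {
            // Clean up the temp files whether or not recognition succeeded.
            Files.deleteIfExists(image);
            Files.deleteIfExists(textFile);
            Files.deleteIfExists(dir);
        }
    }
}
```

So each recognition costs one process launch plus two small file round trips on top of the OCR work itself.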
How much of a performance improvement am I likely to get by writing a JNI wrapper for Tesseract? Unfortunately, there is no open source JNI wrapper that works on Linux. I would have to write it myself, and I am wondering whether the benefit is worth the development cost.
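One rough way I could put a number on the current overhead is to time the whole exec-based call and compare it with how long tesseract takes on the same image when run by hand. Below is a small timing helper as a sketch. `OcrTiming` and `averageMillis` are names I made up, and a simple wall-clock average like this ignores JIT warm-up and disk caching, so the result would only be a ballpark figure.

```java
import java.util.concurrent.Callable;

class OcrTiming {

    // Runs the task the given number of times and returns the
    // average wall-clock time per run in milliseconds.
    static double averageMillis(Callable<?> task, int runs) throws Exception {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }
}
```

The task passed in would be whatever method wraps the Runtime.exec() call. If the average is dominated by recognition time rather than by process startup and file I/O, a JNI wrapper presumably has little left to save.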