views:

2923

answers:

7

Hey,

This is primarily just curiosity, but are there any pure Java OCR implementations? I'm curious how this would perform purely in Java, and OCR in general interests me, so I'd love to see how it's implemented in a language I thoroughly understand (Java). Naturally this would require that the implementation is open source... but I'm still interested in proprietary solutions, as I could at least check out the performance in that case.

I've seen a couple which can be used in Java (Asprise etc.), but it doesn't seem that these are pure Java implementations... are there any?

Thanks!

+1  A: 

A Google search for "java ocr" turned up a link to JavaWhat - Java OCR Libraries, including Asprise, GOCR, JavaOCR and Tesseract OCR.

Jim Garrison
Thx, but as far as I can tell none of those are Java. I did do a Google search before posting a question :/
Should clarify: they have Java APIs but are not native Java, unless I missed one.
It is always helpful to list what you've already researched in your question :-)
Jim Garrison
+2  A: 

Just found this one (don't know it, not tested, check yourself)

Ron Cemer Java OCR


As you only need this for curiosity you could look into the source of this applet.

It does OCR of handwritten characters with a neural network

Java OCR: Handwriting Recognition
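
For a rough idea of what the neural-network approach looks like in plain Java, here is a toy single-layer perceptron that tells two tiny glyphs apart. This is not the applet's code or any library's API; the class name and the 3x3 bitmaps are made up for illustration, and a real recognizer would need far more training data and a larger network.

```java
class ToyGlyphPerceptron {
    // One weight per pixel, plus a bias term.
    private final double[] weights;
    private double bias;

    // Hypothetical 3x3 bitmaps, row-major, 1 = ink, 0 = background.
    static final int[] GLYPH_I = {0, 1, 0,  0, 1, 0,  0, 1, 0};
    static final int[] GLYPH_L = {1, 0, 0,  1, 0, 0,  1, 1, 1};

    ToyGlyphPerceptron(int pixelCount) {
        weights = new double[pixelCount];
    }

    // Returns 1 if the weighted pixel sum is positive, otherwise 0.
    int predict(int[] pixels) {
        double sum = bias;
        for (int i = 0; i < pixels.length; i++) {
            sum += weights[i] * pixels[i];
        }
        return sum > 0 ? 1 : 0;
    }

    // Perceptron learning rule: shift the weights by (label - prediction) * input.
    void train(int[][] samples, int[] labels, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int s = 0; s < samples.length; s++) {
                int error = labels[s] - predict(samples[s]);
                for (int i = 0; i < weights.length; i++) {
                    weights[i] += error * samples[s][i];
                }
                bias += error;
            }
        }
    }

    public static void main(String[] args) {
        ToyGlyphPerceptron p = new ToyGlyphPerceptron(9);
        // Label 0 stands for "I", label 1 for "L".
        p.train(new int[][] {GLYPH_I, GLYPH_L}, new int[] {0, 1}, 10);
        System.out.println("I classified as " + p.predict(GLYPH_I));
        System.out.println("L classified as " + p.predict(GLYPH_L));
    }
}
```

With only two clean samples this converges after a couple of passes; noisy or handwritten input is where it gets hard.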

jitter
A: 

I don't think any public Java OCR libraries exist.

Anyway, if one did exist, it would be very difficult to understand what it is doing from the code alone. These systems are usually built on top of machine learning techniques, such as neural networks or Hidden Markov Models.

What is often needed to work on such projects is a background in mathematics and computer science which includes understanding advanced (post-Calculus) statistics and probability.

joemoe
Such projects _do_ exist ......
Thorbjørn Ravn Andersen
+1  A: 

There are a variety of OCR libraries out there. However, my experience is that the major commercial implementations, ABBYY, Omnipage, and ReadIris, far outdo the open-source or other minor implementations. These commercial libraries are not primarily designed to work with Java, though of course it is possible.

Of course, if your interest is to learn the code, the open-source implementations will do the trick.

Joshua Fox
+1  A: 

If you are looking for a very extensible option or have a specific problem domain you could consider rolling your own using the Java Object Oriented Neural Engine.

I used it successfully in a personal project to identify the letter from an image such as this. You can find all the source for the OCR component of my application on GitHub, here.

Dave Tapley
A: 

I recommend trying the Java OCR project on sourceforge.net. I originally developed it, and I have a blog posting on it here: http://www.roncemer.com/software-development/java-ocr

Since I put it up on sourceforge, its functionality has been expanded and improved quite a bit through the great work of a volunteer researcher/developer.

Give it a try, and if you don't like it, you can always improve it!

Ron
+1  A: 

Don't go with Asprise!!! I just tested it for my needs, and it is definitely not reliable. The source image was:

[image not available]

This is a pure image (not scanned, no noise, with basic courier font).

And the results were:

    fin l Rcsults
    (%rtd a-z)
    d t   16-09-l010 ti%t  30 ,37 l7
    Ti%
    Adrian   (PIC)l3l    25 .lO.7
    Bru     037    l  .37.
    C%l    052    21;2l.l

Fortunately they have a fully working trial version, so you won't waste your money before realizing it's useless.

I will try Ron's Java OCR now...

Leo Holanda