



Does anybody know of any recent work on optical character recognition for Indian scripts using modern machine learning techniques? I know of some research being done at ISI, Calcutta, but to the best of my knowledge nothing new has come up in the last 3-4 years, and OCR for Devanagari is sadly lacking!


The best work in this field was done before 2005, and it is not perfect. There is a Sanskrit OCR program developed by Oliver Hellwig; it can be downloaded for free. Otherwise, you may have to wait for ABBYY FineReader to support Indian languages. Only they are in a position to do it.

+1  A: 

This is surely too old to be useful, but is cool: a video of the Ingalls speaking on Sanskrit and OCR. (Daniel H. H. Ingalls, Sr., Sanskrit professor and translator, and his son Dan Ingalls, computer scientist involved with Smalltalk etc.) The first half is Ingalls Sr. describing a project to automatically analyze text, and the second is by Ingalls Jr. describing how he implemented OCR for Sanskrit from scratch.

+2  A: 

FYI: There's an article in the New York Times from 2003 referencing a tool called ILT.

Simeon Fitch

I just came across this question. I have been developing a specialized HindiOCR, which has recognition rates of about 99.5% on good documents. I hope to make this program available by the end of the year.

Oliver Hellwig
Do you have any related publications?

Hi Oliver,

Thanks for your great work. I have tested your Sanskrit OCR and it works great. Like many thousands of others, I have been waiting for a Hindi OCR for ages, and I am happy to know that you are going to release it soon. I can help with testing if you like :)


Thanks, Oliver! I look forward to a HindiOCR. I can help with testing too :)


Any update on Hindi OCR?

We were able to get almost state-of-the-art results. It wasn't developed as a product, but as a classroom exercise that went pretty well. I can provide the final paper if anybody wishes to look through it. – Egon Oct 3 at 1:01

Egon, I am very interested in testing your product. Does it recognize only good-quality Sanskrit texts, or poor-quality (half-handwritten) ones as well?

I can't release the source code, etc., but here is the final report:

Ann, thanks! I have read the report. As I understand it, you worked with good-quality texts, so you should end up with a good program. I am interested in simplifying work with Sanskrit texts printed with primitive Indian typography. Training FineReader or Oliver Hellwig's software is time-consuming and inefficient in this case, and the work cannot be fully automated yet. I have an idea for how to significantly reduce the human effort needed to recognize texts from primitive typography. It could serve as an add-on to any modern OCR (yours too). I am now developing a program that will illustrate my ideas.
