I've searched around for open source OCR for Chinese, but without much luck: there seem to be very few open source OCR engines for Chinese that are actually usable.

So I am here wondering:

  1. Is there any open source OCR for Chinese that could be used in a production environment?

  2. What are the main differences between implementing OCR for Latin-script languages and for Chinese? I know of some good OCR engines such as Tesseract and Ocropus; what would I need to do to make them support Chinese?

Any help is appreciated and thanks in advance~

A: 

Chinese has far more characters than Latin-script languages. There are some commercial products; one option is to contact those vendors and get help.
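To put rough numbers on the difference in character count, here is a minimal sketch that compares the number of classes a recognizer would have to tell apart. It assumes we count only the 52 ASCII letters for Latin script and use the size of the main Unicode "CJK Unified Ideographs" block (U+4E00 to U+9FFF) as an upper bound for Chinese; the variable names are purely illustrative, and only a subset of that block is in everyday use.

```python
import string

# Latin script: upper- and lower-case ASCII letters only
# (digits and punctuation are ignored to keep the comparison simple).
latin_classes = len(string.ascii_letters)

# Chinese: size of the main "CJK Unified Ideographs" Unicode block,
# U+4E00..U+9FFF inclusive. This is an upper bound used for illustration,
# not the size of any particular OCR engine's character set.
CJK_BLOCK_START = 0x4E00
CJK_BLOCK_END = 0x9FFF
cjk_block_size = CJK_BLOCK_END - CJK_BLOCK_START + 1

if __name__ == "__main__":
    print("Latin letter classes:", latin_classes)
    print("CJK block code points:", cjk_block_size)
    print("Ratio: about %dx" % (cjk_block_size // latin_classes))
```

Even if an engine covers only a few thousand common characters, the classifier still faces orders of magnitude more classes than for a Latin alphabet, and that affects both the training data needed and the recognition approach.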

I don't think there is an open source OCR for Chinese or Japanese characters. OCR involves many techniques beyond the pattern recognition algorithms themselves, and that is an area where companies tend to be stronger than the open source community.

Yin Zhu
I did find some open source OCR for Japanese. There don't seem to be many choices for Chinese. Still, thank you~
Mickey Shine
A: 

You mention that you found open source Japanese OCR. Can you let me know what you found? I'm interested in this area as well. Thanks!

Dan
You can check out http://www.ocrgrid.org/ ; the engine is called NHocr.
Mickey Shine
+1  A: 

You can choose:

  • Tesseract 3.0 supports Chinese and Japanese
  • NHocr supports Japanese
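As a rough illustration of the Tesseract option, the sketch below drives the `tesseract` command-line tool from Python. It assumes the executable is on your PATH and that the Simplified Chinese language data (`chi_sim`) is installed; the helper names `build_tesseract_command` and `run_ocr` and the file name `page.png` are hypothetical, chosen only for this example.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="chi_sim"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="chi_sim"):
    """Run Tesseract and return the recognized text.

    Assumes the tesseract executable and the requested language data
    are installed; raises RuntimeError if the executable is missing.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang),
                   check=True)
    # Tesseract writes its plain-text result to <output_base>.txt as UTF-8.
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # Only prints the command; call run_ocr("page.png", "page") to execute it.
    print(" ".join(build_tesseract_command("page.png", "page")))
```

Passing `lang="chi_tra"` or `lang="jpn"` would select Traditional Chinese or Japanese instead, provided the matching language data is installed.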
Eric Liu
Is Tesseract 3.0 available for download now? Where can I find its source?
Mickey Shine
You can check out http://code.google.com/p/tesseract-ocr/source/checkout.
Eric Liu