ocr

Batch OCRing PDFs that haven't already been OCR'd.

If I have 10,000 PDFs, some of which have been OCRed, some of which have 1 page that has been OCRed but the rest of the pages have not, how can I go through all the PDFs and only OCR the pages that haven't already been done? ...

Compiling tesseract-ocr on ARM/Gumstix?

Is it possible to compile tesseract-ocr for the Intel PXA270 found in certain Gumstix boards? Has anyone done this successfully, and if so, how did you do so? ...

Is OCR a solved problem?

According to Wikipedia, "The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem on applications where clear imaging is available such as scanning of printed documents." However, it gives no citation. My question is: is this true? Is the current state-of-the-art so good that - for a good sca...

Finding a word's frame (position and size) on the screen using Cocoa or Carbon

Here's a tough one: I need to be able to find a word's position and size (its frame) on the screen (its first occurrence is enough; from there I should be able to get the next ones). For example, I would like to be able to detect word positions in (but not limited to) Word, Excel and PowerPoint for Mac, as well as Safari and others. Th...

"OCR" for a graph - scraping sample values from a plot image

This isn't really OCR, since it's not recognizing characters, but it's the same idea. Does anyone know of an image-processing library or established algorithm for retrieving the values from a (raster) plot image? For instance, in this graph, it's hard for me to read exact values by eye because there are such large gaps between gridlines: I...

How to detect a word in an Image

Hi, I need to find the word in an image at the point where the user has clicked. So far I have succeeded in OCRing the image. I have a PictureBox control in my C# app. The user can draw a box around any text and drag it to a textbox to fill the textbox with it. I have completed this. But now I have a new requirement saying the user can select a textbox and ...

Python OCR library or handwritten character recognition engine

Could you recommend some Python libraries or source code for OCR and handwritten character recognition? ...

Fuzzy Text Search: Regex Wildcard Search Generator?

I'm wondering if there is some way to do fuzzy string matching in PHP: looking for a word in a long string and finding a potential match even if it's misspelled, something that would find it if it was off by one character due to an OCR error. I was thinking a regex generator might be able to do it. So given an input of "crazy" it w...
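One possible reading of the regex-generator idea can be sketched as follows. This is a minimal illustration in Python rather than PHP, and it only covers a single substituted character (not an inserted or dropped one); the function names `one_off_pattern` and `find_fuzzy` are hypothetical, not taken from the question.

```python
import re

def one_off_pattern(word):
    """Build a regex matching `word` with at most one character substituted."""
    alternatives = []
    for i in range(len(word)):
        # Replace the i-th character with a single word-character wildcard.
        alternatives.append(re.escape(word[:i]) + r"\w" + re.escape(word[i + 1:]))
    # Word boundaries keep the match from starting or ending mid-word.
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

def find_fuzzy(word, text):
    """Return every whole word in `text` that is `word` or one substitution away."""
    return [m.group(0) for m in one_off_pattern(word).finditer(text)]
```

For OCR errors that add or drop a character, an edit-distance comparison would likely be a better fit than a generated regex.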

Mobile OCR Engine for iPhone app

Hello, I am developing an app in which I have to use an OCR engine. Can you please help me choose the best one? I have to extract text from images. I have heard of ABBYY. Is it the best? Please suggest other options if there are any. Thanks in advance ...

Reliably extracting identity fields from scanned documents / images?

I have to pull two pre-printed (not hand-written) fields out of a paper form, such that it can be automatically routed after being scanned. The fields contain batch and item identifiers, like "GG-9192" or "EPN/245G". I've tried the following software: Tesseract-OCR, Cuneiform, Canon ImageRunner built-in OCR, Asprise OCR Java API (demo)...

Python Tesseract OCR question

I have this image: I want to read it into a string using Python, which I didn't think would be that hard. I came upon Tesseract, and then a wrapper for Python scripts using Tesseract. So I started reading images, and it did great until I tried to read this one. Am I going to have to train it to read that specific font? Any ideas on ...

Good opensource OCR in C#

Hi, is there a good open source OCR implementation in C#? I am trying to solve the following problem. I have a document which contains boxes, and people enter their ID number in the box. Now I want to read the ID number programmatically. Thank you, Bala ...

Tesseract OCR in C#

Just wondering if anyone has a sample project or compiled DLL of the Tesseract OCR engine running in C#? I have tried going through the tessnet2 demo (here) but for some reason I can't install the C++ components in my current VS2008 installation, so I can't build it. Thanks! ...

Java OCR implementation

Hey, this is primarily just curiosity, but are there any pure Java OCR implementations? I'm curious how this would perform purely in Java, and OCR in general interests me, so I'd love to see how it's implemented in a language I thoroughly understand (Java). Naturally this would require that the implementation is open source ... but I...

Parsing amount strings into numbers

I am working on a system that recognizes paper documents using OCR engines. These documents are invoices containing amounts such as total, VAT and net amounts. I need to parse these amount strings into numbers, but they come in many formats and flavors, using different symbols for decimal and thousands separation in the number i...
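A minimal Python sketch of one heuristic for this kind of parsing is shown below. It is an illustration under stated assumptions, not a definitive solution: the function name `parse_amount` is hypothetical, and the rule for a lone separator followed by exactly three digits (treated here as a thousands separator) is a guess that will be wrong for some inputs.

```python
import re
from decimal import Decimal

def parse_amount(raw):
    """Parse an amount string such as '1.234,56' or '$1,234.56' into a Decimal."""
    # Drop currency symbols, spaces and anything else that is not a digit,
    # a separator or a minus sign.
    cleaned = re.sub(r"[^\d.,-]", "", raw)
    has_dot, has_comma = "." in cleaned, "," in cleaned
    if has_dot and has_comma:
        # Both symbols present: whichever comes last is the decimal separator.
        decimal_sep = "." if cleaned.rfind(".") > cleaned.rfind(",") else ","
    elif has_dot or has_comma:
        sep = "." if has_dot else ","
        digits_after = len(cleaned) - cleaned.rfind(sep) - 1
        # Assumption: a repeated separator, or a single one followed by
        # exactly three digits, marks thousands rather than decimals.
        is_thousands = cleaned.count(sep) > 1 or digits_after == 3
        decimal_sep = None if is_thousands else sep
    else:
        decimal_sep = None
    if decimal_sep is None:
        normalized = cleaned.replace(".", "").replace(",", "")
    else:
        thousands_sep = "," if decimal_sep == "." else "."
        normalized = cleaned.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)
```

Returning a `Decimal` rather than a `float` avoids binary rounding on monetary values. Where the invoice's country or currency is known, using that to pick the separators is likely more reliable than guessing from the string alone.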

Fraktur recognition with OCRopus/Tesseract on Linux

I am trying to recognize German text in a Fraktur typeface with OCRopus, but it doesn't seem to be using the deu-f package. Here are the steps I performed. Compiled and installed Tesseract and OCRopus. Downloaded http://tesseract-ocr.googlecode.com/files/tesseract-2.01.deu-f.tar.gz, unpacked it to tessdata/. But when I cal...

Using Ruby And Ubuntu With Optical Character Recognition

I am a university student and it's time to buy textbooks again. This quarter there are over 20 books I need for classes. Normally this wouldn't be such a big deal, as I would just copy and paste the ISBNs into Amazon. The ISBNs, however, are converted into an image on my school's book site. All I want to do is get the ISBNs into a string...

Need good OCR for printed source code listing, any ideas?

At my work, I sometimes have to take some printed source code and manually type the source code into a text editor. Do not ask why. Obviously typing it up takes a long time and always extra time to debug typing errors (oops missed a "$" sign there). I decided to try some OCR solutions like: Microsoft Document Imaging - has built in O...

Any open source/free OCR (pattern recognition) software? (for mobile platforms?)

Hi, I want to extract text (Chinese) from images that users pick on their mobile phones. So I am wondering whether there is any open source/free OCR (pattern recognition) software for mobile platforms. Currently I am working with iPhone (and Android, BlackBerry platforms?). I've searched Stack Overflow but it seems there only ...

Open source OCR for Chinese

I've searched around for open source OCR for Chinese, but without any luck; there rarely seems to be any usable open source OCR for Chinese. So I am wondering: Is there any open source OCR for Chinese that could be used in a production environment? What are the main differences when implementing an OCR for Latin-languages...