I want my application to fill in a single field in a form that exists as a black-and-white image file. The form always starts as the same paper version, but by the time my application gets it from my users, it may have been scanned or faxed more than once. Because of that, the field I need is not in the same place in every file.

My users do not always get the blank form from me, so I do not have the ability to print a mark or placeholder that I can recognize later.

There is text on the original blank form, but because it may have been faxed, I have only 200 dpi of resolution. The text is always big enough for a human to read, but I'm skeptical about OCR.

I have some budget so I do not need a free solution ... let's just say $2000.

That said, I am considering

  1. Get an OCR solution to find the text label on the field I need. I do not think I have the resources or expertise to roll-my-own. I do not need perfect recognition, since I already know what the text says. But I do need to know X- and Y-coordinates. Is there software that does this? Or is the programming easier than I think?

  2. Build or buy software to recognize the edges of the form. From there, I could get the relative position of the field I need. I'm thinking of the dashed line my scanner software puts around the image of a small document. Is that a known algorithm or is there an available solution?

  3. Some other way to recognize the field I need. Attempts to google form filling software give me hundreds of matches for web forms, pdf forms, etc. that do not do what I need.

I'm not picky about language. My application runs on Linux, but if the best solution is Microsoft, I can probably make that work.

I'd appreciate your thoughts.

A: 

Here's a little summary of some available OCR solutions (open source and not): http://googlesystem.blogspot.com/2007/04/open-source-ocr-software-sponsored-by.html

ChrisW
The solutions summarized there will turn images into text. None say they will tell me the location on the page of the recognized text. Do you have experience with one of these that will do that?
bmb
No, I'm sorry to tell you that I have virtually no experience with OCR. I mentioned that link because, when I read it, it was news to me that there's any open-source OCR and/or that Google has a part in it.
ChrisW
+2  A: 

If I understand correctly, the form is always the same, but may be shifted, scaled, or slightly rotated due to photocopying/faxing. In that case, your problem is one of image registration: find the optimal similarity transformation (translation, rotation, and uniform scaling) that makes a form from a user line up with your "model" form, in which you know the location of the field of interest. Once you know the transformation, you can compute the location of the field in the user's form.
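To make that last step concrete, here is a minimal sketch of applying an already-estimated transformation to the known field position. The function name `map_point`, the parameter names, and the numbers are all hypothetical; it assumes the transformation is expressed as a rotation about the origin, a uniform scale, and a translation, in that order.

```python
import math

def map_point(x, y, scale, angle_deg, tx, ty):
    """Map a point on the model form into the user's scan.

    Rotates about the origin, scales uniformly, then translates.
    """
    a = math.radians(angle_deg)
    xr = x * math.cos(a) - y * math.sin(a)
    yr = x * math.sin(a) + y * math.cos(a)
    return (scale * xr + tx, scale * yr + ty)

# Hypothetical values: the field sits at (400, 250) in the model image,
# and the user's scan is half the size and shifted by (10, 20) pixels.
print(map_point(400.0, 250.0, scale=0.5, angle_deg=0.0, tx=10.0, ty=20.0))
# → (210.0, 145.0)
```

Estimating `scale`, `angle_deg`, `tx`, and `ty` is the actual registration problem; this only shows what you do with them afterwards.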

There are many image registration algorithms, typically developed for applications such as aligning MR images of the brain. Many of them are computationally expensive, and some require statistical priors. Fortunately, your case is easier: all you need to do is fit a rectangle around the contents of the user's form. Coordinate descent (adjusting one edge of the rectangle at a time until the fit stops improving) should work. You will need some tolerance for noise (junk outside the form).
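As a rough illustration of the rectangle idea, here is a sketch that is deliberately simpler than coordinate descent: it counts dark pixels per row and per column and keeps only the rows and columns with at least `min_ink` of them, which gives a crude tolerance for isolated specks. The names `content_box`, `locate_field`, and `min_ink` are made up for this example, the image is assumed to be a list of rows of 0 (white) and 1 (black), and rotation is ignored entirely.

```python
def content_box(img, min_ink=2):
    """Return (left, top, right, bottom) of the form contents.

    Rows and columns with fewer than min_ink dark pixels are treated as noise.
    """
    row_ink = [sum(row) for row in img]
    col_ink = [sum(col) for col in zip(*img)]
    rows = [i for i, n in enumerate(row_ink) if n >= min_ink]
    cols = [j for j, n in enumerate(col_ink) if n >= min_ink]
    if not rows or not cols:
        raise ValueError("no content found")
    return (cols[0], rows[0], cols[-1], rows[-1])

def locate_field(model_box, model_field, user_box):
    """Place the field at the same relative position inside the user's box."""
    ml, mt, mr, mb = model_box
    ul, ut, ur, ub = user_box
    fx, fy = model_field
    sx = (ur - ul) / (mr - ml)  # horizontal scale between the two boxes
    sy = (ub - ut) / (mb - mt)  # vertical scale between the two boxes
    return (ul + (fx - ml) * sx, ut + (fy - mt) * sy)

# Tiny synthetic page: a solid block of "form" plus one stray speck at (0, 0).
page = [[0] * 12 for _ in range(10)]
for r in range(2, 8):
    for c in range(3, 10):
        page[r][c] = 1
page[0][0] = 1  # noise outside the form

print(content_box(page))  # → (3, 2, 9, 7)
print(locate_field((0, 0, 100, 200), (50, 100), (10, 20, 60, 120)))  # → (35.0, 70.0)
```

A real fax will have noise that a fixed threshold cannot handle (header lines, dark scan borders), so treat this as a starting point rather than a robust solution.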

Vebjorn Ljosa
I think you understand exactly correctly. This is great information.
bmb
@Vebjorn, can you explain what you are referring to as coordinate descent? Thanks
Raj