I want my application to fill in a single field in a form that exists as a black-and-white image file. The form always starts as the same paper version, but by the time my application gets it from my users, it may have been scanned or faxed more than once. Because of that, the field I need is not in the same place in every file.
My users do not always get the blank form from me, so I do not have the ability to print a mark or placeholder that I can recognize later.
There is text on the original blank form, but because it may have been faxed, I have only 200 dpi of resolution. The text is always big enough for a human to read, but I'm skeptical about OCR.
I have some budget so I do not need a free solution ... let's just say $2000.
That said, I am considering the following options:
1. Get an OCR solution to find the text label on the field I need. I do not think I have the resources or expertise to roll my own. I do not need perfect recognition, since I already know what the text says. But I do need to know the X- and Y-coordinates of the label. Is there software that does this? Or is the programming easier than I think?
2. Build or buy software to recognize the edges of the form. From there, I could get the relative position of the field I need. I'm thinking of the dashed line my scanner software puts around the image of a small document. Is that a known algorithm, or is there an available solution?
3. Some other way to recognize the field I need. Attempts to Google "form filling software" give me hundreds of matches for web forms, PDF forms, etc. that do not do what I need.
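To make the edge-recognition idea concrete, here is roughly what I have in mind. It is only a sketch under my own assumptions, not a known algorithm: the page is already loaded as rows of 0/1 pixels (1 = black), the form is not rotated, and the field sits at a fixed fraction of the inked area. The names content_bounds, locate_field and min_ink are made up for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed representation: 1 = black pixel, 0 = white


def content_bounds(img: Image, min_ink: int = 1) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the rows and columns holding ink.

    min_ink is a hypothetical noise threshold: a row or column only counts
    if it contains at least that many black pixels.
    """
    rows = [y for y, row in enumerate(img) if sum(row) >= min_ink]
    cols = [x for x in range(len(img[0]))
            if sum(row[x] for row in img) >= min_ink]
    if not rows or not cols:
        raise ValueError("no ink found on the page")
    return cols[0], rows[0], cols[-1], rows[-1]


def locate_field(img: Image, rel_x: float, rel_y: float,
                 min_ink: int = 1) -> Tuple[int, int]:
    """Map a position given as a fraction of the inked area to pixel (x, y)."""
    left, top, right, bottom = content_bounds(img, min_ink)
    x = left + round(rel_x * (right - left))
    y = top + round(rel_y * (bottom - top))
    return x, y
```

Because the field position is stored as a fraction, a shifted or uniformly rescaled copy of the form should still map to the right spot. Skew from a crooked scan, or fax header lines that add ink outside the form, would break this, which is part of why I am asking whether something more robust already exists.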
I'm not picky about language. My application runs on Linux, but if the best solution is Microsoft, I can probably make that work.
I'd appreciate your thoughts.