OCR: How to compare images, sort unmatching out and do this fast?

In the OCR world it's pretty rare that you run into a "perfect match" between a target resource and the original resource you compare it against.

Actually it's a huge field of science, but here's a nice thesis on the subject which should give you some basic knowledge: http://www.discover.uottawa.ca/~qchen/my_papers/master_thesis.pdf

Note that algorithms like these are very math-heavy and in no way optimized for a standard x86 CPU.

If you are looking for a perfect match (I mean really perfect, byte for byte) and you want something quick and easy to implement, I'd suggest a "skip the obvious mismatches fast" kind of algorithm - something like:

1) Compare the sizes of the arrays; if they differ, it's not what you're looking for

2) Compare a hash value of each bitmap; if the hashes differ, it's not what you're looking for

3) Compare each bit / byte one by one, and as soon as you see a difference, it's not what you're looking for

4) Win, you found a match :)
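
The steps above could be sketched roughly like this in Python. This is only a minimal sketch under assumptions that are not in the question: the bitmaps are already available as raw `bytes`, SHA-1 from the standard `hashlib` module is an arbitrary choice of hash, and the function name `is_exact_match` is made up for illustration.

```python
import hashlib
from typing import Optional


def is_exact_match(candidate: bytes, reference: bytes,
                   reference_digest: Optional[bytes] = None) -> bool:
    """Return True only if candidate and reference are byte-for-byte equal.

    Hypothetical helper: both arguments are assumed to be raw pixel data.
    """
    # Step 1: different sizes can never be a perfect match.
    if len(candidate) != len(reference):
        return False

    # Step 2: compare hash values. The digest of the reference can be
    # computed once and passed in when it is compared many times.
    if reference_digest is None:
        reference_digest = hashlib.sha1(reference).digest()
    if hashlib.sha1(candidate).digest() != reference_digest:
        return False

    # Step 3: full byte-by-byte comparison; bytes equality returns False
    # as soon as the contents differ.
    return candidate == reference


print(is_exact_match(b"\x00\x01\x02", b"\x00\x01\x02"))
print(is_exact_match(b"\x00\x01\x02", b"\x00\x01\x03"))
```

For a single pair of bitmaps the hash step buys nothing, because hashing already reads every byte. It probably only pays off when the reference hashes are computed once up front and reused.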

Depending on what you're trying to achieve this can be very slow, but it's easy to implement and it will work, so it's fine for a prototype-like application. As I said, OCR (like all other forms of digital signal processing) is a huge field of research, so it's not something you can expect people to teach you in a quick forum post, sadly :(

Good luck

[EDIT] Looking at the comment on your original question, I'd say a hashtable / dictionary data structure would be the fastest for you. That, or a binary search tree. Both are very reliant on your hash-key generator :)
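
To make the dictionary idea a bit more concrete, here is a small Python sketch. It assumes the reference bitmaps are raw `bytes` stored under names of your choosing; `build_index` and `lookup` are hypothetical helper names, and SHA-1 is again just one possible key generator.

```python
import hashlib
from collections import defaultdict


def build_index(references):
    """Map hash digest -> list of (name, data) for all reference bitmaps.

    `references` is assumed to be a dict of name -> raw bitmap bytes.
    """
    index = defaultdict(list)
    for name, data in references.items():
        index[hashlib.sha1(data).digest()].append((name, data))
    return index


def lookup(index, candidate):
    """Return the name of the reference equal to candidate, or None."""
    digest = hashlib.sha1(candidate).digest()
    # Only references with the same digest are compared byte by byte,
    # so obvious mismatches are skipped without touching their data.
    for name, data in index.get(digest, []):
        if data == candidate:
            return name
    return None


index = build_index({"letter_a": b"\x00\x01", "letter_b": b"\x01\x01"})
print(lookup(index, b"\x01\x01"))
print(lookup(index, b"\x02\x02"))
```

The index is built once, and each target then costs one hash plus, normally, a single byte comparison instead of a comparison against every reference.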

[EDIT2 (xD)] "It's aliased text generated by a computer. The Background is different, but the text always has the same color." Pretty important information there :P Is the size of the text / bitmaps always the same as well? I'd suggest that you either implement your own hashing algorithm that discards the background colors, so that the hash value depends only on the color of the text (and its shape too, ofc), or simply rewrite all background pixels in your targets to be the same color as in your original (or just set the original's background to that of your targets?). It depends again on which data you are fighting with here - need more information :)
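
A background-independent hash could look something like the sketch below. It rests on assumptions I can't verify from the question: the image is available as a list of rows of `(r, g, b)` tuples, the exact text color is known, and the text really is aliased, so every text pixel has exactly that color. The function name `text_mask_digest` is made up for illustration.

```python
import hashlib


def text_mask_digest(pixels, text_color):
    """Hash only the shape of the text, ignoring background colors.

    `pixels` is assumed to be a list of rows, each row a list of
    (r, g, b) tuples; `text_color` is the known color of the text.
    """
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0

    # Reduce every pixel to 1 (text) or 0 (anything else), so that
    # different backgrounds collapse to the same mask.
    mask = bytearray()
    for row in pixels:
        for pixel in row:
            mask.append(1 if pixel == text_color else 0)

    # Include the dimensions so that masks with the same pixel sequence
    # but a different width/height do not hash to the same value.
    header = f"{width}x{height}:".encode("ascii")
    return hashlib.sha1(header + bytes(mask)).digest()


BLACK, WHITE, RED = (0, 0, 0), (255, 255, 255), (255, 0, 0)
on_white = [[WHITE, BLACK], [BLACK, WHITE]]
on_red = [[RED, BLACK], [BLACK, RED]]
print(text_mask_digest(on_white, BLACK) == text_mask_digest(on_red, BLACK))
```

The resulting digest can be used as the key of a dictionary of references. If the text were anti-aliased, the exact color test would no longer be enough and some tolerance would be needed.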
