Hi, I managed to have each character stored in a bitmap and am looking for a way to quickly determine which character it is.

So I plan to store every possible character as an array of 1s and 0s, and compare those against an array built from the bitmap I just grabbed.

I could do simple checks like comparing the number of black pixels, comparing the dimensions and so on, but I'm guessing all of these checks are slow.

What I'm looking for is a method that goes through every pixel (from bottom to top, or in random order), compares the array against a set of candidate arrays, and throws out the ones that don't match until only one array remains. How can I implement that?

Thanks for your help.

Sven

+1  A: 

In the OCR world it's pretty rare to get a "perfect match" between a target resource and the original resource you compare it against.

Actually it's a huge field of science, but here's a nice thesis on the subject which should give you some basic knowledge: http://www.discover.uottawa.ca/~qchen/my_papers/master_thesis.pdf

Note that algorithms like these are very math-heavy and in no way optimized for a standard x86 CPU.

If you are looking for a perfect match (I mean, really perfect, down to byte-to-byte) and you want to implement this fast and easy, I'd suggest doing a "skip the obvious mismatches fast"-kinda algorithm - something like:

1) Compare the sizes of the arrays; if they differ, it's not what you're looking for

2) Compare a hash value of each bitmap; if they differ, it's not what you're looking for

3) Compare each bit / byte one by one, and as soon as you see a difference, it's not what you're looking for

4) Win, you found a match :)
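A minimal Python sketch of the four steps above, assuming each bitmap is a list of rows of 0/1 integers. The names `bitmap_hash` and `is_match` are made up for illustration, and MD5 is just one convenient choice of hash:

```python
import hashlib


def bitmap_hash(pixels):
    """Hash a bitmap given as a list of rows of 0/1 integers."""
    flat = bytes(p for row in pixels for p in row)
    return hashlib.md5(flat).hexdigest()


def is_match(candidate, reference):
    """Return True only if both bitmaps are identical, rejecting early."""
    # 1) Different dimensions: cannot be the same character.
    if len(candidate) != len(reference):
        return False
    if any(len(a) != len(b) for a, b in zip(candidate, reference)):
        return False
    # 2) Different hash values: cannot be the same character.
    if bitmap_hash(candidate) != bitmap_hash(reference):
        return False
    # 3) Compare pixel by pixel and stop at the first difference.
    for row_a, row_b in zip(candidate, reference):
        for a, b in zip(row_a, row_b):
            if a != b:
                return False
    # 4) Everything matched.
    return True
```

In a real program you would probably compute the hashes of the reference bitmaps once up front instead of on every comparison.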

Depending on what you're trying to achieve, this can be very slow, but it's easy to implement and it will work, so it's a good fit for a prototype-like application. As I said, OCR (like all other forms of digital signal processing) is a huge field of research, so it's not something you can expect people to teach you in a quick forum post, sadly :(

Good luck

[EDIT] Looking at the comment on your original question, I'd say a hashtable / dictionary data structure would be the fastest for you. That, or a binary search tree. Both are very reliant on your hash-key generator :)
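As a rough sketch of the dictionary idea in Python, assuming the bitmaps are lists of rows of 0/1 integers and that a grabbed character matches its template exactly (`bitmap_key`, `build_lookup` and `recognize` are illustrative names, not from any library):

```python
def bitmap_key(pixels):
    """Build a hashable key from a bitmap (list of rows of 0/1 integers)."""
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    # Width and height are part of the key so that bitmaps with the same
    # bits but a different shape do not collide.
    return (width, height, bytes(p for row in pixels for p in row))


def build_lookup(templates):
    """templates maps each character to its reference bitmap."""
    return {bitmap_key(bmp): ch for ch, bmp in templates.items()}


def recognize(lookup, pixels):
    """Return the matching character, or None if the bitmap is unknown."""
    return lookup.get(bitmap_key(pixels))
```

The table is built once from all known characters; after that, each grabbed bitmap costs one key computation and one dictionary lookup instead of a comparison against every template.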

[EDIT2 (xD)] "It's aliased text generated by a computer. The Background is different, but the text always has the same color." Pretty important information there :P Is the size of the text / bitmaps always the same as well? I'd suggest one of two things. Either implement your own hashing algorithm that discards the background colors, so that the hash value depends only on the text color (and on the shape of the text too, of course), or simply rewrite all background pixels in your targets to the same color as in your originals (or set the originals' background to that of your targets instead). Again, it depends on what data you are fighting with here, so I need more information :)
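One way to sketch the "ignore the background" idea in Python, assuming each pixel is an (R, G, B) tuple and the text is drawn in exactly one known color (`binarize` and `shape_key` are hypothetical helper names):

```python
def binarize(image, text_color):
    """Map text-colored pixels to 1 and every other pixel to 0."""
    return [[1 if px == text_color else 0 for px in row] for row in image]


def shape_key(image, text_color):
    """Key that depends only on where the text pixels are, not on the background."""
    bits = binarize(image, text_color)
    height = len(bits)
    width = len(bits[0]) if bits else 0
    return (width, height, bytes(b for row in bits for b in row))
```

Two grabs of the same character over different backgrounds then produce the same key. If the edges of the text are blended with the background, an exact color test like this would miss those pixels and you would need a tolerance instead.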

cwap
Alright, the image is basically a table with two slightly different backgrounds. I could either raise the contrast or rewrite the background pixels to white. A custom hashing algorithm looks like the fastest alternative, so I'll look into that.
BeatMe