I'm currently at the point where I can convert a Bitmap into a byte array. Suppose I have 26 images representing a-z, with 26 corresponding byte arrays. Given an image, I would like to use its byte array to look up the correct letter directly rather than performing up to 26 comparisons. Is there some way of hashing the byte arrays to produce a hash code that can be stored in a configuration file?

Alternatively, if there is a better (faster) approach than hashing the images (assuming I have no access to the underlying textual representation), I would very much like to know about it. For clarification, suppose I have "a.bmp", "b.bmp", etc., and an unknown image on the screen. I would have thought that hashing the image and performing a single lookup would be the fastest way to get a positive identification, faster than performing up to 26 individual comparisons. If this assumption is incorrect, I would appreciate an outline of the optimal method.

Note: It's not a classic OCR problem (handwriting recognition, etc.) because the letters will be rendered identically every time. Therefore the letter "a" will always produce exactly the same hash code.

+1  A: 

A better question to ask is: why are you approaching this problem this way? Under what circumstances would you receive a byte array and need to match it to a character in this fashion? This isn't a good approach for image or character recognition, and just about any other problem would provide you with metadata describing the image, which would be a more useful and efficient key than the image data itself.

David Lively
Looks like homework to me, or else some quick-and-dirty OCR...
egrunin
Agree about homework, though this would be quick, dirty, and useless OCR, as a single-pixel difference defeats the method.
David Lively
+1  A: 

Find a small number of bytes that, when considered together, are unique for each image. If you can find four or fewer bytes that uniquely identify an image, you can extract those bytes and convert them directly to an Int32 using simple bit-shifting operations. This integer is then a fingerprint for the image that you can store.
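
As a rough sketch of the bit-shifting idea, assuming you have already found four byte offsets that differ across all 26 images (the helper name `FromOffsets` and the offsets used here are made up for illustration):

```csharp
using System;

static class Program
{
    // Hypothetical helper: packs the bytes at four chosen offsets into one Int32.
    // The offsets are an assumption; they must be picked so that the result
    // is different for every letter image.
    static int FromOffsets(byte[] image, int o0, int o1, int o2, int o3)
    {
        return (image[o0] << 24) | (image[o1] << 16) | (image[o2] << 8) | image[o3];
    }

    static void Main()
    {
        // Stand-in for real bitmap bytes.
        byte[] image = { 0x12, 0x00, 0x34, 0x56, 0x78 };

        int fingerprint = FromOffsets(image, 0, 2, 3, 4);
        Console.WriteLine(fingerprint.ToString("X8")); // prints 12345678
    }
}
```

The fingerprint reads only four bytes per image, so it is cheap, but it is only valid for the exact set of images the offsets were chosen for.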

Alternatively, if you want something a little slower to execute but much easier to code, just hash the byte array using a standard hash function (SHA-1 for example) and use the hash value as the fingerprint.
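
A minimal sketch of the hashing route, assuming the .NET `SHA1` class and a hex string as the stored fingerprint (the byte arrays below are stand-ins for the contents of a.bmp, b.bmp, and so on):

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

static class Program
{
    // Returns the SHA-1 digest of the image bytes as a hex string,
    // which is easy to store in a configuration file.
    static string Fingerprint(byte[] imageBytes)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] digest = sha1.ComputeHash(imageBytes);
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    static void Main()
    {
        // Stand-in data; real code would load the bytes of each reference bitmap.
        var references = new Dictionary<char, byte[]>
        {
            { 'a', new byte[] { 1, 2, 3 } },
            { 'b', new byte[] { 1, 2, 4 } },
        };

        // Build the fingerprint-to-letter table once.
        var table = new Dictionary<string, char>();
        foreach (var pair in references)
            table[Fingerprint(pair.Value)] = pair.Key;

        // Identify an unknown image with one hash and one lookup.
        byte[] unknown = { 1, 2, 4 };
        char letter;
        if (table.TryGetValue(Fingerprint(unknown), out letter))
            Console.WriteLine(letter); // prints b
    }
}
```

The table only needs to be built once; its keys could equally be read back from a configuration file instead of being recomputed at startup.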

Mark Byers
+2  A: 

You can find a C# algorithm to hash an array of bytes here. You can then use a C# hash table datatype to map the hash to the character. However, you would still need to scan every byte of every reference bitmap to build the table, which is O(B * N), where B is the number of bytes in a bitmap and N is the number of characters, and then every byte of the unknown image, O(B), for each lookup. Not particularly efficient given the size of typical bitmaps.

However, if this is OCR (optical character recognition) this hash function will be absolutely useless. The value of the hash changes greatly even if one pixel is different, so typical optical noise from scanners or digital cameras would prevent two pictures of the same character from hashing identically. There are programmatic OCR techniques out there, but that is an extremely deep topic and you're much better off using a pre-built library if this is an OCR problem.

David Gladfelter
For that matter, every C# object provides a .GetHashCode() method.
David Lively
True, but the implementation hashes based on the object identity. Two identical byte arrays at different memory addresses would return two different hash codes. I'm guessing this isn't the behavior the question-poser desires.
David Gladfelter
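
To illustrate the point about identity-based hashing, here is a small sketch. `ByteArrayComparer` is a hypothetical class written for this example, not part of the framework; it makes a `Dictionary` compare byte arrays by content instead of by reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two byte arrays are equal when their contents match.
sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(byte[] data)
    {
        // Simple content-based hash; overflow is intentional.
        unchecked
        {
            int hash = 17;
            foreach (byte b in data)
                hash = hash * 31 + b;
            return hash;
        }
    }
}

static class Program
{
    static void Main()
    {
        var byReference = new Dictionary<byte[], char>();
        var byContent = new Dictionary<byte[], char>(new ByteArrayComparer());

        byte[] stored = { 1, 2, 3 };
        byReference[stored] = 'a';
        byContent[stored] = 'a';

        // A different array instance with identical contents.
        byte[] sameContents = { 1, 2, 3 };

        Console.WriteLine(byReference.ContainsKey(sameContents)); // prints False
        Console.WriteLine(byContent.ContainsKey(sameContents));   // prints True
    }
}
```

The content-based hash above still reads every byte of the array, so it does not avoid the scanning cost discussed earlier; it only makes the lookup behave as the question expects.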