views:

62

answers:

3

Hi

We have the Trie structure to efficiently access data when the key to that data set is a string. What would be the best possible index if key to a data set is an image?

By key, I mean some thing which uniquely distinguishes data. Is this a less frequently used scenario i.e. accessing data by an image? I do feel there are applications where it is used like a finger print database.

Does hashing help in this case? I mean hash the image into a unique number, depending on pixel values.

Please share any pointers on this.

cheers

+1  A: 

I'm not 100% sure what you're trying to do, but hashing should give you a unique string to identify an image with. You didn't specify your language, but most have a function to hash an entire file's data, so you could just run the image file through that. (For example, PHP has md5_file())

Chad Birch
+2  A: 

You could use a hash function to find a item based on an image. But I see little practical use for this scenario.

Application such as finger print recognition, face recognition, or object identification perform a feature extraction process. This means they convert the complex image structure into simpler feature vectors that can be compared against stored patterns.

The real hard work is the feature extraction process that must seperate the important information from the 'noise' in the image.

Just hashing the image will will yield no usable features. The only situation I would think about hashing a image to find some information is to build a image database. But even in this case a common hash function as SHA1 or MD5 will be of little use, because modifying a single pixel or metadata such as the author will change the hash and make it impossible to identify the two images based on a common hash function.

Daniel Brückner
+1  A: 

It's unclear what problem you're trying to solve. You can definitely obtain a hash for an entire image and use that as a key in a Trie structure, although I think in this case the Trie structure would give you almost no performance benefit over a regular hash table, because you are performing a (large) hash every time you do a lookup.

If you are implementing something where you want to compare two images or find similar images in the tree quickly, you might consider using the GIF or JPEG header of the image as the beginning of the key. This would cause images with similar type, size, index colors, etc. to be grouped near each other within the Trie structure. You could then compute a hash for the image only if there was a collision (that is, multiple images in the Trie with the exact same header).

Elliot Nelson