tags:
views: 106
answers: 2

Here is the main problem: I have a very large database (25,000 entries or so) of 48-dimensional vectors, each populated with values ranging from 0-255. The specifics are not so important, but I figure they might help give context.

I don't need an exact nearest neighbor, so approximate nearest-neighbor searches that are within a degree of accuracy are acceptable. I've been toying around with Locality-Sensitive Hashing, but I'm very, very lost.

I've written a hash function, as described in the article under "Stable Distributions", as best I can. Here is the code:

from random import normalvariate, uniform

def lsh(vector, mean, stdev, r=1.0, a=None, b=None):
    # Draw the random projection vector and offset if not supplied.
    if a is None:
        a = [normalvariate(mean, stdev) for i in range(48)]
    if b is None:
        b = uniform(0, r)
    hashVal = (sum(a[i] * vector[i] for i in range(48)) + b) / r
    return hashVal

The hashing function is 'working', at least somewhat. If I order a list of points by hash value and compute the average distance between a point and its neighbor in the list, the average distance is about 400, compared to an average distance of about 530 for any two randomly selected points.
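The evaluation described above can be sketched as follows, on synthetic uniform data (so the resulting averages will differ from the 400/530 figures); the dataset size, seed, and function names here are illustrative:

```python
import math
import random

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

random.seed(0)
points = [[random.randint(0, 255) for _ in range(48)] for _ in range(500)]

# One stable-distribution projection, shared by all points.
a = [random.normalvariate(0, 1) for _ in range(48)]
b = random.uniform(0, 1.0)
hashed = sorted(points, key=lambda v: sum(ai * vi for ai, vi in zip(a, v)) + b)

# Average distance between neighbours in the sorted list vs. random pairs.
adjacent = [euclidean(hashed[i], hashed[i + 1]) for i in range(len(hashed) - 1)]
rand_pairs = [euclidean(random.choice(points), random.choice(points))
              for _ in range(len(points))]

print(sum(adjacent) / len(adjacent), sum(rand_pairs) / len(rand_pairs))
```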

My biggest questions are these.

A: Any suggestions on where I can read more about this? My searching hasn't produced many results.

B: The method suggests it should output an integer value (mine does not). You're then supposed to find matches for this integer value, and a match denotes a likely nearest neighbor. I understand I'm supposed to compute a set of tables of hash values for all my points and then check those tables for hash matches, but the values I'm returning don't seem fine-grained enough to produce matches at all. More testing is needed on my part.

C: Any instructions on how to construct hash functions based on the other hashing methods?

+1  A: 

Maybe this is a little off-topic, but you can try using PCA (http://en.wikipedia.org/wiki/Principal_component_analysis) to reduce the dimensionality of the dataset. There should be plenty of PCA modules designed for NumPy (for example: http://folk.uio.no/henninri/pca_module/). The method is rather simple, and with ready-to-use modules it will be a snap.

Basically, it reduces the number of dimensions (you should be able to specify the desired number) by maximizing the variance retained within the given number of dimensions.
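The idea can also be sketched directly with NumPy, without a dedicated PCA module: centre the data, take the SVD, and project onto the leading components. The array shapes and the target dimensionality (k = 10) here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(25000, 48)).astype(float)

k = 10                               # desired number of dimensions
centred = data - data.mean(axis=0)   # PCA requires centred data
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
reduced = centred @ Vt[:k].T         # project onto the top-k components

print(reduced.shape)  # (25000, 10)
```

The rows of `Vt` are the principal directions, ordered by decreasing variance, so keeping the first k rows keeps the most informative k dimensions.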

Piotr Duda
I ended up using the MTP toolkit for Python to perform PCA on my data. Very, very efficient, and it does exactly what I've been trying to do.
Aaron Merriam
MTP ? do you mean MDP, http://mdp-toolkit.sourceforge.net ?
Denis
+2  A: 

Here are two answers:

B: The Wikipedia page indicates that math.floor() should be used on hashVal: this is how you obtain integers.
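A minimal sketch of that fix, mirroring the function from the question (the function name, default distribution parameters, and r value here are illustrative):

```python
import math
from random import normalvariate, uniform

def lsh_bucket(vector, r=1.0, a=None, b=None):
    # Draw the random projection vector and offset if not supplied.
    if a is None:
        a = [normalvariate(0, 1) for _ in range(len(vector))]
    if b is None:
        b = uniform(0, r)
    projection = sum(ai * vi for ai, vi in zip(a, vector)) + b
    # floor() turns the real-valued projection into an integer bucket index,
    # so two points can produce exactly equal hash values.
    return math.floor(projection / r)
```

Increasing r widens each bucket, making matches more likely at the cost of coarser neighborhoods.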

C: If you want to use the Hamming method, you can implement it quite simply: each Hamming hash function is defined by a coordinate (between 0 and 47) and a bit number (between 0 and 7). You can get the value of an integer i at a given bit b with:

bool(i & 2**b)
EOL
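The Hamming construction above can be sketched as a small hash family: each function picks one coordinate and one bit, and a tuple of such bits serves as a bucket key. The names and the choice of 16 bits per key here are illustrative:

```python
from random import randrange, seed

def make_hamming_hash(coord, bit):
    # One Hamming hash function: returns bit `bit` of coordinate `coord`.
    return lambda vector: bool(vector[coord] & (1 << bit))

seed(0)
# Concatenating several bits gives one bucket key for the LSH table.
hashes = [make_hamming_hash(randrange(48), randrange(8)) for _ in range(16)]

def bucket_key(vector):
    return tuple(h(vector) for h in hashes)
```

Points whose bucket keys collide across one or more such tables are candidate near neighbours under Hamming distance.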