views:

51

answers:

2

Thanks to those who've answered my previous questions and gotten me this far.

I have a table of about 25,000 vectors, each with 48 dimensions, with values ranging from 0-255.

I am attempting to develop a Locality Sensitive Hash (http://en.wikipedia.org/wiki/Locality-sensitive_hashing) algorithm for finding a near neighbor or closest neighbor points.

My current LSH function is this:

def lsh(vector, r = 1.0, a = None, b = None):
    if not a:
        a = [normalvariate(10, 4) for i in range(48)]
    if not b:
        b = uniform(0, r)
    hashVal = floor((sum([a[i]*vector[i] for i in range(48)]) + b)/r)
    return int(hashVal)

My questions at this point are:

A: Are there optimal values for "normalvariate(10, 4)" portion of my code. This is pythons built in random.normalvariate (http://docs.python.org/library/random.html#random.normalvariate) function which I am using to produce the "d dimensional vector with entries chosen independently from a stable distribution". From my experimenting, the values don't seem to matter too much.

B: At the top of the wikipedia article it states:

if d(p,q) <= R, then h(p) = h(q) with probability at least P1

if d(p,q) >= cR, then h(p) = h(q) with probability at most P2

Is the R value mentioned here also the R value mentioned under the Stable Distributions section. (http://en.wikipedia.org/wiki/Locality-sensitive_hashing#Stable_distributions)

C: Related to my previous question(B). I've discovered that using higher values of R in my hasing function map my vectors into a smaller range of hash values. Is there a way to optimize my R value.

D: Approximately how many tables might one use?

A: 

To those interested. I've found this document (http://web.mit.edu/andoni/www/papers/cSquared.pdf which has a very detailed, albeit complicated explanation of how to use LSH for high dimensional spaces.

Aaron Merriam
+2  A: 

You might want to check out "MetaOptimize" -- like Stack Overflow for machine learning.
http://metaoptimize.com/qa

Your question isn't really a python or programing question, and that community might be able to help a bit more.

Jon