ansaurus

Question

How do I find the k-nearest values in n-dimensional space?

Answer 1

A:

The hamming distance is closely related to levenshtein distance, and is similar to algorithms used for spelling correction.

A method that works is branch-and-bound search in a trie. It takes time that is exponential in the distance, for near distance, up to being linear in the dictionary size.

If the dictionary is of binary words stored in a binary trie, with strict hamming distance, here is a simple pseudo-code:

walk(trie, word, i, hit, budget){
  if (budget < 0 || i > word.length) return;
  if (trie==NULL){
    if (i==word.length) print hit;
    return;
  }
  hit[i] = 0;
  walk(trie.subtrie[0], word, i+1, hit, (word[i]==0 ? budget : budget-1));
  hit[i] = 1;
  walk(trie.subtrie[1], word, i+1, hit, (word[i]==1 ? budget : budget-1));
}

main(){
  for (int budget = 0; ; budget++){
    walk(trie, word, 0, hit, budget);
    /* quit if enough hits have been printed */
  }
}

The idea is you walk the entire trie, keeping track of the distance between the current trie node and the original word. You prune the search by having a budget of how much distance you will tolerate. This works because the distance can never decrease as you go deeper into the trie.

Then you do this repeatedly with budgets starting at zero and increasing in steps until you print out the hits you want. Since each walk covers so many fewer nodes than the subsequent walk, it doesn't hurt that you're doing multiple walks. If k is fixed, you can simply start out with that as your budget.

Mike Dunlavey 2010-03-06 13:57:25

ansaurus

tags:

views:

answers:

How do I find the k-nearest values in n-dimensional space?

related questions