views:

367

answers:

5

Hello all,

I have a situation where I need to find the value with the key closest to the one I request. It's kind of like a nearest map that defines distance between keys.

For example, if I have the keys {A, C, M, Z} in the map, a request for D would return C's value.

Any idea?

+3  A: 

Most tree data structures use some sort of sorting algorithm to store and find keys. Many implementations of such can locate a close key to the key you probe with (usually it either the closest below or the closest above). For example Java's TreeMap implements such a data structure and you can tell it to get you the closest key below your lookup key, or the closest key above your lookup key (higherKey and lowerKey).

If you can calculate distances (its not always easy - Java's interface only require you to know if any given key is "below" or "above" any other given key) then you can ask for both closest above and closest below and then calculate for yourself which one is closer.

Guss
Thanks. We had missed that TreeMap includes the methods to do what we wanted.
oconnor0
+4  A: 

What's the dimensionality of your data? If it's just one dimensional, a sorted array will do it - a binary search will locate the exact match and/or reveal betweeen which two keys your search key lies - and a simple test will tell you which is closer.

If you need to locate not just the nearest key, but an associated value, maintain an identically sorted array of values - the index of the retrieved key in the key array is then the index of the value in the value array.

Of course, there are many alternative approaches - which one to use depends on many other factors, such as memory consumption, whether you need to insert values, if you control the order of insertion, deletions, threading issues, etc...

Eamon Nerbonne
Our data is 1 dimensional in this case. I like this idea tho.We ended up using a Guss' sol'n as it comes in Java.
oconnor0
A: 

You can implement something like this as a tree. A simple approach is to assign each node in the tree a bitstring. Each level of the tree is stored as a bit. All parent information is encoded in the node's bitstring. You can then easily locate arbitrary nodes, and find parents and children. This is how Morton ordering works, for example. It has the extra advantage that you can calculate distances between nodes by simple binary subtraction.

If you have multiple links between data values, then your data structure is a graph rather than a tree. In that case, you need a slightly more sophisticated indexing system. Distributed hash tables do this sort of thing. They typically have a way of calculating the distance between any two nodes in the index space. For example, the Kademlia algorithm (used by Bittorrent) uses XOR distances applied to bitstring ids. This allows Bittorrent clients to lookup ids in a chain, converging on the unknown target location. You can use a similar approach to find the node(s) closest to your target node.

ire_and_curses
+1  A: 

BK-trees do precisely what you want. Here's a good article on implementing them.

And here is a Scala implementation:

class BKTree[T](computeDistance: (T, T) => Int, node: T) {
  val subnodes = scala.collection.mutable.HashMap.empty[Int,BKTree[T]]

  def query(what: T, distance: Int): List[T] = {
    val currentDistance = computeDistance(node, what)
    val minDistance = currentDistance - distance
    val maxDistance = currentDistance + distance
    val elegibleNodes = (
      subnodes.keys.toList 
      filter (key => minDistance to maxDistance contains key) 
      map subnodes
    )
    val partialResult = elegibleNodes flatMap (_.query(what, distance))
    if (currentDistance <= distance) node :: partialResult else partialResult
  }

  def insert(what: T): Boolean = if (node == what) false else (
    subnodes.get(computeDistance(node, what)) 
    map (_.insert(what)) 
    getOrElse {
      subnodes(computeDistance(node, what)) = new BKTree(computeDistance, what)
      true
    }
  )

  override def toString = node.toString+"("+subnodes.toString+")"
}

object Test {
  def main(args: Array[String]) {
    val root = new BKTree(distance, 'A')
    root.insert('C')
    root.insert('M')
    root.insert('Z')
    println(findClosest(root, 'D'))
  }
  def charDistance(a: Char, b: Char) = a - b abs
  def findClosest[T](root: BKTree[T], what: T): List[T] = {
    var distance = 0
    var closest = root.query(what, distance)
    while(closest.isEmpty) {
      distance += 1
      closest = root.query(what, distance)
    }
    closest
  }
}

I'll admit to a certain dirt&uglyness about it, and of being way too clever with the insertion algorithm. Also, it will only work fine for small distance, otherwise you'll search repeatedly the tree. Here's an alternate implementation that does a better job of it:

class BKTree[T](computeDistance: (T, T) => Int, node: T) {
  val subnodes = scala.collection.mutable.HashMap.empty[Int,BKTree[T]]

  def query(what: T, distance: Int): List[T] = {
    val currentDistance = computeDistance(node, what)
    val minDistance = currentDistance - distance
    val maxDistance = currentDistance + distance
    val elegibleNodes = (
      subnodes.keys.toList 
      filter (key => minDistance to maxDistance contains key) 
      map subnodes
    )
    val partialResult = elegibleNodes flatMap (_.query(what, distance))
    if (currentDistance <= distance) node :: partialResult else partialResult
  }

  private def find(what: T, bestDistance: Int): (Int,List[T]) = {
    val currentDistance = computeDistance(node, what)
    val presentSolution = if (currentDistance <= bestDistance) List(node) else Nil
    val best = currentDistance min bestDistance
    subnodes.keys.foldLeft((best, presentSolution))(
      (acc, key) => {
        val (currentBest, currentSolution) = acc
        val (possibleBest, possibleSolution) = 
          if (key <= currentDistance + currentBest)
            subnodes(key).find(what, currentBest)
          else
            (0, Nil)
        (possibleBest, possibleSolution) match {
          case (_, Nil) => acc
          case (better, solution) if better < currentBest => (better, solution)
          case (_, solution) => (currentBest, currentSolution ::: solution)
        }
      }
    )
  }

  def findClosest(what: T): List[T] = find(what, computeDistance(node, what))._2

  def insert(what: T): Boolean = if (node == what) false else (
    subnodes.get(computeDistance(node, what)) 
    map (_.insert(what)) 
    getOrElse {
      subnodes(computeDistance(node, what)) = new BKTree(computeDistance, what)
      true
    }
  )

  override def toString = node.toString+"("+subnodes.toString+")"
}

object Test {
  def main(args: Array[String]) {
    val root = new BKTree(distance, 'A')
    root.insert('C')
    root.insert('E')
    root.insert('M')
    root.insert('Z')
    println(root.findClosest('D'))
  }
  def charDistance(a: Char, b: Char) = a - b abs
}
Daniel
A: 

If your keys are strings and your similarity function is Levenshtein distance, then you can use finite-state machines:

Your map is a trie built as a finite-state machine (by unionizing all key/value pairs and determinizing). Then, compose your input query with a simple finite-state transducer that encodes the Levenshtein distance, and compose that with your trie. Then, use the Viterbi algorithm to extract the shortest path.

You can implement all this with only a few function calls using a finite-state toolkit.