views:

165

answers:

5

I have a list of values (1-dimensional) and I would like to know the best data structure / algorithm for finding the nearest to a query value I have. Most of the solutions (all?) I found for questions here are for 2 or more dimensions. Can anybody suggest to me the approach for my case?

My instinct tells me to sort the data and use binary search somehow. By the way, there is no limit on the construction or insertion time for any tree needed, so probably someone can suggest a better tree than simply a sorted list.

A: 

Sort the list and use binary search to find the element you are looking for, then compare your left and right neighbors. You can use an array which is O(1) access.

Something like:

int nearest(int[] list, int element) {

    sort(list);
    int idx = binarySearch(element, list);

    // make sure you are accessing elements that exist
    min = (element - list[idx-1] <= list[idx+1] - element) ? idx-1 : idx+1;

    return list[min];
}

This is O(n log n), which will be amortized if you are going to perform many look ups.

EDIT: For that you'd have to move the sorting out of this method

quantumSoup
First, I still don't see how the min function returns the correct item. You don't even compare with the query point. Secondly, the amortized cost does not seem to improve anything... you should not sort the list when performing queries. You should do so only when modifying the collection of points.
Eyal Schneider
@Eyal-Schneider Thanks
quantumSoup
actually, if you move the sort out the binary search should be O(log n)
Muhammad Alkarouri
+1  A: 

As you already mentioned, the fastest and easiest way should be sorting the data and then looking for the left and right neighbour of a data point.

swegi
+1  A: 

If insertion time is irrelevant, then binary search on a sorted array is the simplest way to achieve O(log N) query time. Each time an item is added sort everything. For each query, perform a binary search. If a match is found, return it. Otherwise, the binary search should return the index of the item, where it should have been inserted. Use this index to check the two neighboring items and determine which of them is closer to the query point.

I suppose that there are solutions with O(1) time. I will try to think of one that doesn't involve too much memory usage...

Eyal Schneider
That should be interesting. I can't see how you can find the nearest neighbour in time that is independent of the size of the dataset. So if you have any solution like that please add it here, though it is more academic curiosity at this stage.
Muhammad Alkarouri
@Muhammad: This is a tradeoff between time complexity and space complexity. Assuming that you have no space problems (or that the range of values is not so big), then you can simply create a huge array containing at position k the closest point to query value k. This has query time complexity O(1) and space complexity O(max-min). I am not sure how the space complexity can be improved,however...
Eyal Schneider
@Eyal: Great idea. So this looks like a lookup table implementation of the find nearest function. The trouble is any hashing I can think of for this case will transform it for something O(log n).
Muhammad Alkarouri
+5  A: 

If you need something faster than O(log(n)), which you can easily get with a sorted array or a binary search tree, you can use a van Emde Boas Tree. vEB trees give you O(log(log(n))) to search for the closest element on either side.

mathmike
Compared to a sorted array, a vEB tree is a complicated space hog. Unless the points are very dense, the effects of the memory hierarchy are likely to wipe out the theoretical difference between O(log n) and O(log log n) and then some.
That's impressive. I am accepting this answer as the best theoretical one so far for huge linear data. Though realistically I am going to use the sorted list / binary search which should suffice for my purposes.
Muhammad Alkarouri
A: 

Using OCaml's Set:

let nearest xs y =
  let xs, _, zs = Set.split x xs in
  let x, z = Set.max_elt xs, Set.min_elt ys in
  if abs(x-y) < abs(z-y) then x else z

Incidentally, you may appreciate my nth-nearest neighbor sample from OCaml for Scientists and The F#.NET Journal article Traversing networks: nth-nearest neighbors.

Jon Harrop