ansaurus

Question

Best data structure for nearest neighbour in 1 dimension

Answer 1

A:

Sort the list and use binary search to find the element you are looking for, then compare your left and right neighbors. You can use an array which is O(1) access.

Something like:

int nearest(int[] list, int element) {

    sort(list);
    int idx = binarySearch(element, list);

    // make sure you are accessing elements that exist
    min = (element - list[idx-1] <= list[idx+1] - element) ? idx-1 : idx+1;

    return list[min];
}

This is O(n log n), which will be amortized if you are going to perform many look ups.

EDIT: For that you'd have to move the sorting out of this method

quantumSoup 2010-07-18 18:11:03

First, I still don't see how the min function returns the correct item. You don't even compare with the query point. Secondly, the amortized cost does not seem to improve anything... you should not sort the list when performing queries. You should do so only when modifying the collection of points.

Eyal Schneider 2010-07-18 18:26:58

@Eyal-Schneider Thanks

quantumSoup 2010-07-18 18:43:12

actually, if you move the sort out the binary search should be O(log n)

Muhammad Alkarouri 2010-07-19 22:00:45

Answer 2

+1 A:

As you already mentioned, the fastest and easiest way should be sorting the data and then looking for the left and right neighbour of a data point.

swegi 2010-07-18 18:11:52

Answer 3

+1 A:

If insertion time is irrelevant, then binary search on a sorted array is the simplest way to achieve O(log N) query time. Each time an item is added sort everything. For each query, perform a binary search. If a match is found, return it. Otherwise, the binary search should return the index of the item, where it should have been inserted. Use this index to check the two neighboring items and determine which of them is closer to the query point.

I suppose that there are solutions with O(1) time. I will try to think of one that doesn't involve too much memory usage...

Eyal Schneider 2010-07-18 18:17:18

That should be interesting. I can't see how you can find the nearest neighbour in time that is independent of the size of the dataset. So if you have any solution like that please add it here, though it is more academic curiosity at this stage.

Muhammad Alkarouri 2010-07-19 22:05:24

@Muhammad: This is a tradeoff between time complexity and space complexity. Assuming that you have no space problems (or that the range of values is not so big), then you can simply create a huge array containing at position k the closest point to query value k. This has query time complexity O(1) and space complexity O(max-min). I am not sure how the space complexity can be improved,however...

Eyal Schneider 2010-07-19 22:14:45

@Eyal: Great idea. So this looks like a lookup table implementation of the find nearest function. The trouble is any hashing I can think of for this case will transform it for something O(log n).

Muhammad Alkarouri 2010-07-19 23:01:36

Answer 4

+5 A:

If you need something faster than O(log(n)), which you can easily get with a sorted array or a binary search tree, you can use a van Emde Boas Tree. vEB trees give you O(log(log(n))) to search for the closest element on either side.

mathmike 2010-07-18 18:35:54

Compared to a sorted array, a vEB tree is a complicated space hog. Unless the points are very dense, the effects of the memory hierarchy are likely to wipe out the theoretical difference between O(log n) and O(log log n) and then some.

2010-07-18 20:06:42

That's impressive. I am accepting this answer as the best theoretical one so far for huge linear data. Though realistically I am going to use the sorted list / binary search which should suffice for my purposes.

Muhammad Alkarouri 2010-07-19 22:09:41

Answer 5

A:

Using OCaml's Set:

let nearest xs y =
  let xs, _, zs = Set.split x xs in
  let x, z = Set.max_elt xs, Set.min_elt ys in
  if abs(x-y) < abs(z-y) then x else z

Incidentally, you may appreciate my nth-nearest neighbor sample from OCaml for Scientists and The F#.NET Journal article Traversing networks: nth-nearest neighbors.

Jon Harrop 2010-08-06 11:49:47

ansaurus

tags:

views:

answers:

Best data structure for nearest neighbour in 1 dimension

related questions