views:

1349

answers:

9

I have a set of positive numbers. Given a number not in the set, I want to find the next smallest and next largest numbers that are in the set. The only way I can think to do it now is to find the next smallest by decreasing by 1 until I find a number in the set, and then do the same for finding the next largest.

Motivation: I have a bunch of data in a hashmap, keyed by dates. I don't have a datapoint for every single date. If I have data for, say, 10/01/2000 as 60 and 10/05/2000 as 68, and I ask for 10/02/2000, I want to linearly interpolate. I should get 62.

+1  A: 

Put your data items into a tree, like an AVL tree, a red-black tree, or a B+/B- tree. Then you can search the ordered values.

Charlie Martin
A: 

If you get the keys in an array, you can sort the array and find the index of the last element that is less than the desired element. Then you know the index of the key directly before your desired point, and the next element after that is the one directly after.

That should give you enough to interpolate.

(The data structure used need not be an array, anything that will sort is fine. A balanced binary tree, as suggested by others, would be ideal, especially if you plan to reuse the data later).

fd
+1  A: 

Sort the numbers, then perform binary search on each key to bisect the set. You can then find which numbers are on either side of your missing key.

sykora
+1  A: 

Convert the set to a list and sort it, then run a binary search for the number not in the set. The result will be the insertion point, i.e. the position at which the number would be present if it were there. If you call that n, then the element at index n of the sorted list is the next smallest number and the element at index n+1 of the sorted list is the next largest number.

You can also do this by keeping the set in sorted order as you construct it, then it becomes an easy matter to search for the insertion point. This approach is used by e.g. the floorEntry() and ceilingEntry() methods of Java's TreeMap.

David Zaslavsky
+1  A: 

Keep your set as a sorted list/array and perform bisection-search: e.g., in Python, a sorted list and the bisect module from the standard Python library match your needs to the hilt.

Alex Martelli
+3  A: 

All this boils down to is binary search, provided you can get your data sorted. There are two options.

  1. Sorted Container
    If you keep your numbers in a sorted container, this is pretty easy. Instead of using a HashMap, put the data in a TreeMap, then you can efficiently find the next lower or next higher element. Java even has methods to do exactly what you want:

    This is efficient because TreeMap uses a red-black tree (a kind of balanced binary search tree) internally. higherKey and lowerKey simply start at the root and traverse the tree to find where your element should go.

    I'm not sure what language you're using, but in C++ you would usestd::map, and the analogous methods are:

    • iterator lower_bound(const key_type& k)
    • iterator upper_bound(const key_type& k)
  2. Array + Sorting
    If you don't want to keep your data sorted all the time, you can always dump your data into an array (or any random access container), use sort, and then use the STL's binary search routines on the array:

    In Java the analog would be to dump things into an ArrayList, call Java's sort(), then use binarySearch().

All the search routines here are O(logn) time. The cost of keeping your data sorted is O(nlogn) with either a sorted container or with the array. With a sorted container, the cost is amortized over n insertions; with the array you pay it all at once when you call sort().

If you don't want to sort things at all, you can always use a linear search, but you will pay if you use this a lot, as it's an O(n) algorithm.

tgamblin
+4  A: 

It depends on if your set is sorted.

If your set is unsorted then finding the closest (higher and lower) is an O(n) operation and a fairly simple algorithm.

If your set is sorted then you can use a modified bisection search to find the answer in O(log n), which is obviously a lot better particularly on larger sets.

If you're doing this repeatedly it might be worth sorting the set, which incurs an O(n log n) cost that might be once off or not depending on how often the set changes. Some kind of tree sort may help improve future sorts as new items are added.

cletus
clarify "finding the closest (higher and lower) is an O(n) operation". Finding the n'th element in an unsorted array is a O(n) operation, but it took me some time to understand the algorithm. (Finding a min/max, however, is quite simple.)
Thanatos
A map can be iterated over in whichever language you're using. That iteration is an O(n) operation unless you're using some really weird data structure or language that might make it more expensive. HashMaps in Java can be iterated over in O(n).
cletus
He writes that the data is in a HashMap, i.e., an unsorted container. Depending on the size it would be quite costly to sort it (O(n log n)) compared to probing adjacent keys like he suggests (see my answer below).
Martin Geisler
A: 

Finding the n'th element in an unsorted set is O(n). (Select Algorithm) Although here you can boil it down to a simpler, less general algorithm, if you always want the smallest & next smallest elements. But in general, finding the smallest, second smallest, etc. element within an unsorted list is O(n). (You should have been taught this in your algorithms class...)

Sorting a set, and then indexing the element is O(n log n)

Finding an element in a sorted set is O(log n) (binary search)

Thanatos
He's not asking for the ith element; he's asking for an arbitrary element, so there's no advantage to using a selection algorithm for this. You'd have to do selection for i=1 to n, which would be O(n^2) worst case if you don't use a selection algorithm that does incremental sorting.
tgamblin
"He's not asking for the ith element; he's asking for an arbitrary element" ... ? Those two things mean the same thing- the i'th element is an arbitrary element. Otherwise, I would have said the 1st or the 2nd element. Replace i with whatever element you want. Now, if he want to do this over the entire set (which he did not specify in his post), then yes, a O(n*log(n)) sort would be a better route.
Thanatos
A: 

If you know that there will always be a data point for, say, each week, then keep your HashMap as it is and do what you suggest... That will be a constant time operation since you will be doing 14 hash table lookups (probing 7 days on each side of your search date), each taking O(1) primitive operations.

If you don't know how dense your data is and you can keep it in RAM, then put it into a balanced tree structure as suggested by many others. But this can be costly if you have very many dates and if you have to load the data over the network from a database.

Martin Geisler