views:

414

answers:

3

Hi Stackers,

I am looking for a structure that holds a sorted set of double values. I want to query this set to find the closest value to a specified reference value.

I have looked at the SortedList<double, double>, and it does quite well for me. However, since I do not need explicit key/value pairs. this seems to be overkill to me, and i wonder if i could do faster.

Conditions:

  • The structure is initialised only once, and does never change (no insert/deletes)
  • The amount of values is in the range of 100k.
  • The structure is queried often with new references, which must execute fast.
  • For simplicity and speed, the set's value just below of the reference may be returned, not actually the nearest value
  • I want to use LINQ for the query, if possible, for simplicity of code.
  • I want to use no 3rd party code if possible. .NET 3.5 is available.
  • Speed is more importand than memory footprint

I currently use the following code, where SortedValues is the aforementioned SortedList

    IEnumerable<double> nearest = from item in SortedValues.Keys
                                  where item <= suggestion
                                  select item;
    return nearest.ElementAt(nearest.Count() - 1);

Can I do faster?

Also I am not 100% percent sure, if this code is really safe. IEnumerable, the return type of my query is not by definition sorted anymore. However, a Unit test with a large test data base has shown that it is in practice, so this works for me. Have you hints regarding this aspect?

P.S. I know that there are many similar questions, but none actually answers my specific needs. Especially there is this one http://stackoverflow.com/questions/1363773/c-data-structure-like-dictionary-but-without-a-value, but the questioner does just want to check the existence not find anything.

+6  A: 

The way you are doing it is incredibly slow as it must search from the beginning of the list each time giving O(n) performance.

A better way is to put the elements into a List and then sort the list. You say you don't need to change the contents once initialized, so sorting once is enough.

Then you can use List<T>.BinarySearch to find elements or to find the insertion point of an element if it doesn't already exist in the list.

From the docs:

Return Value

The zero-based index of item in the sorted List<T>, if item is found; otherwise, a negative number that is the bitwise complement of the index of the next element that is larger than item or, if there is no larger element, the bitwise complement of Count.

Once you have the insertion point, you need to check the elements on either side to see which is closest.

Mark Byers
Depends on how often you want to find a Nearest-Item vs how often the list is refreshed.
Henk Holterman
See the question. The list will never have any inserts or removals.
Mark Byers
Wow, talk about abusing return values.
Eric
Mark, you're right. Missed that.
Henk Holterman
Mark, ur da man! This executes roughly 200 times faster than my provided version.
Marcel
+1  A: 

Might not be useful to you right now, but .Net 4 has a SortedSet class in the BCL.

Winston Smith
How can you use a SortedSet to find the nearest element in the case where the exact element doesn't exist?
Mark Byers
In the same manner he currently does. I wasn't addressing the performance aspect of the question, since you'd already answered. My answer was intended to point the OP to a non KVP data structure he might find useful.
Winston Smith
A: 

I think it can be more elegant as follows: In case your items are not sorted:

double nearest = values.OrderBy(x => x.Key).Last(x => x.Key <= requestedValue);

In case your items are sorted, you may omit the OrderBy call...

Amittai Shapira
This would be more elegant, though not faster as I asked for. -1 for not answering the question.
Marcel