Suppose I have a dictionary in C#. Assuming the keys are comparable, how do I find the smallest key greater than a given k (of the same type as the keys of the dictionary)? I would like to do this efficiently, with a collection like a SortedDictionary.

Clearly, if efficiency were not a concern, one could start with any dictionary, extract its keys, and then use LINQ with a suitable predicate (for example, Where followed by Min, or First if the keys are already in sorted order). But this would execute in linear time (in the number of keys), whereas with a sorted set of keys one should be able to find the key in logarithmic time.

Thanks.

A:

For an arbitrary (unsorted) dictionary, you will have to sort the keys yourself and then do a binary search on them to find the smallest key greater than k.

This will give you a time of O(n log(n)) for the sort plus O(log(n)) for the search, i.e. O(n log(n)) for the whole operation.

If the keys are already sorted, you can reduce this to O(log(n)), but with most dictionaries this isn't the case.
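A minimal sketch of the sort-then-search approach, assuming int keys in a plain Dictionary (the class and method names are made up for illustration). It copies the keys into an array, sorts it with Array.Sort, and uses Array.BinarySearch, whose negative return value is the bitwise complement of the index of the next larger element:

```csharp
using System;
using System.Collections.Generic;

public static class SortThenSearch
{
    // Hypothetical helper: copies the keys of an unsorted Dictionary into an
    // array, sorts it (O(n log n)) and then binary-searches it (O(log n)).
    // Returns true and sets 'result' if some key is strictly greater than k.
    public static bool TryGetSmallestKeyGreaterThan<TValue>(
        Dictionary<int, TValue> dict, int k, out int result)
    {
        int[] keys = new int[dict.Count];
        dict.Keys.CopyTo(keys, 0);
        Array.Sort(keys);

        int index = Array.BinarySearch(keys, k);
        // Found: the answer is the next element (dictionary keys are unique).
        // Not found: ~index is the index of the first element larger than k.
        index = index >= 0 ? index + 1 : ~index;

        if (index < keys.Length)
        {
            result = keys[index];
            return true;
        }
        result = 0;
        return false;
    }
}
```

Note that the sort dominates the cost, so this only pays off if the sorted array is kept and reused for many lookups.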

To decide between the two, compare n (a single linear scan) with n * log(n) + log(n) (sort, then binary search) for the number of keys you typically expect, and see whether a linear or a binary search comes out ahead.

Since n grows more slowly than n * log(n), for a one-off lookup on an unsorted dictionary it is better to just search the keys linearly.
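The linear search can be sketched as a single pass that remembers the smallest key seen so far that is greater than k. This is an illustrative helper (hypothetical name), using Comparer<TKey>.Default on the assumption that the keys have a natural ordering:

```csharp
using System.Collections.Generic;

public static class LinearScan
{
    // Hypothetical helper: one O(n) pass over the keys of an unsorted
    // Dictionary, tracking the smallest key that is greater than k.
    public static bool TryGetSmallestKeyGreaterThan<TKey, TValue>(
        Dictionary<TKey, TValue> dict, TKey k, out TKey result)
    {
        var comparer = Comparer<TKey>.Default;
        bool found = false;
        result = default(TKey);

        foreach (TKey key in dict.Keys)
        {
            // Candidate must exceed k and beat the best candidate so far.
            if (comparer.Compare(key, k) > 0 &&
                (!found || comparer.Compare(key, result) < 0))
            {
                result = key;
                found = true;
            }
        }
        return found;
    }
}
```

Unlike First, this does not rely on the enumeration order, which Dictionary does not guarantee.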

casperOne
Right, this is what I'm trying to get at! Suppose I start with a SortedDictionary; then (I'm hoping) it should be straightforward to find the key I describe in the original question. However, browsing through the MSDN help files, it seems I need to reinvent the wheel (as you describe above), which seems silly.
banbh
Looks like n is going to be less than n*log(n) + log(n) for any n. Why compare plotted values? If we are going to loop through the entire collection, no SortedDictionary is needed; a simple list will always do this in O(n) time.
Tarydon
@Tarydon The statement was more to show the OP how to figure out which option performs better. However, I've edited the answer to be more definitive.
casperOne
A: 

Hi banbh,

Are you sure that using the SortedDictionary would execute in linear time? Since this is a Microsoft class, I would expect it to be optimized.

I suggest you actually write some test methods to be sure.

br, Marcel

Marcel
A: 

Since SortedDictionary implements IEnumerable, why not loop through the collection and stop when you hit the first key greater than k? Unless you have a large collection and your target is near the end, this should give you reasonable performance. Just how big is your dictionary?
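A minimal sketch of that loop (the helper name is hypothetical). SortedDictionary enumerates its keys in ascending order, so the first key greater than k is the smallest such key; the worst case is still O(n):

```csharp
using System.Collections.Generic;

public static class SortedDictionaryScan
{
    // Hypothetical helper: walks the keys in sorted order and stops at the
    // first one greater than k.
    public static bool TryGetSmallestKeyGreaterThan<TKey, TValue>(
        SortedDictionary<TKey, TValue> dict, TKey k, out TKey result)
    {
        // Use the dictionary's own comparer so the test matches its ordering.
        IComparer<TKey> comparer = dict.Comparer;
        foreach (TKey key in dict.Keys)
        {
            if (comparer.Compare(key, k) > 0)
            {
                result = key;
                return true; // early exit: no need to look further
            }
        }
        result = default(TKey);
        return false;
    }
}
```

Using dict.Comparer rather than Comparer<TKey>.Default keeps the check consistent with any custom ordering the dictionary was constructed with.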

ebpower
A:

The SortedList<TKey, TValue> class implements IDictionary<TKey, TValue> and has an IndexOfKey method; I think that's what you want:

using System.Collections.Generic;

// I'm just going to pretend your keys are ints
var collection = new SortedList<int, string>();

// populate collection with whatever

int k = GetK(); // or whatever

// IndexOfKey returns -1 if k is not a key in the collection
int kIndex = collection.IndexOfKey(k);

int? smallestKeyGreaterThanK = null;
if (kIndex >= 0 && collection.Count > kIndex + 1)
    smallestKeyGreaterThanK = collection.Keys[kIndex + 1];

According to the MSDN documentation:

This method performs a binary search; therefore, this method is an O(log n) operation.

EDIT: If you can't be sure that the dictionary contains the key you're looking for (you just want the next-largest), there is still a way to leverage an existing binary search method from .NET for your purposes. You said you are looking for an "efficient" solution; the following fits that criterion if you mean in terms of your time (and in terms of lines of code). If you mean in terms of memory usage or performance, on the other hand, it might not be ideal. Anyway:

List<int> keysList = new List<int>(collection.Keys);
int kIndex = keysList.BinarySearch(k);

Now, BinarySearch will give you what you're looking for, but if the key's not there, it's a little wacky. The return value, from the MSDN documentation, is as follows:

The zero-based index of item in the sorted List<T>, if item is found; otherwise, a negative number that is the bitwise complement of the index of the next element that is larger than item or, if there is no larger element, the bitwise complement of Count.

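What this means is that you'll need another line or two to turn that into the index of the smallest key greater than k: if k was found, step past it; otherwise the bitwise complement already points at the next larger key (or at Count, if there is none):

kIndex = kIndex >= 0 ? kIndex + 1 : ~kIndex;
int? smallestKeyGreaterThanK = kIndex < keysList.Count ? keysList[kIndex] : (int?)null;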
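Putting the pieces of this edit together, a self-contained sketch might look like the following. The class and method names are made up for illustration, and the keys are assumed to be ints as above. The copy into the List<int> is O(n); only the search itself is O(log n):

```csharp
using System.Collections.Generic;

public static class SortedListSearch
{
    // Hypothetical helper: copies the (already sorted) keys of a SortedList
    // into a List<int> and uses List<T>.BinarySearch on the copy.
    public static int? SmallestKeyGreaterThan(SortedList<int, string> collection, int k)
    {
        List<int> keysList = new List<int>(collection.Keys);
        int kIndex = keysList.BinarySearch(k);

        // Found: step past k. Not found: ~kIndex is the index of the
        // next larger key, or Count if there is none.
        kIndex = kIndex >= 0 ? kIndex + 1 : ~kIndex;

        return kIndex < keysList.Count ? keysList[kIndex] : (int?)null;
    }
}
```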
Dan Tao
Thanks. Unfortunately, in my case I cannot guarantee that collection contains k as a key. In fact, given your answer, I now suspect that there is no way to avoid hand-coding a binary search (perhaps better called a bisection search in this case) on the keys.
banbh
@banbh: Probably. You *could* cheat a little and use the `BinarySearch` method provided via the `List<T>` class (see my edit); but that requires allocating memory you don't really need to allocate. Still, if you're really opposed to writing your own binary search, it will work.
Dan Tao
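If the extra List<T> allocation is a concern, the hand-coded bisection banbh mentions can run directly over SortedList<TKey, TValue>.Keys, which is an IList<TKey> with indexed access, so the whole lookup stays O(log n) with no copy. A minimal sketch (class and method names are hypothetical) that returns the smallest key strictly greater than k:

```csharp
using System.Collections.Generic;

public static class UpperBound
{
    // Hypothetical helper: "upper bound" binary search over the sorted keys
    // of a SortedList, without copying them.
    public static bool TryGetSmallestKeyGreaterThan<TKey, TValue>(
        SortedList<TKey, TValue> list, TKey k, out TKey result)
    {
        IList<TKey> keys = list.Keys;
        IComparer<TKey> comparer = list.Comparer;

        // Invariant: keys before 'lo' are <= k; keys at or after 'hi' are > k.
        int lo = 0, hi = keys.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (comparer.Compare(keys[mid], k) <= 0)
                lo = mid + 1;
            else
                hi = mid;
        }

        if (lo < keys.Count)
        {
            result = keys[lo];
            return true;
        }
        result = default(TKey);
        return false;
    }
}
```

The `<= 0` comparison is what makes this find the first key strictly greater than k; using `< 0` instead would find the first key greater than or equal to k.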
Don't forget to sort that list before the binary search, if the keys are coming from an unsorted dictionary.
Aaronaught
@Aaron: My initial suggestion was to use a `SortedList<TKey, TValue>`, in which case the keys would be sorted in advance. But you're right to point that out in general.
Dan Tao
Not necessarily any reason to use the `SortedList` if the value isn't guaranteed to be there. Especially for large N, it is usually faster to use an unsorted collection and perform a manual sort when needed. Of course, if you're going to be copying the entire dictionary anyway, then you already have an O(N) operation so the binary search is kind of redundant, just do a linear one. ;)
Aaronaught
@Aaron: Well, I'd say it depends on how likely it is for the key not to be present. You could always try `IndexOfKey` and then resort to an alternative method if it returns -1. I think it would be worth it if your keys are going to be there a large percentage of the time. Otherwise, you're right, and banbh *is* asking about something that performs better than linear time; so clearly performance is what he's after, not just quick working code that isn't efficient...
Dan Tao