views:

290

answers:

5

I'm a complete LINQ newbie, so I don't know if my LINQ is incorrect for what I need to do or if my expectations of performance are too high.

I've got a SortedList of objects, keyed by int; SortedList as opposed to SortedDictionary because I'll be populating the collection with pre-sorted data. My task is to find either the exact key or, if there is no exact key, the one with the next higher value. If the search is too high for the list (e.g. highest key is 100, but search for 105), return null.

// The structure of this class is unimportant.  Just using
// it as an illustration.
public class CX
{
    public int KEY;
    public DateTime DT;
}

static CX getItem(int i, SortedList<int, CX> list)
{
    var items =
    (from kv in list
     where kv.Key >= i
     select kv.Key);

    if (items.Any())
    {
        return list[items.Min()];
    }

    return null;
}

Given a list of 50,000 records, calling getItem 500 times takes about a second and a half. Calling it 50,000 times takes over 2 minutes. This performance seems very poor. Is my LINQ bad? Am I expecting too much? Should I be rolling my own binary search function?

+3  A: 

Using LINQ on a SortedList will not give you the benefit of the sort.

For optimal performance, you should write your own binary search.

SLaks
I had hoped not, but this is what I feared. Thanks!
Chris Simmons
There's no reason that LINQ can't be used on the list in a way that respects the sorting. You just shouldn't expect functions like `Min` to be aware of the optimizations that can be made (ie, the fact that items are guaranteed to be in order in the list, so `Min` is just the first and `Max` is just the last).
Adam Robinson
@Adam Robinson: It still seems like SLaks is right, though; won't a `Where` followed by a `FirstOrDefault` essentially result in a linear search for the first matching value? Yes, that's better than `Min` (which would have to enumerate); but it's still not as fast as it *could* be. I don't think the LINQ extensions are specially aware of always-sorted collections such as `SortedList.Keys`. Am I wrong about that? Or am I missing something?
Dan Tao
@SLaks: Curious to know: why did you strike out the line about using binary search? It seems to me this was an accurate statement; what am I missing?
Dan Tao
@Dan: No, they aren't aware of the internals of the data store (all they know is that it implements `IEnumerable<T>` or `IQueryable<T>`), but that doesn't mean that you can't code a LINQ query (that's more than just a simple value search) that takes advantage of the sorted nature of the list. You're right in that it is performing a linear search, but that's what the list would do internally (this is a `SortedList`, not a `SortedDictionary`, which would use a hash table). Furthermore, because the OP wants to find *the value or the next highest if it isn't present*, a linear search is required.
Adam Robinson
@Adam Robinson: I must be missing something. How is a linear search required? If the OP wrote a binary search to find the index of the matching or next highest key, he could get the value with `list[Keys[indexOfMatchingOrNextHighestKey]]`.
Dan Tao
While a binary search is the right answer, rolling your own isn't necessary; see my answer.
tzaman
@Dan: A binary search could be used to optimize finding the key and determining its existence. You would still need to move to the next element if the requested key could not be found, but given the OP's apparent results it seems like a LINQ-based linear search is *easier*, if not necessarily the most efficient. My first statement was simply meant to indicate that you can tailor a LINQ query (to an extent) if you know that the data store is sorted. My second statement (stating that a linear search is *required*) was a misstatement; I merely meant that a binary search *alone* couldn't be used.
Adam Robinson
@Adam Robinson: Also, a linear search is actually *not* what `SortedList` does internally; according to [MSDN](http://msdn.microsoft.com/en-us/library/ms132336.aspx), the `IndexOfKey` method does run a binary search. Only problem is that it returns -1 if the key isn't found (instead of the bitwise complement of the next index, like `List<T>.BinarySearch does). The OP basically needs `IndexOfKey` but with the behavior of `List<T>.BinarySearch`.
Dan Tao
@Adam Robinson: Fair enough. I definitely agree that a LINQ-based linear search is substantially easier to do.
Dan Tao
@Dan: I stand corrected! `SortedList` isn't a structure that I find much opportunity to use, so it's good to get an opportunity to be schooled a bit ;)
Adam Robinson
+6  A: 

First, your query is being evaluated twice (once for Any, and once for Min). Second, Min requires that it iterate over the entire list, even though the fact that it's sorted means that the first item will be the minimum. You should be able to change this:

if (items.Any())
{
    return list[items.Min()];
}

To this:

var default = 
    (from kv in list
     where kv.Key >= i
     select (int?)kv.Key).FirstOrDefault();

if(default != null) return list[default.Value];

return null;

UPDATE

Because you're selecting a value type, FirstOrDefault doesn't return a nullable object. I have altered your query to cast the selected value to an int? instead, allowing the resulting value to be checked for null. I would advocate this over using ContainsKey, as that would return true if your list contained a value for 0. For example, say you have the following values

0 2 4 6 8

If you were to pass in anything less than or equal to 8, then you would get the correct value. However, if you were to pass in 9, you would get 0 (default(int)), which is in the list but isn't a valid result.

Adam Robinson
Wow, yeah. That changed 2 minutes to 1/2 a second.
Chris Simmons
Since FirstOrDefault is going to be returning the int key here, I needed to change default.Value to just default. A couple things, though: What is the "Default" here? I can't seem to find the meaning of that in MSDN. And since I don't know that, I need to check list.ContainsKey(default) before I return list[default]; Fortunately, that doesn't add any significant time to the execution.
Chris Simmons
Would the default just be the default for int, 0?
Chris Simmons
n/m on the default question. I see this means 0 for value types.
Chris Simmons
@Chris: See my update.
Adam Robinson
@Chris: for more information on default, look up the default() keyword in c#.
JMarsch
Are you sure `KeyValuePair` can be null? `KeyValuePair` is a value type as well.
Dan Tao
@Dan: It appears that I should do this sort of thing while in Visual Studio. You're absolutely right, `KeyValuePair<TKey, TValue>` is, indeed, a value type, meaning that it can't be null. The OP's solution. I'll update as appropriate.
Adam Robinson
More concise version: `return list.FirstOrDefault(kv => kv.Key >= i).Value;` Using the predicate version of `FirstOrDefault` avoids the need for the `int?` cast, null-check, and index into the list later. :)
tzaman
@tzaman: D'oh, I'm having quite the off day today. I have no clue why I didn't think of using the query to obtain the value rather than the key. In retrospect that's a bit silly. Good solution; should be an answer instead of a comment!
Adam Robinson
@Adam: Well, it's still your answer, I just cleaned it up a bit! :) Feel free to edit it into yours.
tzaman
+1  A: 

Why not use the BinarySearch that's built into the List class?

var keys = list.Keys.ToList();
int index = keys.BinarySearch(i);
if (index < 0)
    index = ~index;
var item = index < keys.Count ? list[keys[index]] : null;
return item;

If the search target isn't in the list, BinarySearch returns the bit-wise complement of the next-higher item; we can use that to directly get you what you want by re-complementing the result if it's negative. If it becomes equal to the Count, your search key was bigger than anything in the list.

This should be much faster than doing a LINQ where, since it's already sorted... As comments have pointed out, the ToList call will force an evaluation of the whole list, so this is only beneficial if you do multiple searches without altering the underlying SortedList, and you keep the keys list around separately.

tzaman
Very interesting. I'll give this a try.
Chris Simmons
Whoa, this will definitely not be faster. `ToList` will populate an entirely new `List`. This makes it O(n) minimum, in addition to a whole added memory cost.
Dan Tao
While this is an interesting idea, transforming a `SortedList` into a `List` in this fashion is going to evaluate every element of the underlying `SortedList` individually anyway.
Adam Robinson
Oh drat, of course. That brings up the question, why isn't there a `BinarySearch` method on the `SortedList` class itself? It'd seem like a no-brainer...On the other hand, if you're doing more than one search in a row without altering the SortedList, you can keep the `Keys` list around separately, which would make this approach worthwhile.
tzaman
+3  A: 

Writing a binary search on your own can be tough.

Fortunately, Microsoft already wrote a pretty robust one: Array.BinarySearch<T>. This is, in fact, the method that SortedList<TKey, TValue>.IndexOfKey uses internally. Only problem is, it takes a T[] argument, instead of any IList<T> (like SortedList<TKey, TValue>.Keys).

You know what, though? There's this great tool called Reflector that lets you look at .NET source code...

Check it out: a generic BinarySearch extension method on IList<T>, taken straight from the reflected code of Microsoft's Array.BinarySearch<T> implementation.

public static int BinarySearch<T>(this IList<T> list, int index, int length, T value, IComparer<T> comparer) {
    if (list == null)
        throw new ArgumentNullException("list");
    else if (index < 0 || length < 0)
        throw new ArgumentOutOfRangeException((index < 0) ? "index" : "length");
    else if (list.Count - index < length)
        throw new ArgumentException();

    int lower = index;
    int upper = (index + length) - 1;

    while (lower <= upper) {
        int adjustedIndex = lower + ((upper - lower) >> 1);
        int comparison = comparer.Compare(list[adjustedIndex], value);
        if (comparison == 0)
            return adjustedIndex;
        else if (comparison < 0)
            lower = adjustedIndex + 1;
        else
            upper = adjustedIndex - 1;
    }

    return ~lower;
}

public static int BinarySearch<T>(this IList<T> list, T value, IComparer<T> comparer) {
    return list.BinarySearch(0, list.Count, value, comparer);
}

public static int BinarySearch<T>(this IList<T> list, T value) where T : IComparable<T> {
    return list.BinarySearch(value, Comparer<T>.Default);
}

This will let you call list.Keys.BinarySearch and get the negative bitwise complement of the index you want in case the desired key isn't found (the below is taken basically straight from tzaman's answer):

int index = list.Keys.BinarySearch(i);
if (index < 0)
    index = ~index;
var item = index < list.Count ? list[list.Keys[index]] : null;
return item;
Dan Tao
Nice! This extension combined with my answer would be the definitive way to solve this, I think.
tzaman
This is the one. Thank you and thanks to others who answered as well.
Chris Simmons
+2  A: 

OK, just to give this a little more visibility - here's a more concise version of Adam Robinson's answer:

return list.FirstOrDefault(kv => kv.Key >= i).Value; 

The FirstOrDefault function has an overload that accepts a predicate, which selects the first element satisfying a condition - you can use that to directly get the element you want, or null if it doesn't exist.

tzaman