I have a very simple SortedSet with a CompareTo method that sorts on the basis of two class fields. As it is used, this collection can get quite big (million+ objects) and grows and grows over time. I have been using a simple Contains method to determine if a new value already exists in the collection...

As an academic exercise, I am running some benchmarks using LINQ (which I am fairly new to) to achieve the same effect. I am certain there is something about LINQ that I am missing, because I cannot come remotely close to the same performance. Could some LINQ guru give me a pointer on what could be done to speed it up?

So... The object has a CompareTo that looks something like this:

public int CompareTo(EntityHistoryChange other)
{
    // Order by record id first; fall back to table name only on a tie.
    int recordIdComp = Recordid.CompareTo(other.Recordid);
    if (recordIdComp != 0)
        return recordIdComp;

    return Tablename.CompareTo(other.Tablename);
}

The corresponding LINQ query on a simple List:

var handledChange = from thisChange in handledChanges
                    where thisChange.Recordid == recordId 
                      && thisChange.Tablename == tableName
                    select thisChange;

I suppose the results should not surprise me...

Linq Lookup on 18772 rows: 46 ms
SortSet Lookup on 18772 rows: 3 ms

So the question is - what is the equivalent LINQ mechanism?

A: 

Many LINQ operators check for interfaces beyond IEnumerable<T> and make use of them.

E.g. Count will check for ICollection<T> and use its Count property rather than iterating through the whole collection. The only way to see these optimizations (outside of benchmarks) is to look at the IL (or use Reflector), and of course the implementation might change with a new .NET version (including service packs). E.g. in .NET 3.5 Count didn't check for the non-generic ICollection, but it does in 4.
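One way to observe this shortcut without reading the IL is to wrap a collection in a type that records whether it was ever enumerated. This is only a sketch: TrackingCollection<T> and WasEnumerated are hypothetical names invented for the illustration, not part of the framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical read-only wrapper that notes whether anyone enumerated it.
class TrackingCollection<T> : ICollection<T>
{
    private readonly List<T> _items;
    public bool WasEnumerated { get; private set; }

    public TrackingCollection(IEnumerable<T> items) { _items = new List<T>(items); }

    public int Count => _items.Count;
    public bool IsReadOnly => true;
    public void Add(T item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Remove(T item) => throw new NotSupportedException();
    public bool Contains(T item) => _items.Contains(item);
    public void CopyTo(T[] array, int arrayIndex) => _items.CopyTo(array, arrayIndex);

    public IEnumerator<T> GetEnumerator()
    {
        WasEnumerated = true;   // set as soon as an enumerator is requested
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class CountDemo
{
    static void Main()
    {
        var tracked = new TrackingCollection<int>(Enumerable.Range(0, 1000));

        // Call the LINQ operator explicitly rather than the Count property.
        int n = Enumerable.Count(tracked);

        Console.WriteLine(n);
        // Stays false if Count() took the ICollection<T> shortcut.
        Console.WriteLine(tracked.WasEnumerated);
    }
}
```

The same trick does not help with a where clause: Where has no way to use the set's ordering, so it has to walk every element.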

Richard
+1  A: 

LINQ will never be as fast as this, since the object that LINQ sees is not a SortedSet but an IEnumerable<T>, which has no semantics other than "give me a sequence of objects". You're not taking advantage of the Set'ness at all.

What key is SortedSet<T> sorting by? Wouldn't this just be a lookup via SortedSet.Contains, then you can check the table name?
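For comparison, here is a rough sketch of the set-based lookup, using a throwaway "probe" object that carries only the key fields. The field types (int Recordid, string Tablename) are assumptions, since the question does not show them, and the ordinal string comparison is a choice made for this sketch rather than something taken from the original CompareTo.

```csharp
using System;
using System.Collections.Generic;

class EntityHistoryChange : IComparable<EntityHistoryChange>
{
    public int Recordid { get; set; }      // assumed type
    public string Tablename { get; set; }  // assumed type

    public int CompareTo(EntityHistoryChange other)
    {
        // Record id first, table name as the tie-breaker.
        int recordIdComp = Recordid.CompareTo(other.Recordid);
        return recordIdComp != 0
            ? recordIdComp
            : string.CompareOrdinal(Tablename, other.Tablename);
    }
}

static class SortedSetDemo
{
    static void Main()
    {
        var handledChanges = new SortedSet<EntityHistoryChange>
        {
            new EntityHistoryChange { Recordid = 1, Tablename = "Orders" },
            new EntityHistoryChange { Recordid = 2, Tablename = "Customers" },
        };

        // The probe only needs the fields that CompareTo looks at.
        var probe = new EntityHistoryChange { Recordid = 2, Tablename = "Customers" };
        var missing = new EntityHistoryChange { Recordid = 2, Tablename = "Orders" };

        // Contains walks the tree using CompareTo, so it is O(log n).
        Console.WriteLine(handledChanges.Contains(probe));
        Console.WriteLine(handledChanges.Contains(missing));
    }
}
```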

Paul Betts
A: 

LINQ isn't intended to replace the use of the correct data structures for a given job. It just makes dealing with those data structures easier. If you're storing the data in a SQL database, you'd still be expected to use intelligent indexes on your DB to improve performance. Likewise, with LINQ to Objects you need to leverage data structures like SortedSet<T> where appropriate.

So the answer to your question is: A LINQ query to simulate the Contains method would be:

var exists = handledChanges.Any(c => c.Recordid == recordId && c.Tablename == tableName);

But if you're using LINQ to Objects, this will never achieve the same performance as using a Contains method on a data structure that is specially tailored to have quick lookups. If you're using LINQ to SQL or LINQ to Entities, this will provide an optimized SQL query, which can run very quickly.

By the way, if your goal is to get faster lookups on an in-memory collection, you may want to consider using a HashSet<T> with a custom IEqualityComparer<T>. Hash lookups are O(1) on average, so its Contains method should take about as long on a collection of millions of objects as it will on a collection of 10.
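A minimal sketch of that approach might look like the following. ChangeKeyComparer is a hypothetical name, and the field types are assumed, since the question does not show them.

```csharp
using System;
using System.Collections.Generic;

class EntityHistoryChange
{
    public int Recordid { get; set; }      // assumed type
    public string Tablename { get; set; }  // assumed type
}

// Hypothetical comparer: two changes are equal when both key fields match.
sealed class ChangeKeyComparer : IEqualityComparer<EntityHistoryChange>
{
    public bool Equals(EntityHistoryChange x, EntityHistoryChange y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Recordid == y.Recordid && x.Tablename == y.Tablename;
    }

    public int GetHashCode(EntityHistoryChange obj)
    {
        // Combine the same two fields that Equals compares.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + obj.Recordid.GetHashCode();
            hash = hash * 31 + (obj.Tablename?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

static class HashSetDemo
{
    static void Main()
    {
        var handledChanges = new HashSet<EntityHistoryChange>(new ChangeKeyComparer());

        // Add returns false when an equal item is already present,
        // so a separate Contains check before inserting is not needed.
        bool first = handledChanges.Add(new EntityHistoryChange { Recordid = 7, Tablename = "Orders" });
        bool second = handledChanges.Add(new EntityHistoryChange { Recordid = 7, Tablename = "Orders" });

        Console.WriteLine(first);
        Console.WriteLine(second);
        Console.WriteLine(handledChanges.Count);
    }
}
```

Note that a HashSet gives up ordering, so this only fits if you need the existence check and not the sorted traversal.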

StriplingWarrior