views:

46

answers:

3

Say I have a List<T> with 1000 items in it.

Im then passing this to a method that filters this List. As it drops through the various cases (for example there could be 50), List<T> may have upto 50 various Linq Where() operations performed on it.

Im interested in this running as quickly as possible. Therefore, i dont want this List<T> filtered each time a Where() is performed on it.

Essentially I need it to defer the actual manipulation of the List<T> until all the filters have been applied.

Is this done natively by the compiler? Or just when I call .ToList() on the IEnumerable that a List<T>.Where() returns, or should i perform the Where() operations on X (Where X = List.AsQueryable())?

Hope this makes sense.

+4  A: 

Yes, deferred execution is supported natively. Every time you apply a query or lambda expression on your List the query stores all the expressions that is executed only when you call .ToList() on the Query.

this. __curious_geek
Do you know what sort of speed penalty there is on calling List<T>.AsQueryable().ToList() (i.e. a list with no expressions called on it)?
maxp
Edit: 32767 x List<T>.AsQueryable().ToList() took 120ms so basically negligible.
maxp
+3  A: 

Each call to Where will create a new object which knows about your filter and the sequence it's being called on.

When this new object is asked for a value (and I'm being deliberately fuzzy between an iterator and an iterable here) it will ask the original sequence for the next value, check the filter, and either return the value or iterate back, asking the original sequence for the next value etc.

So if you call Where 50 times (as in list.Where(...).Where(...).Where(...), you end up with something which needs to go up and down the call stack at least 50 times for each item returned. How much performance impact will that have? I don't know: you should measure it.

One possible alternative is to build an expression tree and then compile it down into a delegate at the end, and then call Where. This would certainly be a bit more effort, but it could end up being more efficient. Effectively, it would let you change this:

list.Where(x => x.SomeValue == 1)
    .Where(x => x.SomethingElse != null)
    .Where(x => x.FinalCondition)
    .ToList()

into

list.Where(x => x.SomeValue == 1 && x.SomethingElse != null && x.FinalCondition)
    .ToList()

If you know that you're just going to be combining a lot of "where" filters together, this may end up being more efficient than going via IQueryable<T>. As ever, check performance of the simplest possible solution before doing something more complicated though.

Jon Skeet
The previous code attacked this problem by using the Linq Dynamic library (http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx), which just built a long 'where' string. The idea is nice, but without timing it, im still confident it is slower than just shotgunning where's together as it has to parse the string bleh.
maxp
@maxp: Doing it dynamically would probably be slower, yes - but you don't have to do it dynamically. You can use expression trees to do all of this safely, and build it into "native" IL using LambdaExpression.Compile.
Jon Skeet
@Jon: I wanted to comment on your answer by telling about the performance optimizations for arrays and List<T> inside of the `Enumerable.Where` extension methods. However, I noticed that these optimizations have no impact on the correctness of your answer. While these optimizations prevent 50 iterators to be wrapped, they won't prevent 50 delegates to be wrapped and thus won't prevent a 50 calls deep call stack. Best for performance would indeed be one single delegate.
Steven
A: 

There is so much failure in the question and the comments. The answers are good but don't hit hard enough to break through the failure.

Suppose you have a list and a query.

List<T> source = new List<T>(){  /*10 items*/ };
IEnumerable<T> query = source.Where(filter1);
query = query.Where(filter2);
query = query.Where(filter3);
...
query = query.Where(filter10);

Is [lazy evaluation] done natively by the compiler?

No. Lazy evaluation is due to the implementation of Enumerable.Where

This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.


speed penalty there is on calling List.AsQueryable().ToList()

Don't call AsQueryable, you only need to use Enumerable.Where.


thus won't prevent a 50 calls deep call stack

Depth of call stack is much much less important than having a highly effective filter first. If you can reduce the number of elements early, you reduce the number of method calls later.

David B
For the record - failure = not understanding and not looking up the methods being used.
David B