views:

79

answers:

3

Hi all..

Think the title describes my thoughts pretty well :)

I've seen a lot of people lately that swear to LINQ, and while I also believe it's awesome, I also think you shouldn't be confused about the fact that on most (all?) IEnumerable types, it's performance is not that great. Am I wrong in thinking this? Especially queries where you nest Where()'s on large datasets?

Sorry if this question is a bit vague, I just want to confirm my thoughts in that you should be "cautious" when using LINQ.

[EDIT] Just to clarify - I'm thinking in terms of Linq to Objects here :)

+3  A: 

It depends on the provider. For Linq to Objects, it's going to be O(n), but for Linq to SQL or Entities it might ultimately use indices to beat that. For Objects, if you need the functionality of Where, you're probably going to need O(n) anyway. Linq will almost certainly have a bigger constant, largely due to the function calls.

Matthew Flaschen
Yeah, sorry, meant Linq to objects here. I was thinking if there's an IEnumerable implementation here that somehow can beat O(N) - But I guess there aren't :)
cwap
Well, the Where for `IEnumerable` objects is `Enumerable.Where`, so there's really only one implementation. Even if it wanted to special case, I don't see how it could beat O(n).
Matthew Flaschen
AFAIK, IEnumerable only defines GetEnumerator(), which in turn only exposes GetNext() and Reset(). .Where is an extension method, which I don't know if anyone built in datatype cares to override. My guess is that .Where is just a foreach in disguise? :)
cwap
Yes, [Mono's](http://anonsvn.mono-project.com/viewvc/trunk/mcs/class/System.Core/System.Linq/Enumerable.cs#l3002) at least is just a foreach in an iterator-generator.
Matthew Flaschen
Spent a little time in Reflector, and it seems like .Any, .Single, .All, and the likes are implemented in the straightforward way you'd assume. Couldn't find the logic for .Where though :S
cwap
+3  A: 

It depends on how you are using it and to what you compare.

I've seen many implementations using foreaches which would have been much faster with linq. Eg. because they forget to break or because they return too many items. The trick is that the lambda expressions are executed when the item is actually used. When you have First at the end, it could end up it just one single call.

So when you chain Wheres, if an item does not pass the first condition, it will also not be tested for the second condition. it's similar to the && operator, which does not evaluate the condition on the right side if the first is not met.

You could say it's always O(N). But N is not the number of items in the source, but the minimal number of items required to create the target set. That's a pretty good optimization IMHO.

Stefan Steinegger
Indeed, chaining `.First()` on the end of a `.Where()` will be less than O(N) as long as something matches that isn't the last element...
ck
+2  A: 

Here's a project that promises to introduce indexing for LINQ2Objects. This should deliver better asymptotic behavior: http://i4o.codeplex.com/

Dmitry Ornatsky