I've recently started using LINQ quite a bit, and I haven't really seen any mention of run-time complexity for any of the LINQ methods. Obviously, there are many factors at play here, so let's restrict the discussion to the plain IEnumerable LINQ-to-Objects provider. Further, let's assume that any Func passed in as a selector / mutator / etc. is a cheap O(1) operation.
It seems obvious that all the single-pass operations (Select, Where, Count, Take/Skip, Any/All, etc.) will be O(n), since they only need to walk the sequence once, although even this is subject to laziness.
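
To make the laziness caveat concrete, here is a minimal sketch of what I mean. The class and method names (LazinessSketch, PredicateCalls) are made up for illustration; it only counts how often the Where predicate runs when a few results are pulled from a 1000-element source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazinessSketch
{
    // Returns how many times the Where predicate ran when at most
    // `take` results are pulled from a 1000-element source.
    public static int PredicateCalls(int take)
    {
        int calls = 0;
        IEnumerable<int> source = Enumerable.Range(0, 1000);

        // Building the query does no work yet: Where and Select are deferred.
        IEnumerable<int> query = source
            .Where(x => { calls++; return x % 2 == 0; })
            .Select(x => x * 10);

        // Enumeration happens here; Take stops pulling once it has enough.
        query.Take(take).ToList();
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(PredicateCalls(3));    // far fewer than 1000
        Console.WriteLine(PredicateCalls(1000)); // only 500 evens exist, so the whole source is walked
    }
}
```

So "O(n)" for these operators seems to mean "at most one pass, and only as far as the consumer actually enumerates".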
Things are murkier for the more complex operations; the set-like operators (Union, Distinct, Except, etc.) work using GetHashCode by default (afaik), so it seems reasonable to assume they're using a hash-table internally, making these operations O(n) as well, in general. What about the versions that use an IEqualityComparer?
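
One way I could imagine probing this empirically, as a rough sketch rather than a proof: pass a comparer that counts its own calls and watch how the counts grow with n. CountingComparer and SetOperatorProbe are hypothetical helpers of my own, not part of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: an int comparer that counts how often it is called.
public sealed class CountingComparer : IEqualityComparer<int>
{
    public int HashCalls;
    public int EqualsCalls;

    public bool Equals(int x, int y) { EqualsCalls++; return x == y; }
    public int GetHashCode(int obj) { HashCalls++; return obj.GetHashCode(); }
}

public static class SetOperatorProbe
{
    public static CountingComparer Run(int n)
    {
        var comparer = new CountingComparer();

        // n distinct values followed by the same n values again.
        IEnumerable<int> source = Enumerable.Range(0, n).Concat(Enumerable.Range(0, n));

        int distinct = source.Distinct(comparer).Count();
        Console.WriteLine($"n={n} distinct={distinct} hash={comparer.HashCalls} equals={comparer.EqualsCalls}");
        return comparer;
    }

    public static void Main()
    {
        // If both counts grow roughly linearly with n, hashing is the likely strategy.
        Run(1000);
        Run(10000);
    }
}
```

This only shows observed behaviour on one runtime, which is exactly why I'm asking whether anything is actually guaranteed.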
OrderBy would need a sort, so most likely we're looking at O(n log n). What if it's already sorted? How about if I say OrderBy().ThenBy() and provide the same key to both?
I could see GroupBy (and Join) using either sorting or hashing. Which is it?
Contains would be O(n) on a List, but O(1) on a HashSet - does LINQ check the underlying container to see if it can speed things up?
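
A small sketch of how this could be checked, assuming nothing beyond the public API: wrap a HashSet in a collection that records which of its members gets called, then invoke the LINQ extension method explicitly. ProbeCollection and ContainsProbe are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper around HashSet<int> that records which member LINQ uses.
public sealed class ProbeCollection : ICollection<int>
{
    private readonly HashSet<int> _inner = new HashSet<int>();
    public bool ContainsCalled;
    public bool EnumeratorCalled;

    public int Count => _inner.Count;
    public bool IsReadOnly => false;
    public void Add(int item) => _inner.Add(item);
    public void Clear() => _inner.Clear();
    public bool Contains(int item) { ContainsCalled = true; return _inner.Contains(item); }
    public void CopyTo(int[] array, int arrayIndex) => _inner.CopyTo(array, arrayIndex);
    public bool Remove(int item) => _inner.Remove(item);
    public IEnumerator<int> GetEnumerator() { EnumeratorCalled = true; return _inner.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class ContainsProbe
{
    public static ProbeCollection Run()
    {
        var probe = new ProbeCollection();
        for (int i = 0; i < 100; i++) probe.Add(i);

        // Call the LINQ extension explicitly, not the instance method.
        bool found = Enumerable.Contains(probe, 42);

        Console.WriteLine($"found={found} containsCalled={probe.ContainsCalled} enumerated={probe.EnumeratorCalled}");
        return probe;
    }

    public static void Main() => Run();
}
```

If containsCalled comes back true and enumerated false, LINQ is delegating to the container's own Contains rather than walking the sequence.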
And the real question - so far, I've been taking it on faith that the operations are performant. However, can I bank on that? STL containers, for example, clearly specify the complexity of every operation. Are there any similar guarantees on LINQ performance in the .NET library specification?
More questions (in response to comments):
I hadn't really thought about overhead, but I didn't expect there to be very much for simple LINQ-to-Objects. The CodingHorror post is talking about LINQ-to-SQL, where I can understand that parsing the query and generating SQL would add cost - is there a similar cost for the Objects provider too? If so, is it different if you're using the declarative or functional syntax?