views:

81

answers:

3

Hi guys.

I am wondering about general performance of LINQ. I admit, that it comes handy but how performant is LINQ? I know that is a broad question. So I want to ask about a particular example:

I have an anonymous type:

var users = reader.Select(user => new MembershipUser(reader.Name, reader Age));

And now, I want to convert it to the MembershipUserCollection.

So I do it like this:

MembershipUserCollection membershipUsers = new MembershipUserCollection();
users.ToList().ForEach(membershipUsers.Add); //what is the complexity of this line?

What is the complexity of the last line? Is it n^2 ?

Is ToList() method iterates for each element of the users and adds it to the list? Or does ToList() works differently? Because if it is not, I find hard to justice the reason of using the last line of the code instead of simply:

foreach (var user in users)
{
    membershipUsers.Add(user);
}
+1  A: 

It's O(n) - since .ToList() iterates once through the enumeration and copys the elements into the resulting List<T> (whose insertion is O(1)). Thus the complexity is fine.

The actual issue you might see is that you create a completely new, temporary List<T> just to copy its contents into another list (and afterwards discard it).

I suspect that's just due to the convenience of having a .ForEach()-method on List<T>s. One could nonetheless code a direct implementation for IEnumerable<T>s, which would save this one superfluous copying - or just write

foreach (var user in users) membershipUsers.Add(user)

which is basically what you want to express after all ;-)

Dario
Insert on a List is not O(1) in the general case, adding to the end is (amortized, not worst case). Also, the complexity of iterating a sequence may be anything unless you know how it's being constructed.
Freed
@Freed: Pedantically spoken, you're completely right. But nothing but amortized O(1) push_back is relevant here anyway ...
Dario
Ok ToList() is O(n) but what about ForEach? it is also O(n) right? So we have O(n) * O(n)?
Wodzu
ForEach iterates once through the list - O(n). Thus you first 1) create a list: O(n) and 2) iterate through it: O(n), together O(n) + O(n) = O(n)
Dario
Ah right, thanks for clearing this up :)
Wodzu
A: 

Converting to a list will have the same complexity as iterating over the sequence, which may really by anything depending on how the sequence is generated. A normal Select over an in-memory list is O(n).

The performance of using ForEach on a List vs a foreach loop comes down to the overhead of invoking a delegate vs the overhead of creating and using a enumerator, I cannot say which one is quicker, but if both are used on an in-memory list, the complexity is the same.

Freed
+1  A: 

Your example isn't particularly good for your question because ToList() isn't really in the same class of extension methods as the other ones supporting LINQ. The ToList() extension method is a conversion operation, not a query operation. The real values in LINQ are deferred execution of a composite query built by combining several LINQ query operations and improved readability. In LINQ2SQL you also get the advantage of constructing arbitrary queries that get pushed to the DB server for actual execution, taking advantage of optimizations that the DB may have in place to improve performance.

In general, I would expect that the question of performance largely comes down to how well you construct the actual queries and has a lot more to do with how well the programmer knows the tools and data than how well the tool is implemented. In your case, it makes no sense to construct a temporary list just to be able to invoke the convenience ForEach method on it if all you care about is performance. You'd be better off simply iterating over the enumeration you already have (as you suspect). LINQ won't stop a programmer from writing bad code, though it may disguise bad code for the person who doesn't understand how LINQ works.

It's always the case that you can construct an equivalent program not using LINQ for any program using LINQ. It may be that you can actually improve on the performance. I would submit, though, that LINQ makes it much easier to write readable code than non-LINQ solutions. By that, I mean more compact and understandable. It also makes it easier to write composable code, which when executed in a deferred manner performs better than, non-LINQ compositions. By breaking the code into composable parts, you simplify it and improve understandability.

I think the trick here is to really understand where LINQ makes sense rather than treat it as a shiny, new tool that you need to now use for every problem you have. The nice part of this shiny, new tool, though, is that it really does come in handy in a lot of situations.

tvanfosson