views:

528

answers:

9

After upgrading to ReSharper 5 it gives me even more useful tips on code improvements. One I see everywhere now is a tip to replace foreach statements with LINQ queries. Take this example:

private Ninja FindNinjaById(int ninjaId)
{
    foreach (var ninja in Ninjas)
    {
        if (ninja.Id == ninjaId)
            return ninja;
    }
    return null;
}

ReSharper suggests replacing this with the following LINQ query:

private Ninja FindNinjaById(int ninjaId)
{
    return Ninjas.FirstOrDefault(ninja => ninja.Id == ninjaId);
}

This all looks fine, and I'm sure replacing this one foreach is no problem performance-wise. But is it something I should do in general? Or might I run into performance problems with all these LINQ queries everywhere?

+6  A: 

Profile


The only way to know for sure is to profile. Yes, certain queries can be slower. But when you look at what ReSharper has replaced here, it's essentially the same thing, done in a different manner. The ninjas are looped, each Id is checked. If anything, you could argue this refactoring comes down to readability. Which of the two do you find easier to read?

Larger data sets will have a bigger impact, sure, but as I've said: profile. It's the only way to be sure whether such changes have a negative effect.

Finglas
Profiling is a useful tool, but it's not magic. Sometimes profiling only tells you how good your data set is, not how the algorithm will respond to an arbitrary data set. There are many algorithms that are heavily dependent on the data set for their performance. Bubble sort on a sorted set is pretty fast, for example.
tvanfosson
True, but given the same dataset you at least have some figures to back up claims. Any talk of "Which is faster" etc.. without some form of profiling/optimisation details is madness.
Finglas
Thx. If I suspect some concrete problems I will profile it, but for now I don't really see any problems. Just wondering if there are some general best practices on this.
stiank81
@Finglas - that's what I used to tell them in my algorithms class. Why bother with all this O-notation stuff when I can just test it and see which one is faster. ;-) Seriously, I think you can tell a lot by simply analyzing the code to see which class the algorithm falls into. If you can do it in nlogn with your code and LINQ would take n**2, then you don't really need to profile it to tell which is faster in the general case. Of course, if n is very small it may not matter and you go with the most readable solution.
tvanfosson
+1  A: 

The above does the exact same thing.

As long as you use your LINQ queries correctly you will not suffer from performance issues. Used correctly, they may even be faster, given the skill of the people who wrote LINQ.

The only reasons to roll your own are if you want full control, if LINQ does not offer what you need, or if you want easier debugging.

Oskar Kjellin
+17  A: 

You need to understand what the LINQ query is going to do "under the hood" and compare that to running your code before you can know whether you should change it. Generally, I don't mean that you need to know the exact code that will be generated, but you do need to know the basic idea of how it would go about performing the operation. In your example, I would surmise that LINQ would work about the same as your code, and because the LINQ statement is more compact and descriptive, I would prefer it. There are times when LINQ may not be the ideal choice, though probably not many. Generally, I would think that just about any looping construct is replaceable by an equivalent LINQ construct.
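For LINQ to Objects, the "under the hood" part is no mystery; `FirstOrDefault` with a predicate can be sketched roughly like this (a simplified sketch, not the actual BCL source, which adds argument checks and fast paths):

```csharp
using System;
using System.Collections.Generic;

static class EnumerableSketch
{
    // Roughly what Enumerable.FirstOrDefault(predicate) does.
    public static T FirstOrDefaultSketch<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var item in source)   // the same loop as the hand-written version
        {
            if (predicate(item))       // but the test goes through a delegate
                return item;
        }
        return default(T);             // null for reference types, 0 for int, etc.
    }
}
```

In other words, the same iteration happens either way; the delegate call is the only real difference.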

tvanfosson
+1 nice answer.
James
One scenario where pure LINQ cannot replace foreach is when you need to perform some action on each of the objects being iterated over. In this scenario I will usually create a LINQ query that covers whatever my "filter" logic is, and then have a much simpler foreach loop that performs the action on each result.
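For example (a sketch; `ninjas`, `IsActive` and `Train()` are made-up names, not from the question):

```csharp
// LINQ covers the "filter" logic; a plain foreach performs the action.
foreach (var ninja in ninjas.Where(n => n.IsActive))
{
    ninja.Train();   // the side effect stays out of the query itself
}
```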
Chris
@Chris - agreed, though you could consider simply writing an extension to perform the action, too. Generally, though, since LINQ is *query*, I would prefer it to not have side-effects.
tvanfosson
+1  A: 

The cool thing about LINQ queries is that it makes it dead simple to convert to a parallel query. Depending on what you're doing, it may or may not be faster (as always, profile), but it's pretty neat, nonetheless.
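For example, with PLINQ in .NET 4 the conversion is often a single call to `AsParallel()` (a sketch; `ninjas` and `ExpensiveCheck` are placeholder names):

```csharp
// Sequential LINQ to Objects:
var slow = ninjas.Where(n => ExpensiveCheck(n)).ToList();

// PLINQ: the same query, partitioned across cores. Note that
// output order is not guaranteed unless you add AsOrdered().
// Only worth it when the per-element work outweighs the
// partitioning overhead - as always, profile.
var fast = ninjas.AsParallel().Where(n => ExpensiveCheck(n)).ToList();
```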

Dean Harding
+4  A: 

We've built massive apps, with LINQ sprinkled liberally throughout. It's never, ever slowed us down.

It's perfectly possible to write LINQ queries that will be very slow, but it's easier to fix simple LINQ statements than enormous for/if/for/return algorithms.

Take resharper's advice :)

Rob Fonseca-Ensor
ReSharper has given me tons of good advice this far, so I tend to end up taking their advice :-)
stiank81
+7  A: 

Let me start by saying that I love LINQ for its expressiveness and use it all the time without any problem.

There are however some differences in performance. Normally they are small enough to ignore, but in the critical path of your application, there might be times you want to optimize them away.

Here is the set of differences that you should be aware of, that could matter with performance:

  • LINQ uses delegate calls extensively, and delegate invocations are slower than method invocations and of course slower than inlined code.
  • A delegate is a method pointer inside an object. That object needs to be created.
  • LINQ operators usually return a new object (an iterator) that allows looping through the collection. Chained LINQ operators thus create multiple new objects.
  • When your lambda uses variables from the enclosing scope (a closure), those variables have to be wrapped in objects as well (which need to be created).
  • Many LINQ operators call the GetEnumerator method on a collection to iterate it. Calling GetEnumerator usually means the creation of yet another object.
  • Iterating the collection is done through the IEnumerator interface. Interface calls are a bit slower than normal method calls.
  • IEnumerator objects often need to be disposed, or at least have Dispose called on them.

When performance is a concern, also try using a `for` loop instead of `foreach`.
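As a sketch of those trade-offs, here are three versions of the same lookup, roughly from most to least allocation overhead (assuming the `Ninja` class from the question, held in a `List<Ninja>` called `ninjas`):

```csharp
// 1. LINQ: closure + delegate + iterator/enumerator allocations.
Ninja viaLinq = ninjas.FirstOrDefault(n => n.Id == ninjaId);

// 2. foreach: goes through IEnumerator<T>, but needs no delegate.
Ninja viaForeach = null;
foreach (var n in ninjas)
    if (n.Id == ninjaId) { viaForeach = n; break; }

// 3. for: indexer access, no enumerator object at all.
Ninja viaFor = null;
for (int i = 0; i < ninjas.Count; i++)
    if (ninjas[i].Id == ninjaId) { viaFor = ninjas[i]; break; }
```

All three return the same ninja; only the amount of machinery behind the loop differs.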

Again, I love LINQ and I can't remember ever deciding not to use a LINQ (to Objects) query because of performance. So don't do any premature optimization: start with the most readable solution first, then optimize when needed.

Steven
Thx. Very useful!
stiank81
"...delegate invocations are slower than method invocations..." - only by an insignificant amount. You can't even see a difference when profiling.
Callum Rogers
@steven, using for over .ForEach() is not recommended when dealing with collections -- there are some special optimizations that make .ForEach() significantly faster than using the for statement with collections. On the other hand, you *should* use the for statement with arrays.
code4life
http://diditwith.net/2006/10/05/PerformanceOfForeachVsListForEach.aspx
code4life
@Code4life: I never actually talked about the `List<T>.ForEach()` method. Have you looked in Reflector at why it is faster than a `foreach` statement? I bet it is doing a `for` internally on its array; of course it beats the `foreach` on that. I agree that using `for` is not always faster, and not even always possible, because not all collections (or sequences, actually) have an indexer.
Steven
@Steven, sorry for the misunderstanding. However, the .ForEach() is specifically optimized for collections, it seems. Based on the tests recorded at this site: http://diditwith.net/2006/10/05/PerformanceOfForeachVsListForEach.aspx it outperformed *for* and *foreach* statements by a significant margin. With compiler optimizations turned on, the *for* statement was marginally faster than ForEach.
code4life
@Code4life: Note that this `ForEach()` method is an instance method of `List<T>`. Therefore, it is not really optimized for 'collections'; it is optimized for `List<T>`. But yes indeed, it is pretty fast :-) Cheers.
Steven
+3  A: 

One thing we identified as problematic for performance is creating lots of lambdas while iterating over small collections. What happens in the converted sample?

Ninjas.FirstOrDefault(ninja => ninja.Id == ninjaId)

First, a new instance of the (compiler-generated) closure type is created: a new instance on the managed heap, some work for the GC. Second, a new delegate instance is created from a method on that closure. Then the method FirstOrDefault is called. What does it do? It iterates the collection (same as your original code) and calls the delegate.

So basically, you have four things added here:

1. Create the closure
2. Create the delegate
3. Call through the delegate
4. Garbage-collect the closure and delegate
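Roughly, the compiler rewrites the lambda into something like this (a sketch; the names are illustrative, the real generated names are unpronounceable):

```csharp
class Ninja { public int Id; }

// What the compiler generates for
//     Ninjas.FirstOrDefault(ninja => ninja.Id == ninjaId)
// approximately looks like:
class DisplayClass                    // the closure type
{
    public int ninjaId;               // the captured variable
    public bool Predicate(Ninja n)    // the lambda body
    {
        return n.Id == this.ninjaId;
    }
}

// At the call site:
//   var closure = new DisplayClass { ninjaId = ninjaId };  // 1. create closure
//   Func<Ninja, bool> d = closure.Predicate;               // 2. create delegate
//   var result = Ninjas.FirstOrDefault(d);                 // 3. call through delegate
//   // 4. closure and delegate become garbage afterwards
```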

If you call FindNinjaById lots of times, this may add up to an important performance hit. Of course, measure it.

If you replace it with (equivalent)

Ninjas.Where(ninja => ninja.Id == ninjaId).FirstOrDefault()

it adds a fifth step: creating the state machine for the iterator ("Where" is a yielding function).
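That fifth allocation exists because `Where` is implemented with `yield return`, which the compiler turns into a hidden state-machine class (a simplified sketch, not the BCL source):

```csharp
using System;
using System.Collections.Generic;

static class WhereDemo
{
    // The compiler rewrites this method into a hidden class
    // implementing IEnumerator<T> - the "state machine" object
    // that gets allocated every time the method is called.
    public static IEnumerable<T> WhereSketch<T>(
        IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var item in source)
            if (predicate(item))
                yield return item;   // execution suspends/resumes here
    }
}
```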

Ilya Ryzhenkov
Thx, that's clarifying! But are you suggesting I should change to the equivalent using Where? Sounds less efficient if this adds a fifth point. Did I misunderstand?
stiank81
He is saying that using Where adds 5 additional steps, rather than the 4 your example adds. So no, do not use the Where.
emddudley
That's what it sounded like. Thanks for clarifying this...
stiank81
This will very, very likely be an _insignificant_ performance hit. Those four steps have hardly any impact. Anyway, if performance is an issue it is very easy to parallelise code with LINQ, which should help _most_ of the time.
Callum Rogers
@callum, this is NOT an insignificant hit if this function is called millions of times in a row. Really, we did measure.
Ilya Ryzhenkov
+2  A: 

An anecdote: when I was just getting to know C# 3.0 and LINQ, I was still in my "when you have a hammer, everything looks like a nail" phase. As a school assignment, I was supposed to write a connect four/four-in-a-row game as an exercise in adversarial search algorithms. I used LINQ throughout the program. In one particular case, I needed to find the row a game-piece would land on if I dropped it in a particular column. Perfect use-case for a LINQ query! This turned out to be really slow. However, LINQ wasn't the problem; the problem was that I was searching to begin with. I optimized this by just keeping a look-up table: an integer array containing the row number for every column of the game-board, updating that table when inserting a game-piece. Needless to say, this was much, much faster.
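The look-up-table idea can be sketched as (a hypothetical seven-column board; field and method names are illustrative):

```csharp
// Instead of searching the board for the landing row,
// keep the next free row per column and update it on insert.
class Board
{
    private readonly int[] nextFreeRow = new int[7];  // 7 columns, all start at row 0

    public int Drop(int column)
    {
        int row = nextFreeRow[column];   // O(1) lookup instead of a search
        nextFreeRow[column]++;           // maintain the table on insert
        return row;
    }
}
```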

Lesson learned: optimize your algorithm first, and high level constructs like LINQ might actually make that easier.

That said, there is a definite cost to creating all those delegates. On the other hand, there can also be a performance benefit by utilizing LINQ's lazy nature. If you manually loop over a collection, you're pretty much forced to create intermediate List<>'s whereas with LINQ, you basically stream the results.
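For example (a sketch; `ninjas`, `IsActive` and `Name` are illustrative names):

```csharp
// Hand-written, each stage tends to materialize its own list:
var step1 = new List<Ninja>();
foreach (var n in ninjas)
    if (n.IsActive) step1.Add(n);        // intermediate List<Ninja>

// With LINQ nothing runs until enumeration; elements stream through
// Where/Select one at a time, with no intermediate list at all:
var names = ninjas.Where(n => n.IsActive).Select(n => n.Name);
foreach (var name in names)
{
    // each element flows through the whole pipeline individually
}
```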

JulianR
+1  A: 

To add my own experience of using LINQ where performance really does matter - with MonoTouch - the difference there is still insignificant.

You're 'handicapped' on the 3GS iPhone to around 46MB of RAM and a 620MHz ARM processor. Admittedly the code is AOT-compiled, but even on the simulator, where it is JIT'd and going through a long series of indirections, the difference is tenths of a millisecond for sets of 1000s of objects.

Along with Windows Mobile, this is where you have to worry about the performance costs - not in huge ASP.NET applications running on quad-core 8GB servers, or on desktops with dual cores. One exception to this would be large object sets, although arguably you would lazy-load anyway, and the initial query task would be performed on the database server.

It's a bit of a cliché on Stack Overflow, but use the shorter, more readable code until 100s of milliseconds really do matter.

Chris S