views:

842

answers:

4

Consider the following C# code:

IEnumerable numbers = Enumerable.Range(0, 10);
var evens = from num in numbers where num % 2 == 0 select num;

Is this pure syntactic sugar to allow me to write a for or foreach loop as a one-liner? Are there any compiler optimizations under the covers that make the list comprehension above more efficient than the loop construct? How does this work under the hood?

+4  A: 

You can simplify the code further by

var evens = Enumerable.Range(0, 10).Where(n => n % 2 == 0);

One advantage of this form is that the execution of this expression is deferred until evens is iterated over (foreach(var n in evens) { ... }). The above statement merely tells the compiler to capture the idea of how to enumerate the even numbers between 0 and 10, but do not execute on that idea until absolutely necessary.

Jason
A: 

In your code above, you have a Linq query, which is looping over the IEnumerable in functionally the same way as a foreach loop might. However, in your code, there is a LOT going on under the hood. The foreach is probably much more efficient if you're intention is to write a high-performance loop. Linq is intended for a different purpose (generalized data access).

The IEnumerable interface exposes an iterator method, which is then called continuously by a loop construct, like a foreach or Linq query. The iterator returns the next item in the collection each time it is called.

Robert Harvey
By `for` loop I meant both `for` and `foreach` ... I edited the question to reflect that...
Jonas Gorauskas
Jonas, I have updated my answer to reflect your comments.
Robert Harvey
OK, but let's say that instead of doing something simple like the example above I have a scenario that's more like this: I have a List<Person> and I want to filter out all Person.LastName that matches a certain regex. I use the List Comprehension syntax to iterate over the List<Person> and I pass in the regex as a lambda expression. Would that be more efficient than writing a loop?
Jonas Gorauskas
It would certainly be more interesting. Whether it's faster is probably dependent on a number of things, like how many items are in the list. The regex might take awhile, so you could be bound by that. The only way to know for sure is to run some tests.
Robert Harvey
Jonas, similar principles apply for any LINQ query that uses regular C# objects. As I said below, the predicate function call for every element is going to be significant overhead.
Matthew Flaschen
Robert, he has to do a regex either way (for-loop/LINQ).
Matthew Flaschen
@Matthew: true, but could that be the bulk of the computational load?
Robert Harvey
The absolute cycles-per-element slowdown would probably be about the same. But since the base (for-loop) is already relatively time intensive (due to the regex, rather than just a mod), you may be right that the percent slowdown is lower. More profiling welcome.
Matthew Flaschen
+10  A: 

As Jason said, your code is equivalent to:

Enumerable.Range(0, 10).Where(n => n % 2 == 0);

Note the lambda will be transformed to a function call which is done for every element. This is probably the largest part of the overhead. I did a test, which indicates LINQ is about 3 times slower (mono gmcs version 1.2.6.0) on this exact task

    Time for 10000000 for loop reps: 00:00:17.6852560
    Time for 10000000 LINQ reps: 00:00:59.0574430

    Time for 1000000 for loop reps: 00:00:01.7671640
    Time for 1000000 LINQ reps: 00:00:05.8868350

EDIT: Gishu reports that VS2008 and framework v3.5 SP1 gives:

    Time for 1000000 loop reps: :00.3724585 
    Time for 1000000 LINQ reps: :00.5119530 

LINQ is about 1.4 times slower there.

It compares a for-loop and a list to LINQ (and whatever structure it uses internally). Either way, it converts the result to an array (necessary to force LINQ to stop being "lazy"). Both versions repeat:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Evens
{
    private static readonly int[] numbers = new int[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

    private static int MAX_REPS = 1000000;

    public static void Main()
    {
        Stopwatch watch = new Stopwatch();

        watch.Start();
        for(int reps = 0; reps < MAX_REPS; reps++)
        {
            List<int> list = new List<int>(); // This could be optimized with a default size, but we'll skip that.
            for(int i = 0; i < numbers.Length; i++)
            {
                int number = numbers[i];
                if(number % 2 == 0)
                    list.Add(number);
            }
            int[] evensArray = list.ToArray();
        }
        watch.Stop();
        Console.WriteLine("Time for {0} for loop reps: {1}", MAX_REPS, watch.Elapsed);

        watch.Reset();
        watch.Start();
        for(int reps = 0; reps < MAX_REPS; reps++)
        {
            var evens = from num in numbers where num % 2 == 0 select num;
            int[] evensArray = evens.ToArray();
        }
        watch.Stop();
        Console.WriteLine("Time for {0} LINQ reps: {1}", MAX_REPS, watch.Elapsed);
    }
}

Past performance tests on similar tasks (e.g. LINQ vs Loop - A performance test) corroborate this.

Matthew Flaschen
You're running this on Mono? Are you sure this is comparable to Microsoft IL?
Robert Harvey
Mono uses MSIL, which is also known as CIL following standardization.
Matthew Flaschen
Yes but that doesn't mean the two compilers are creating equivalent output.
Robert Harvey
With VS2008 and framework v3.5 SP1, Time for 1000000 loop reps: :00.3724585Time for 1000000 LINQ reps: :00.5119530
Gishu
Robert, I didn't say they created the same object code. That's precisely why I said I was using gmcs. I've added Gishu's results for comparison.
Matthew Flaschen
Ican also confirm the 1.4 slower times on my machine running VS2008 and .NET 3.5 SP1
Jonas Gorauskas
A: 

LINQ works differently for different types of data. You're feeding it objects, so it's using LINQ-to-objects. That gets translated into code similar to a straightforward for-loop.

But LINQ supports different types of data. For example, if you had a db table called 'numbers', LINQ-to-SQL would translate the same query;

var evens = from num in numbers where num % 2 == 0 select num;

into SQL like this;

select num from numbers where num % 2 = 0

and then executes that. Note that it doesn't do the filtering by creating a for-loop with an 'if' block inside; the 'where' clause modifies the query sent to the database server. The translation from query to executed code is specific to the type of data you're feeding it.

So no, LINQ isn't just syntactic sugar for for-loops, but a much more involved system for 'compiling' queries. For more info, search for Linq Provider. These are the components like linq-to-objects and linq-to-xml, and you can write your own if you're feeling brave.

Steve Cooper