tags:

views:

206

answers:

6

While exploring a recent Linq question i noticed that the algorithm seemed rather slow. Digging deeper i noticed that it wasn't the linq code but the output of the result afterwards that took the long time. (Kudos to Marc Gravel btw for kicking the slickest Linq i've seen yet.)

Code:

        DateTime dt = DateTime.Now;
        Console.WriteLine("Alg Start " + dt.Second + "." + dt.Millisecond);

        var qry = from l in Enumerable.Range(100000, 999999)
                  let s = l.ToString()
                  let sReversed = new string(s.Reverse().ToArray())
                  from i in Enumerable.Range(3, 9)
                  let t = (l * i).ToString()
                  where t == sReversed
                  select new { l, i };

        dt = DateTime.Now;
        Console.WriteLine("Alg End " + dt.Second + "." + dt.Millisecond);

        foreach (var row in qry)
            Console.WriteLine("{0} x {1} = {2}", row.l, row.i, row.l * row.i);

        dt = DateTime.Now;
        Console.WriteLine("Disp End " + dt.Second + "." + dt.Millisecond);


Output:

Alg Start 20.257
Alg End   20.270
109989 x 9 = 989901
219978 x 4 = 879912
1099989 x 9 = 9899901
Disp End  31.322

.13 seconds to calculate and more than 11 to display?!? What's the reason for this?

+5  A: 

The reason the linq query appears to execute quickly is because nothing is actually been calculated at the point of definition since linq uses deferred execution i.e. no 'real' work is done until you start to enumerate the results.

Lee
+3  A: 

The reason is that the query isn't actually run until you enumerate it. In LINQ to objects it just sets up a bunch of delegates that get called when you iterate over the enumerator. If you were to add a ToList() to the query to materialize it you'd see the time taken would shift to the set up and away from the display.

tvanfosson
+3  A: 

with many linq providers, the "alg start" to "alt end" is just parsed - the actual expressions aren't evaluated until you actually start enumerating the result. So the actual creation of the "qry" variable is fast (just setting up a enumerable that will actually perform the logic in the query), but enumerating through it is slower.

Philip Rieck
+3  A: 

The LINQ code only creates a query object out of the query expression, which doesn't take a lot of time. Only in the foreach is the query actually executed.

By the way, you shouldn't use DateTime.Now for performance timing, but the Stopwatch class, as it's far more accurate.

Joren
+3  A: 

The query doesn't actually calculate until you iterate over it. Until then it is just like a SQL statement, waiting to be executed.

Yuriy Faktorovich
+2  A: 

That question is doing brute forcing; LINQ is actually quite handy in such cases - I discussed this here: Brute force (but lazily)

Just to expand on some of the previous answers:

LINQ is typically designed around deferred execution, meaning nothing happens until start iterating the result. This is typically done via an iterator block; consider the difference between these:

static IEnumerable<T> Where(this IEnumerable<T> data, Func<T,bool> predicate) {
    foreach(T item in data) {
        if(predicate(item)) yield return item;
    }
}

and:

static IEnumerable<T> Where(this IEnumerable<T> data, Func<T,bool> predicate) {
    var list = new List<T>();
    foreach(T item in data) {
        if(predicate(item)) list.Add(item);
    }
    return list;
}

The difference is that the second version does all the work when you call Where, returning a single result, where-as the second (via the magic of iterator blocks) only does work when the enumerator calls MoveNext(). Iterator blocks are discussed more in the free sample chapter 6 of C# in Depth.

Generally, the advantage of this is that it makes queries composable - especially important for database-based queries, but just as valid for regular work.

Note that even with iterator blocks, there is a second consideration; buffering. Consider Reverse() - no matter how you do it, to reverse a sequence, first you need to find the end of the sequence. Now consider that not all sequences end! Contrast this to Where, Skip, Take etc - which can filter rows without buffering (simply by dropping items).

A good example of using this in an infinite sequence is this Fibonacci question, where we can use the non-buffered, deferred approach:

    foreach (long i in Fibonacci().Take(10)) {
        Console.WriteLine(i);
    }

Without deferred execution, this would never complete.

Marc Gravell