Suppose we have code like this:

    IEnumerable<Foo> A = meh();
    IEnumerable<Foo> B = meh();

    var x =
        from a in A
        from b in B
        select new {a, b};

Let's also assume that meh returns an IEnumerable that performs a lot of expensive calculations when iterated over. Of course, we can cache the calculated results manually by calling ToList():

    IEnumerable<Foo> A = meh().ToList();

My question is whether this manual caching of A and B is required, or whether the above query caches the results of A and B itself during execution, so that each element gets calculated only once. The difference between 2 * n and n * n calculations may be huge, and I did not find a description of this behavior on MSDN, which is why I'm asking.

+3  A: 

Assuming you mean LINQ to Objects, it definitely doesn't do any caching - nor should it, IMO. You're building a query, not the results of a query, if you see what I mean. Apart from anything else, you might want to iterate through a sequence which is larger than can reasonably be held in memory (e.g. iterate over every line in a multi-gigabyte log file). I certainly wouldn't want LINQ to try to cache that for me!

If you want a query to be evaluated and buffered for later quick access, calling ToList() is probably your best approach.
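To make the cost concrete, here is a minimal sketch that counts how often the "expensive" step runs with and without buffering. Meh, the evaluations counter, and the size of 4 are hypothetical stand-ins for the question's meh(), not anything from the original code. Assuming LINQ to Objects, A is walked once and B is walked again for every element of A, so the lazy version should cost n + n * n evaluations versus 2 * n when both sources are buffered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0;

// Hypothetical stand-in for meh(): a lazy sequence that counts every element it produces.
IEnumerable<int> Meh(int count)
{
    for (int i = 0; i < count; i++)
    {
        evaluations++;   // pretend this is the expensive calculation
        yield return i;
    }
}

const int size = 4;

// Lazy sources: B is re-enumerated once per element of A.
IEnumerable<int> a = Meh(size);
IEnumerable<int> b = Meh(size);
int pairs = (from x in a from y in b select (x, y)).Count();
int lazyEvaluations = evaluations;

// Buffered sources: each source is evaluated exactly once, up front.
evaluations = 0;
List<int> aList = Meh(size).ToList();
List<int> bList = Meh(size).ToList();
int bufferedPairs = (from x in aList from y in bList select (x, y)).Count();
int bufferedEvaluations = evaluations;

Console.WriteLine($"pairs: {pairs}, lazy: {lazyEvaluations}, buffered: {bufferedEvaluations}");
// prints "pairs: 16, lazy: 20, buffered: 8"
```

Both versions produce the same 16 pairs; only the number of times the source sequences are evaluated differs.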

Note that it would be perfectly valid for B to depend on A, which is another reason not to cache. For example:

    var query = from file in Directory.GetFiles(".", "*.log")
                from line in new LineReader(file)
                ...;

You really don't want LINQ to cache the results of reading the first log file and use the same results for every log file. I suppose it could be possible for the compiler to notice that the second from clause didn't depend on the range variable from the first one - but it could still depend on some side effect or other.
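As a rough illustration of a second from clause that depends on the first, here is a self-contained sketch. ReadLines and the file names are made up for the example (it stands in for a real line reader and touches no files); the point is only that the inner source is evaluated afresh for each outer element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for reading a file: the lines produced depend on the path.
IEnumerable<string> ReadLines(string path)
{
    yield return $"{path}: first line";
    yield return $"{path}: second line";
}

string[] files = { "a.log", "b.log" };

// The inner source is re-created for every outer element, so each file
// yields its own lines rather than a cached copy of the first file's lines.
List<string> lines = (from file in files
                      from line in ReadLines(file)
                      select line).ToList();

foreach (string text in lines)
    Console.WriteLine(text);
```

If the inner sequence were cached after the first file, the output would repeat the a.log lines for b.log, which is exactly the behavior you don't want.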

Jon Skeet
That makes a lot of sense, actually.
mafutrct
Goodo. Basically LINQ *generally* takes the "dumb but predictable" approach, which is a good thing IMO.
Jon Skeet