I always assumed that if I was using Select(x => ...) in the context of LINQ to Objects, then the new collection would be created immediately and remain static. I'm not quite sure WHY I assumed this, and it's a very bad assumption, but I did. I often use .ToList() elsewhere, but often not in this case.

This code demonstrates that even a simple 'Select' is subject to deferred execution :

        var random = new Random();
        var animals = new[] { "cat", "dog", "mouse" };
        var randomNumberOfAnimals = animals.Select(x => Math.Floor(random.NextDouble() * 100) + " " + x + "s");

        foreach (var i in randomNumberOfAnimals)
        {
            testContextInstance.WriteLine("There are " + i);

        }

        foreach (var i in randomNumberOfAnimals)
        {
            testContextInstance.WriteLine("And now, there are " + i);

        }

This outputs the following (the random function is called every time the collection is iterated through):

There are 75 cats
There are 28 dogs
There are 62 mouses
And now, there are 78 cats
And now, there are 69 dogs
And now, there are 43 mouses
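
For comparison, here is a minimal sketch of how materializing the query with ToList() changes this. It counts how often the Select lambda runs instead of using Random, so the result is repeatable. The class and method names (MaterializeDemo, CountDeferredCalls, CountMaterializedCalls) are made up for illustration and are not from my real code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    // Iterates a deferred Select twice and counts how often the lambda runs.
    public static int CountDeferredCalls()
    {
        var animals = new[] { "cat", "dog", "mouse" };
        int calls = 0;
        IEnumerable<string> deferred = animals.Select(x => { calls++; return x + "s"; });

        foreach (var item in deferred) { }
        foreach (var item in deferred) { } // the lambda runs again for every element

        return calls;
    }

    // Same query, but ToList() runs the lambda once per element up front.
    public static int CountMaterializedCalls()
    {
        var animals = new[] { "cat", "dog", "mouse" };
        int calls = 0;
        List<string> materialized = animals.Select(x => { calls++; return x + "s"; }).ToList();

        foreach (var item in materialized) { }
        foreach (var item in materialized) { } // iterates the stored list, no re-evaluation

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine("deferred: " + CountDeferredCalls());         // deferred: 6
        Console.WriteLine("materialized: " + CountMaterializedCalls()); // materialized: 3
    }
}
```

The deferred version runs the projection on both passes (3 elements x 2 passes), while the materialized version runs it only when ToList() is called.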

I have many places where I have an IEnumerable<T> as a member of a class. Often the results of a LINQ query are assigned to such an IEnumerable<T>. Normally for me this does not cause issues, but I have recently found a few places in my code where it poses more than just a performance issue.

In trying to check for places where I had made this mistake, I thought I could check whether a particular IEnumerable<T> was of type IQueryable. I thought this would tell me whether the collection was 'deferred' or not. It turns out that the enumerator created by the Select operator above is of type System.Linq.Enumerable+WhereSelectArrayIterator`2[System.String,System.String] and not IQueryable.

I used Reflector to see what this type inherited from, and it turns out not to inherit from anything that indicates it is 'LINQ' at all, so there is no way to test based upon the collection type.

I'm quite happily putting .ToArray() everywhere now, but I'd like to have a mechanism to make sure this problem doesn't happen in future. Visual Studio seems to know how to do it because it gives a message about 'expanding the results view will evaluate the collection.'

The best I have come up with is :

        bool deferred = !object.ReferenceEquals(randomNumberOfAnimals.First(),
                                                randomNumberOfAnimals.First());

Edit: This only works if a new object is created with 'Select', and it is not a generic solution. I'm not recommending it in any case, though! It was a slightly tongue-in-cheek solution.

A: 

The message about expanding the results view will evaluate the collection is a standard message presented for all IEnumerable objects. I'm not sure that there is any foolproof means of checking if an IEnumerable is deferred, mainly because even a yield is deferred. The only means of absolutely ensuring that it isn't deferred is to accept an ICollection or IList<T>.

Adam Robinson
Even an `IList<T>` could be deferred if it has a virtual `Item` getter.
280Z28
Yeah, what the confused Chevy/Nissan crossbreed said. If you want to be sure, only accept a concrete array.
Jeffrey Hantin
@Jeffrey: powered by Chevy: http://www.280z28.org/images/z/IMG_2020.jpg
280Z28
ICollection seems to work best. While I was aware of it, I tend to forget that one. I didn't want a List or IList because of the overhead, but ICollection tells me if I forget to use ToArray() [or someone using my software does].
Simon_Weaver
+1  A: 

Why do you care about deferred execution? You should pay no attention to that, and consider it to be a private implementation detail of your IEnumerable<T>.

John Saunders
Because I get different objects back each time and I was not expecting that. I was modifying a quantity on them and then that change was lost.
Simon_Weaver
What's that have to do with deferred execution? I think you have mistaken the source of your problem.
John Saunders
A: 

It's absolutely possible to manually implement a lazy IEnumerator<T>, so there's no "perfectly general" way of doing it. What I keep in mind is this: if I'm changing things in a list while enumerating something related to it, always call ToArray() before the foreach.
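
A small sketch of that rule, assuming a List<int> as the underlying collection; the names SnapshotDemo, ThrowsWithoutSnapshot and RemoveEvensWithSnapshot are hypothetical. Without the snapshot, the deferred Where is still reading the list while the loop body changes it, and the list's enumerator throws.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Mutates the list while a deferred query over it is still being enumerated.
    public static bool ThrowsWithoutSnapshot()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        IEnumerable<int> evens = numbers.Where(x => x % 2 == 0); // deferred, reads 'numbers' lazily
        try
        {
            foreach (var n in evens)
                numbers.Remove(n); // changes the list the query is walking
            return false;
        }
        catch (InvalidOperationException)
        {
            return true; // "Collection was modified" from the list's enumerator
        }
    }

    // ToArray() copies the query results first, so the loop no longer touches the live list.
    public static List<int> RemoveEvensWithSnapshot()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        foreach (var n in numbers.Where(x => x % 2 == 0).ToArray())
            numbers.Remove(n);
        return numbers; // contains 1 and 3
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsWithoutSnapshot());
        Console.WriteLine(string.Join(",", RemoveEvensWithSnapshot()));
    }
}
```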

280Z28
+3  A: 

In general, I'd say you should try to avoid worrying about whether it's deferred.

There are advantages to the streaming execution nature of IEnumerable<T>. It is true that there are times when it's disadvantageous, but I'd recommend handling those (rare) cases specifically: call ToList() or ToArray() to convert it to a list or array as appropriate.

The rest of the time, it's better to just let it be deferred. Needing to frequently check this seems like a bigger design problem...

Reed Copsey
Agreed. The question was sparked more by curiosity about how to do this, and also by wanting to find places where this issue occurred. I definitely want to pretty much always do 'ToArray()'. I had just been getting away without it because a) it is slightly ugly to look at and b) I thought that a 'Select' projection was immediately realized.
Simon_Weaver
+4  A: 

Deferred execution of LINQ has trapped a lot of people; you're not alone.

The approach I've taken to avoiding this problem is as follows:

Parameters to methods - use IEnumerable<T> unless there's a need for a more specific interface.

Local variables - these are usually declared at the point where I create the LINQ query, so I'll know whether lazy evaluation is possible.

Class members - never use IEnumerable<T>, always use List<T>. And always make them private.

Properties - use IEnumerable<T>, and convert for storage in the setter.

public IEnumerable<Person> People 
{
    get { return people; }
    set { people = value.ToList(); }
}
private List<Person> people;

While there are theoretical cases where this approach wouldn't work, I've not run into one yet, and I've been enthusiastically using the LINQ extension methods since late Beta.
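
To make the property pattern concrete, here is a rough sketch. Person, Team and TeamDemo are placeholder types invented for this example. The point is that a deferred query assigned to the property is copied once in the setter, so every later read sees the same stored objects and changes made to them are not lost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public int Quantity { get; set; }
}

public class Team
{
    private List<Person> people = new List<Person>();

    public IEnumerable<Person> People
    {
        get { return people; }
        set { people = value.ToList(); } // the query is evaluated once, here
    }
}

public static class TeamDemo
{
    // Assigns a deferred Select to the property, changes one object, then reads it back.
    public static int QuantityAfterUpdate()
    {
        var names = new[] { "Ann", "Bob" };
        var team = new Team();
        team.People = names.Select(n => new Person { Name = n }); // deferred at this point

        team.People.First().Quantity = 5;     // modifies the stored Person
        return team.People.First().Quantity;  // same instance, so the change is still there
    }

    public static void Main()
    {
        Console.WriteLine(QuantityAfterUpdate()); // 5
    }
}
```

If the field held the raw IEnumerable<Person> instead, each read would build fresh Person objects and the method would return 0.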

BTW: I'm curious why you use ToArray(); instead of ToList(); - to me, lists have a much nicer API, and there's (almost) no performance cost.

Update: A couple of commenters have rightly pointed out that arrays have a theoretical performance advantage, so I've amended my statement above to "... there's (almost) no performance cost."

Update 2: I wrote some code to do some micro-benchmarking of the difference in performance between Arrays and Lists. On my laptop, and in my specific benchmark, the difference is around 5ns (that's nanoseconds) per access. I guess there are cases where saving 5ns per loop would be worthwhile ... but I've never come across one. I had to hike my test up to 100 million iterations before the runtime became long enough to accurately measure.

Bevan
Bevan: I agree with most of your points, except the last sentence. There are performance costs to lists; they are just very, very minimal. I agree that in nearly all cases it's negligible, but List<T> IS (microscopically) slower than an array, so if you're writing high-performance code, there is a reason to use ToArray() sometimes.
Reed Copsey
@Bevan: One tiny case where arrays are neat is the baseline JIT can get similar performance from them as the optimizing JIT can get from `List<T>`. There are direct IL instructions for working with the arrays - no separate method inlining required.
280Z28
Any specific reason not to use ICollection<T> or IList<T> for class members? I'm leaning towards ICollection for class members.
Simon_Weaver
@Reed Copsey - Perhaps I over simplified. If a system has such a pervasive need for performance that every microsecond counts, then using an array can be advantageous, assuming that the array is accessed a lot. However ... in most of the code I've seen, arrays are used when performance is not that great an issue and when the nicer List API would avoid the need to jump through hoops.
Bevan
@Simon - the List<T> class has a rich and expressive API. I've found that having access through that API, rather than artificially limiting myself to IList<T> or ICollection<T> has been beneficial, especially when I can easily ensure that I always have a List<T> to work with.
Bevan
+1  A: 

This is an interesting reaction to deferred execution - most people view it as a positive in that it allows you to transform streams of data without needing to buffer everything up.

Your suggested test won't work, because there's no reason why an iterator method can't yield the same reference object instance as its first object on two successive tries.

IEnumerable<string> Names()
{
    yield return "Fred";
}

That will return the same static string object every time, as the only item in the sequence.
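
Here is a quick sketch of that failure, applying the ReferenceEquals test from the question to the iterator above. ReferenceTestDemo and LooksDeferred are just illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReferenceTestDemo
{
    // A deferred iterator: nothing runs until it is enumerated.
    public static IEnumerable<string> Names()
    {
        yield return "Fred";
    }

    // The test proposed in the question.
    public static bool LooksDeferred(IEnumerable<string> sequence)
    {
        return !object.ReferenceEquals(sequence.First(), sequence.First());
    }

    public static void Main()
    {
        // Names() is deferred, but both calls to First() hand back the same
        // string literal instance, so the test reports "not deferred".
        Console.WriteLine(LooksDeferred(Names())); // False
    }
}
```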

As you can't reliably detect the compiler-generated class that is returned from an iterator method, you'll have to do the opposite: check for a few well-known containers:

public static IEnumerable<T> ToNonDeferred<T>(this IEnumerable<T> source)
{
    if (source is List<T> || source is T[]) // and any others you encounter
        return source;

    return source.ToArray();
}

By returning IEnumerable<T>, we keep the collection readonly, which is important because we may get back a copy or an original.
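
As a usage sketch (the wrapper classes EnumerableExtensions and ToNonDeferredDemo are assumed names, and the extension method is repeated with an explicit type parameter so the example compiles on its own):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    public static IEnumerable<T> ToNonDeferred<T>(this IEnumerable<T> source)
    {
        if (source is List<T> || source is T[]) // and any others you encounter
            return source;

        return source.ToArray();
    }
}

public static class ToNonDeferredDemo
{
    // A List<T> is already materialized, so it comes back as the same instance.
    public static bool ListIsReturnedAsIs()
    {
        var list = new List<int> { 1, 2, 3 };
        return object.ReferenceEquals(list, list.ToNonDeferred());
    }

    // A Select query is not a known container, so it gets copied into an array.
    public static bool QueryIsCopiedToArray()
    {
        IEnumerable<int> query = new[] { 1, 2, 3 }.Select(n => n * 2);
        return query.ToNonDeferred() is int[];
    }

    public static void Main()
    {
        Console.WriteLine(ListIsReturnedAsIs());   // True
        Console.WriteLine(QueryIsCopiedToArray()); // True
    }
}
```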

Daniel Earwicker
so is this how visual studio does it?
Simon_Weaver
Visual Studio shows the "expanding the results view will evaluate the collection" for _any_ enumerable that isn't `ICollection`.
Pavel Minaev
And I LOVE deferred execution. I'd just made a stupid assumption (and really not even a conscious one) that Select(x => x + "foo") would not be deferred. I absolutely understand why it is, but it had never resulted [surprisingly] in a bug in my code until today.
Simon_Weaver