Possible Duplicate:
Is there ever a reason to not use 'yield return' when returning an IEnumerable?

There are several useful questions here on SO about the benefits of yield return.

There is also a question on whether there is ever a reason not to use yield.

There are reasons not to use yield. For example, if I expect to return all items in a collection, it doesn't seem like yield would be necessary, right?

Beyond being simply 'unnecessary', what are the cases where use of yield will be specifically design-limiting, performance-limiting or cause an unexpected problem?

[This question has been re-phrased for clarity and to distinguish from a similar question.]

+1  A: 

When you don't want a code block to return an iterator for sequential access to an underlying collection, you don't need yield return. You simply return the collection itself.

mumtaz
Think about returning it in a read-only wrapper. The caller might cast it back to the original collection type and modify it.
billpg
yeah ... that's right +1
mumtaz
+12  A: 

The key thing to realize is what yield is useful for, then you can decide which cases do not benefit from it.

In other words, when you do not need a sequence to be lazily evaluated you can skip the use of yield. When would that be? It would be when you do not mind immediately having your entire collection in memory. Otherwise, if you have a huge sequence that would negatively impact memory, you would want to use yield to work on it step by step (i.e., lazily). A profiler might come in handy when comparing both approaches.

Notice how most LINQ statements return an IEnumerable<T>. This allows us to keep chaining LINQ operations together without paying for a full pass over the data at each step (a.k.a. deferred execution). The alternative would be putting a ToList() call between each LINQ statement. That would force each preceding statement to execute immediately before the next (chained) statement runs, forgoing the benefit of lazy evaluation, which is to keep working with the IEnumerable<T> and only evaluate it when the results are needed.
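To make the difference concrete, here is a minimal sketch that counts how many source elements a projection touches with and without a ToList() between steps. The class and method names (DeferredExecutionDemo, CountProjected) are made up for illustration; only the standard System.Linq operators are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns how many elements the Select projection actually processed.
    public static int CountProjected(bool materializeBetweenSteps)
    {
        int projected = 0;
        IEnumerable<int> doubled = Enumerable.Range(1, 1000)
            .Select(x => { projected++; return x * 2; });

        if (materializeBetweenSteps)
            doubled = doubled.ToList(); // forces all 1000 projections right here

        // Take(3) stops pulling from upstream once it has three matches,
        // so a lazy pipeline only projects as many elements as it needs.
        doubled.Where(x => x % 4 == 0).Take(3).ToList();
        return projected;
    }
}
```

Calling CountProjected(true) reports all 1000 elements, because the intermediate ToList() runs the projection to completion. CountProjected(false) reports only the handful of elements needed to find three matches.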

Ahmad Mageed
+13  A: 

What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?

I can think of a couple of cases:

  • Avoid using yield return when you return an existing iterator. Example:

    // Don't do this, it creates overhead for no reason
    // (a new state machine needs to be generated)
    public IEnumerable<string> GetKeys() 
    {
        foreach(string key in _someDictionary.Keys)
            yield return key;
    }
    // DO this
    public IEnumerable<string> GetKeys() 
    {
        return _someDictionary.Keys;
    }
    
  • Avoid using yield return when you don't want to defer execution of the method's code. Example:

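    // Don't do this, the exception won't get thrown until the iterator is
    // iterated, which can be very far away from this method invocation
    public IEnumerable<string> Foo(Bar baz) 
    {
        if (baz == null)
            throw new ArgumentNullException("baz");
        yield ...
    }
    // DO this: validate eagerly, then hand back a separate iterator
    // (BazIterator is assumed to implement IEnumerable<string>)
    public IEnumerable<string> Foo(Bar baz) 
    {
        if (baz == null)
            throw new ArgumentNullException("baz");
        return new BazIterator(baz);
    }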
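One common way to get eager argument checks while still using yield is to split the method in two: a plain wrapper that validates, and a private iterator block that does the work. The sketch below assumes a simple word-splitting method; Words, WordsIterator and WordsLazyCheck are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;

public static class EagerValidationDemo
{
    // Not an iterator block: the null check runs as soon as Words is called.
    public static IEnumerable<string> Words(string text)
    {
        if (text == null)
            throw new ArgumentNullException("text");
        return WordsIterator(text);
    }

    // The iterator block: nothing in here runs until the caller enumerates.
    private static IEnumerable<string> WordsIterator(string text)
    {
        foreach (string word in text.Split(' '))
            yield return word;
    }

    // For contrast: the same check inside an iterator block is deferred
    // until the first MoveNext call.
    public static IEnumerable<string> WordsLazyCheck(string text)
    {
        if (text == null)
            throw new ArgumentNullException("text");
        foreach (string word in text.Split(' '))
            yield return word;
    }
}
```

With this split, Words(null) throws at the call site, while WordsLazyCheck(null) returns normally and only throws once something starts iterating the result.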
    
Pop Catalin
+1 for deferred execution = deferred exception if the code throws.
Davy8
+4  A: 

Yield would be limiting/unnecessary when you need random access. If you need to access element 0 then element 99, you've pretty much eliminated the usefulness of lazy evaluation.

Robert Gowland
When you need random access, IEnumerable can't help you. How would you access element 0 or 99 of an IEnumerable? Guess I don't see what you're trying to say
qstarin
@qstarin, exactly! The only way to access element 99 is to go through elements 0-98, so lazy evaluation has gained you nothing unless you only needed item 99 out of 2 billion. I'm not saying that you can access `enumerable[99]`. I'm saying that if you were only interested in the 99th element, enumerable is not the way to go.
Robert Gowland
@Robert: that has nothing at all to do with yield. It is inherent to IEnumerator, whether it is implemented using iterator blocks or not.
qstarin
@qstarin, it does have *something* to do with yield since yield will result in an enumerator. The OP asked when to avoid yield, yield results in an enumerator, using an enumerator for random access is unwieldy, therefore using yield when random access is required is a bad idea. The fact that he could have generated an enumerable in a different way doesn't negate the fact that using yield isn't good. You could shoot a man with a gun, or you could hit a man with a bat... the fact that you can kill a man with a bat doesn't negate that you shouldn't have shot him.
Robert Gowland
@qstarin, however, you are right to point out that there are other ways to generate IEnumerator.
Robert Gowland
A: 

If you're defining a Linq-y extension method where you're wrapping actual Linq members, those members will more often than not return an iterator. Yielding through that iterator yourself is unnecessary.

Beyond that, you can't really get into much trouble using yield to define a "streaming" enumerable that is evaluated on a JIT basis.

KeithS
+2  A: 

One that might catch you out is if you are serialising the results of an enumeration and sending them over the wire. Because the execution is deferred until the results are needed, you will serialise an empty enumeration and send that back instead of the results you want.

Aidan
+32  A: 

What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?

It's a good idea to think carefully about your use of "yield return" when dealing with recursively defined structures. For example, I often see this:

public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
    if (root == null) yield break;
    yield return root.Value;
    foreach(T item in PreorderTraversal(root.Left))
        yield return item;
    foreach(T item in PreorderTraversal(root.Right))
        yield return item;
}

Perfectly sensible-looking code, but it has performance problems. Suppose the tree is h deep. Then at most points in the traversal there will be O(h) nested iterators built. Calling "MoveNext" on the outer iterator will then make O(h) nested calls to MoveNext. Since it does this O(n) times for a tree with n items, that makes the algorithm O(hn). And since the height of a binary tree is lg n <= h <= n, that means that the algorithm is at best O(n lg n) and at worst O(n^2) in time, and best case O(lg n) and worst case O(n) in stack space. It is O(h) in heap space because each enumerator is allocated on the heap. (On implementations of C# I'm aware of; a conforming implementation might have other stack or heap space characteristics.)

But iterating a tree can be O(n) in time and O(1) in stack space. You can write this instead like:

public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
    var stack = new Stack<Tree<T>>();
    stack.Push(root);
    while (stack.Count != 0)
    {
        var current = stack.Pop();
        if (current == null) continue;
        yield return current.Value;
        // Push right first so that left is popped (and visited) first,
        // giving the same order as the recursive version.
        stack.Push(current.Right);
        stack.Push(current.Left);
    }
}

which still uses yield return, but is much smarter about it. Now we are O(n) in time and O(h) in heap space, and O(1) in stack space.
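To try both approaches side by side, here is a self-contained sketch. The Tree<T> class is a hypothetical minimal node type (the answer does not show its definition), and TraversalDemo, Recursive, WithExplicitStack and Sample are illustrative names. Note that the explicit-stack version here pushes the right child before the left, so that the left subtree is popped first and both versions produce the same order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal node type, assumed for this sketch only.
public class Tree<T>
{
    public T Value;
    public Tree<T> Left;
    public Tree<T> Right;

    public Tree(T value, Tree<T> left = null, Tree<T> right = null)
    {
        Value = value;
        Left = left;
        Right = right;
    }
}

public static class TraversalDemo
{
    // Nested iterators: one enumerator per level of the tree.
    public static IEnumerable<T> Recursive<T>(Tree<T> root)
    {
        if (root == null) yield break;
        yield return root.Value;
        foreach (T item in Recursive(root.Left)) yield return item;
        foreach (T item in Recursive(root.Right)) yield return item;
    }

    // Single iterator with an explicit stack of pending nodes.
    public static IEnumerable<T> WithExplicitStack<T>(Tree<T> root)
    {
        var stack = new Stack<Tree<T>>();
        stack.Push(root);
        while (stack.Count != 0)
        {
            var current = stack.Pop();
            if (current == null) continue;
            yield return current.Value;
            stack.Push(current.Right); // pushed first, so popped after Left
            stack.Push(current.Left);
        }
    }

    // Small sample tree: 1 at the root, 2 (with children 3 and 4) on the left, 5 on the right.
    public static Tree<int> Sample()
    {
        return new Tree<int>(1,
            new Tree<int>(2, new Tree<int>(3), new Tree<int>(4)),
            new Tree<int>(5));
    }
}
```

For the sample tree, both string.Join(" ", TraversalDemo.Recursive(TraversalDemo.Sample())) and the WithExplicitStack equivalent give "1 2 3 4 5". The difference is in how many enumerators are alive and how deep the MoveNext calls nest, not in the output.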

Further reading: see Wes Dyer's article on the subject:

http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx

Eric Lippert
About the first algo: You said it's O(1) in heapspace. Shouldn't it be O(h) in heapspace? (and O(n) in allocated objects over time)
CodeInChaos
@CodeInChaos: It's O(h) in heap space, yes, that was a typo. (Because the enumerators are allocated on the heap.)
Eric Lippert
Oh dear (see what i did there) ... my brain fuzzed over on the second time O(n) was typed. I'm just too blond to understand all this `Order` stuff ... #fml
Pure.Krome
I keep hoping to hear about a `yield foreach` in the next version of C#...
Gabe
Stephen Toub has an article ( http://blogs.msdn.com/b/toub/archive/2004/10/29/249858.aspx ) discussing this specific example, as well as a Towers of Hanoi puzzle solver that uses both methods of iteration in order to demonstrate the performance difference.
Brian
A: 

I have to maintain a pile of code from a guy who was absolutely obsessed with yield return and IEnumerable. The problem is that a lot of third party APIs we use, as well as a lot of our own code, depend on Lists or Arrays. So I end up having to do:

IEnumerable<foo> myFoos = getSomeFoos();
List<foo> fooList = new List<foo>(myFoos);
thirdPartyApi.DoStuffWithArray(fooList.ToArray());

Not necessarily bad, but kind of annoying to deal with, and on a few occasions it's led to creating duplicate Lists in memory to avoid refactoring everything.

Mike Ruhlin
`myFoos.ToArray()` should suffice.
Ahmad Mageed
"myFoos.ToArray() should suffice" ... if you are using .NET 3.5 or later.
Joe
@Joe good point!
Ahmad Mageed
Good point to both of you. Got used to doing it the old way. We're using 3.5 for most stuff now.
Mike Ruhlin
+2  A: 

Eric Lippert raises a good point (too bad C# doesn't have stream flattening like Cω). I would add that sometimes the enumeration process is expensive for other reasons, and therefore you should use a list if you intend to iterate over the IEnumerable more than once.

For example, LINQ-to-objects is built on "yield return". If you've written a slow LINQ query (e.g. that filters a large list into a small list, or that does sorting and grouping), it may be wise to call ToList() on the result of the query in order to avoid enumerating multiple times (which actually executes the query multiple times).

If you are choosing between "yield return" and List<T> when writing a method, consider: is this expensive, and will the caller need to enumerate the results more than once? If you know the answer is yes, then don't use "yield return" unless the List produced is extremely large (and you can't afford the memory it would use - remember, another benefit of yield is that the result list doesn't have to be entirely in memory at once).
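A small sketch of the repeated-enumeration cost, assuming a stand-in "expensive" iterator that simply counts how often its body runs. MultipleEnumerationDemo, ExpensiveSquares and the Evaluations counter are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    // How many times the "expensive" step has run so far.
    public static int Evaluations;

    // Stand-in for an expensive query: every enumeration re-runs the loop.
    public static IEnumerable<int> ExpensiveSquares(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Evaluations++;
            yield return i * i;
        }
    }

    // Enumerates the lazy sequence twice and returns the evaluation count.
    public static int EvaluationsWhenLazy()
    {
        Evaluations = 0;
        IEnumerable<int> lazy = ExpensiveSquares(5);
        lazy.Sum(); // first full pass
        lazy.Max(); // second full pass re-executes the iterator
        return Evaluations;
    }

    // Materializes once with ToList(), then reuses the list.
    public static int EvaluationsWhenCached()
    {
        Evaluations = 0;
        List<int> cached = ExpensiveSquares(5).ToList(); // single pass
        cached.Sum();
        cached.Max();
        return Evaluations;
    }
}
```

EvaluationsWhenLazy() returns 10 (five items, evaluated twice) while EvaluationsWhenCached() returns 5. The trade-off is that the cached list holds all results in memory at once.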

Another reason not to use "yield return" is if interleaving operations is dangerous. For example, if your method looks something like this,

IEnumerable<T> GetMyStuff() {
    foreach (var x in MyCollection)
        if (...)
            yield return (...);
}

this is dangerous if there is a chance that MyCollection will change because of something the caller does:

foreach(T x in GetMyStuff()) {
    if (...)
        MyCollection.Add(...);
        // Oops, now GetMyStuff() will throw an exception because
        // MyCollection was modified.
}

yield return can cause trouble whenever the caller changes something that the yielding function assumes does not change.
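The interleaving problem can be reproduced with a plain List<int>, whose enumerator throws InvalidOperationException if the list changes during enumeration. In this sketch the collection is passed in as a parameter rather than held in a field, and InterleavingDemo, GetEvensLazy and GetEvensSnapshot are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InterleavingDemo
{
    // Lazy: keeps an enumerator open on 'source' while the caller's loop body runs.
    public static IEnumerable<int> GetEvensLazy(List<int> source)
    {
        foreach (int x in source)
            if (x % 2 == 0)
                yield return x;
    }

    // Eager: finishes reading 'source' before returning, so the caller
    // is free to modify it afterwards.
    public static IEnumerable<int> GetEvensSnapshot(List<int> source)
    {
        return source.Where(x => x % 2 == 0).ToList();
    }

    // Returns true if adding to the list inside the loop caused an exception.
    public static bool AddWhileIterating(List<int> list, bool useSnapshot)
    {
        IEnumerable<int> evens = useSnapshot ? GetEvensSnapshot(list) : GetEvensLazy(list);
        try
        {
            foreach (int x in evens)
                list.Add(x * 10); // the caller mutates the underlying collection
            return false;
        }
        catch (InvalidOperationException)
        {
            return true; // "Collection was modified" from the open enumerator
        }
    }
}
```

With the lazy version the Add call invalidates the enumerator that the iterator block still holds, so the next step of the loop throws. With the snapshot version the source has already been read in full, so the same loop completes.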

Qwertie
+2  A: 

There are a lot of excellent answers here. I would add this one: Don't use yield return for small or empty collections where you already know the values:

IEnumerable<UserRight> GetSuperUserRights() {
    if(SuperUsersAllowed) {
        yield return UserRight.Add;
        yield return UserRight.Edit;
        yield return UserRight.Remove;
    }
}

In these cases the creation of the Enumerator object is far more expensive than just generating a data structure.

IEnumerable<UserRight> GetSuperUserRights() {
    return SuperUsersAllowed
           ? new[] {UserRight.Add, UserRight.Edit, UserRight.Remove}
           : Enumerable.Empty<UserRight>();
}
StriplingWarrior
A: 

I would avoid using yield return if the method has a side effect that you expect on calling the method. This is due to the deferred execution that Pop Catalin mentions.

One side effect could be modifying the system, which could happen in a method like IEnumerable<Foo> SetAllFoosToCompleteAndGetAllFoos(), which breaks the single responsibility principle. That's pretty obvious (now...), but a not so obvious side effect could be setting a cached result or similar as an optimisation.
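A minimal sketch of how such a side effect gets deferred, assuming a hypothetical Foo class with a Complete flag (neither is defined in this answer):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item type, assumed for this sketch only.
public class Foo
{
    public bool Complete;
}

public static class SideEffectDemo
{
    // Iterator block: the assignment below runs only as items are pulled,
    // not when the method is called.
    public static IEnumerable<Foo> SetAllFoosToCompleteAndGetAllFoos(List<Foo> foos)
    {
        foreach (Foo foo in foos)
        {
            foo.Complete = true; // side effect, deferred along with everything else
            yield return foo;
        }
    }
}
```

Calling SetAllFoosToCompleteAndGetAllFoos(foos) and ignoring the result leaves every Foo incomplete. The flags are only set once the returned sequence is enumerated, and only for the items actually reached.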

My rules of thumb (again, now...) are:

  • Only use yield if the object being returned requires a bit of processing
  • No side effects in the method if I need to use yield
  • If I have to have side effects (and I limit those to caching etc.), don't use yield, and make sure the benefits of expanding the iteration outweigh the costs
Ben Scott