All of the examples I've seen of using yield return x; inside a C# method could be done in the same way by just returning the whole list. In those cases, is there any benefit or advantage in using the yield return syntax vs. returning the list?

Also, in what types of scenarios would yield return be used that you couldn't just return the complete list?

+26  A: 

But what if you were building a collection yourself?

In general, iterators can be used to lazily generate a sequence of objects. For example, the Enumerable.Range method does not keep any kind of collection internally. It just generates the next number on demand. There are many uses for this kind of lazy sequence generation using a state machine. Most of them are covered under functional programming concepts.

In my opinion, if you are looking at iterators just as a way to enumerate through a collection (that's just one of the simplest use cases), you're going the wrong way. As I said, iterators are a means of returning sequences. The sequence might even be infinite. There would be no way to return a list of infinite length and then use only the first 100 items. It has to be lazy sometimes. Returning a collection is considerably different from returning a collection generator (which is what an iterator is). It's comparing apples to oranges.

Hypothetical example:

static IEnumerable<int> GetPrimeNumbers() {
   for (int num = 2; ; ++num) 
       if (IsPrime(num))
           yield return num;
}

static void Main() { 
   foreach (var i in GetPrimeNumbers()) 
       if (i < 10000)
           Console.WriteLine(i);
       else
           break;
}

This example prints prime numbers less than 10000. You can easily change it to print numbers less than a million without touching the prime number generation algorithm at all. In this example, you can't return a list of all prime numbers because the sequence is infinite and the consumer doesn't even know how many items it wants from the start.
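For completeness, here is a self-contained sketch of the same idea. The IsPrime helper is not part of the original snippet; the naive trial-division version below is an assumption added only so the example compiles, and TakeWhile stands in for the if/break in the loop above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrimeDemo
{
    // Assumed helper: naive trial division, good enough for a demonstration.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Never terminates on its own; each prime is computed only when the
    // consumer asks for the next item.
    public static IEnumerable<int> GetPrimeNumbers()
    {
        for (int num = 2; ; ++num)
            if (IsPrime(num))
                yield return num;
    }

    public static void Main()
    {
        // The consumer, not the generator, decides where to stop.
        foreach (int prime in GetPrimeNumbers().TakeWhile(x => x < 30))
            Console.WriteLine(prime);
    }
}
```

Changing the bound in Main changes how much work the generator does, and the generator itself stays untouched.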

Mehrdad Afshari
Ye olde Socratic method
Totty
Right. I've built the list, but what difference does it make to return one item at a time vs. returning the whole list?
Dennis Palmer
Among other reasons, it makes your code more modular so you can load an item, process, then repeat. Also, consider the case where loading an item is very expensive, or there are lots of them (millions say). In those cases, loading the entire list is undesirable.
Dana the Sane
@Dennis: For a list stored linearly in memory, it might not make a difference, but if you were, for instance, enumerating a 10GB file and processing each line one by one, it would.
Mehrdad Afshari
+1 for an excellent answer - I would also add that the yield keyword allows iterator semantics to be applied to sources that are not traditionally considered collections - such as network sockets, web services, or even concurrency problems (see http://stackoverflow.com/questions/481714/ccr-yield-and-vb-net)
LBushkin
+8  A: 

In toy/demonstration scenarios, there isn't a lot of difference. But there are situations where yielding iterators are useful - sometimes, the entire list isn't available (e.g. streams), or the list is computationally expensive and unlikely to be needed in its entirety.

DDaviesBrackett
+1  A: 

If the entire list is gigantic, it might eat a lot of memory just to sit around, whereas with the yield you only play with what you need, when you need it, regardless of how many items there are.

nilamo
+1  A: 

Take a look at this discussion on Eric White's blog (excellent blog by the way) on lazy versus eager evaluation.

JP Alioto
+1  A: 

Using yield return, you can iterate over items without ever having to build a list. If you don't need the list but want to iterate over some set of items, it can be easier to write

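foreach (var foo in GetSomeFoos()) {
    Operate(foo);  // placeholder for whatever you do with foo
}

than

foreach (var foo in AllFoos) {
    if (SomeCase(foo)) {            // placeholder: a case where we do want to operate on foo
        Operate(foo);
    } else if (AnotherCase(foo)) {  // placeholder: another such case
        Operate(foo);
    }
}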

You can put all of the logic for determining whether or not you want to operate on foo inside your method using yield return, and your foreach loop can be much more concise.
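As a rough sketch of what such a method might look like: the element type (int) and the two conditions below are made up purely for illustration, since the answer does not say what a foo is or which cases matter.

```csharp
using System;
using System.Collections.Generic;

public static class FooDemo
{
    // All of the "do we care about this item?" logic lives here.
    // Both conditions are hypothetical stand-ins.
    public static IEnumerable<int> GetSomeFoos(IEnumerable<int> allFoos)
    {
        foreach (int foo in allFoos)
        {
            if (foo % 2 == 0)
                yield return foo;   // hypothetical first case: even values
            else if (foo > 100)
                yield return foo;   // hypothetical second case: large values
        }
    }

    public static void Main()
    {
        int[] allFoos = { 1, 2, 3, 101, 4 };

        // The calling loop no longer contains any filtering logic.
        foreach (int foo in GetSomeFoos(allFoos))
            Console.WriteLine(foo);
    }
}
```

No intermediate list is built; each matching item is handed to the loop as soon as it is found.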

AgileJon
+4  A: 

Lazy Evaluation/Deferred Execution

The "yield return" iterator blocks won't execute any of the code until you actually ask for that specific result. This means they can also be chained together efficiently. Pop quiz: assuming the "ReadLines()" function reads all the lines from a text file and is implemented using an iterator block, how many times will the following code iterate over the file?

var query = ReadLines(@"C:\MyFile.txt")
                            .Where(l => l.Contains("search text") )
                            .Select(l => int.Parse(l.Substring(5, 8)))
                            .Where(i => i > 10 );

int sum=0;
foreach (int value in query) 
{
    sum += value;
}

The answer is exactly one, and even that pass doesn't start until the foreach loop actually runs.
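If you want to check that for yourself, here is a small sketch. It is not the ReadLines() from the quiz: this version takes a TextReader so it can run against an in-memory string, and the LinesRead counter and the sample data are assumptions added only to make the laziness visible.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LazyPipelineDemo
{
    // Counts how many lines have actually been pulled from the source.
    public static int LinesRead = 0;

    // Iterator block: nothing in here runs until someone enumerates the result.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            LinesRead++;
            yield return line;
        }
    }

    public static void Main()
    {
        var reader = new StringReader("a 5\nb 20\na 30\n");

        // Building the query reads nothing.
        var query = ReadLines(reader)
            .Where(l => l.StartsWith("a"))
            .Select(l => int.Parse(l.Substring(2)))
            .Where(i => i > 10);

        Console.WriteLine(LinesRead); // prints 0

        int sum = 0;
        foreach (int value in query)
            sum += value;

        Console.WriteLine(sum);       // prints 30
        Console.WriteLine(LinesRead); // prints 3: one pass over the three lines
    }
}
```

On .NET 4 and later, File.ReadLines already enumerates a file lazily in this way, so for plain files you may not need to write the iterator yourself.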

Separation of Concerns

Again using the hypothetical ReadLines() function from above, we can now easily separate the code that reads the file from the code that filters out unneeded lines, and both of those from the code that actually parses the results. That first one, especially, is very reusable.

Infinite Lists

See my answer to this question for a good example:
http://stackoverflow.com/questions/1076001/need-help-with-c-fibonacci

Basically, I implement the Fibonacci sequence using an iterator block that will never stop (at least, not before reaching int.MaxValue), and then use that implementation in a safe way.
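The code from the linked answer isn't reproduced here, so the following is only a sketch of the general shape, not that implementation. It uses long instead of int and a checked addition, both of which are my own choices, so that running off the end throws an OverflowException rather than silently wrapping around.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FibonacciDemo
{
    // An endless sequence: the loop has no exit condition of its own.
    public static IEnumerable<long> Fibonacci()
    {
        long a = 0, b = 1;
        while (true)
        {
            yield return a;
            long next = checked(a + b); // throws OverflowException past long.MaxValue
            a = b;
            b = next;
        }
    }

    public static void Main()
    {
        // "Safe" use: the caller bounds the sequence with Take.
        foreach (long f in Fibonacci().Take(10))
            Console.WriteLine(f);
    }
}
```

Because the body only advances when the caller asks for another item, Take(10) runs the loop ten times and then simply stops asking.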

Joel Coehoorn
Deferred execution is probably the biggest benefit of iterators.
justin.m.chase
+1  A: 

Here's my previous accepted answer to exactly the same question:

http://stackoverflow.com/questions/384392/yield-keyword-value-added/384404#384404

Another way to look at iterator methods is that they do the hard work of turning an algorithm "inside out". Consider a parser. It pulls text from a stream, looks for patterns in it and generates a high-level logical description of the content.

Now, I can make this easy for myself as a parser author by taking the SAX approach, in which I have a callback interface that I notify whenever I find the next piece of the pattern. So in the case of SAX, each time I find the start of an element, I call the beginElement method, and so on.

But this creates trouble for my users. They have to implement the handler interface and so they have to write a state machine class that responds to the callback methods. This is hard to get right, so the easiest thing to do is use a stock implementation that builds a DOM tree, and then they will have the convenience of being able to walk the tree. But then the whole structure gets buffered up in memory - not good.

But how about instead I write my parser as an iterator method?

IEnumerable<LanguageElement> Parse(Stream stream)
{
    // imperative code that pulls from the stream and occasionally 
    // does things like:

    yield return new BeginStatement("if");

    // and so on...
}

That will be no harder to write than the callback-interface approach - just yield return an object derived from my LanguageElement base class instead of calling a callback method.

The user can now use foreach to loop through my parser's output, so they get a very convenient imperative programming interface.

The result is that both sides of a custom API look like they're in control, and hence are easier to write and understand.
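To make that concrete, here is a toy version. Everything in it is invented for illustration: the Word and Number element types, the "grammar" (runs of letters and digits separated by anything else), and the use of a TextReader rather than a Stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical element hierarchy, standing in for LanguageElement above.
public abstract class LanguageElement { }

public sealed class Word : LanguageElement
{
    public readonly string Text;
    public Word(string text) { Text = text; }
}

public sealed class Number : LanguageElement
{
    public readonly int Value;
    public Number(int value) { Value = value; }
}

public static class ToyParser
{
    // Plain imperative code that pulls characters and yields an element
    // whenever one is complete; no callbacks, no hand-written state machine.
    public static IEnumerable<LanguageElement> Parse(TextReader input)
    {
        var buffer = new StringBuilder();
        int c;
        while ((c = input.Read()) != -1)
        {
            if (char.IsLetterOrDigit((char)c))
            {
                buffer.Append((char)c);
                continue;
            }
            if (buffer.Length > 0)
            {
                yield return MakeElement(buffer.ToString());
                buffer.Clear();
            }
        }
        if (buffer.Length > 0)
            yield return MakeElement(buffer.ToString());
    }

    private static LanguageElement MakeElement(string text)
    {
        int value;
        if (int.TryParse(text, out value))
            return new Number(value);
        return new Word(text);
    }

    public static void Main()
    {
        // The consumer just loops, as if the parser were a collection.
        foreach (LanguageElement e in Parse(new StringReader("if x 42")))
            Console.WriteLine(e.GetType().Name);
    }
}
```

The parser reads like a straightforward loop, and so does the consumer; the compiler generates the state machine that connects the two.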

Daniel Earwicker
+2  A: 

The fine answers here suggest that a benefit of yield return is that you don't need to create a list, and lists can be expensive. (Also, after a while, you'll find them bulky and inelegant.)

But what if you don't have a List?

yield return allows you to traverse data structures (not necessarily Lists) in a number of ways. For example, if your object is a Tree, you can traverse the nodes in pre-order or post-order without creating other lists or changing the underlying data structure.

public IEnumerable<T> PostOrder()
{
    foreach (T k in kids)
        foreach (T n in k.PostOrder())
            yield return n;
    yield return (T) this;
}

public IEnumerable<T> PreOrder()
{
    yield return (T) this;
    foreach (T k in kids)
        foreach (T n in k.PreOrder())
            yield return n;
}
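
The two traversals above are fragments of a larger class. A complete, compilable sketch might look like the following; the Node class, its Name field, and the sample tree are assumptions made for the example and are not the original author's type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tree node used only for this illustration.
public sealed class Node
{
    public readonly string Name;
    public readonly List<Node> Kids = new List<Node>();

    public Node(string name, params Node[] kids)
    {
        Name = name;
        Kids.AddRange(kids);
    }

    // Visit this node first, then each subtree.
    public IEnumerable<Node> PreOrder()
    {
        yield return this;
        foreach (Node kid in Kids)
            foreach (Node n in kid.PreOrder())
                yield return n;
    }

    // Visit each subtree first, then this node.
    public IEnumerable<Node> PostOrder()
    {
        foreach (Node kid in Kids)
            foreach (Node n in kid.PostOrder())
                yield return n;
        yield return this;
    }
}

public static class TreeDemo
{
    // root has two children, a and b; a has one child, c.
    public static Node BuildSampleTree()
    {
        return new Node("root", new Node("a", new Node("c")), new Node("b"));
    }

    public static void Main()
    {
        Node root = BuildSampleTree();
        Console.WriteLine(string.Join(" ", root.PreOrder().Select(n => n.Name)));  // root a c b
        Console.WriteLine(string.Join(" ", root.PostOrder().Select(n => n.Name))); // c a b root
    }
}
```

One thing to keep in mind: each level of nesting adds another iterator, so on a very deep tree every item is passed up through one iterator per level.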
Ray
+1 for a great additional example. Thanks!
Dennis Palmer
This example also highlights the case of delegation. If you have a collection that under certain circumstances could contain the items of other collections, it's very simple to iterate and use yield return instead of building a full list of all results and returning that.
Thomas G. Mayfield
+1  A: 

Sometimes the sequences you need to return are just too large to fit in memory. For example, about 3 months ago I took part in a data migration project between MS SQL databases. Data was exported in XML format. Yield return turned out to be quite useful with XmlReader, and it made the programming much easier. For example, suppose a file had 1000 Customer elements. If you just read this file into memory, all of them have to be held in memory at the same time, even though they are handled sequentially. With iterators, you can traverse the collection one element at a time, so you only spend memory on one element.

As it turned out, using XmlReader was the only way to make the application work for our project. It ran for a long time, but at least it did not hang the entire system and did not raise OutOfMemoryException. Of course, you can work with XmlReader without yield iterators. But iterators made my life much easier (I would not have written the import code so quickly or with so little trouble otherwise). See this page to see how yield iterators are used for solving real problems (not just academic ones with infinite sequences).
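
The project code itself isn't shown here, so the following is only a sketch of the general pattern, assuming a document whose root contains a flat run of Customer elements with an id attribute (both assumptions made for the example). It combines XmlReader with XNode.ReadFrom so that only one Customer is materialized at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlStreamingDemo
{
    // Yields each <Customer> element as it is reached; the rest of the
    // document is never held in memory.
    public static IEnumerable<XElement> StreamCustomers(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Customer")
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() is needed.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        string xml = "<Customers><Customer id=\"1\" /><Customer id=\"2\" /></Customers>";
        foreach (XElement customer in StreamCustomers(new StringReader(xml)))
            Console.WriteLine((string)customer.Attribute("id"));
    }
}
```

The calling code looks the same as iterating over a list of customers, but memory use stays roughly constant no matter how many Customer elements the file contains.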

SPIRiT_1984