I have been testing out the yield return statement with some of the code I have been writing. I have two properties:

public static IEnumerable<String> MyYieldCollection
{
    get
    {
        wrapper.RunCommand("Fetch First From Water_Mains");
        // GetNumberOfRows will return 1000+ most of the time.
        for (int row = 0; row < tabinfo.GetNumberOfRows(); row++)
        {
            yield return wrapper.Evaluate("Water_Mains.col1");
            wrapper.RunCommand("Fetch Next From Water_Mains");
        }
    }
}

and

public static List<String> MyListCollection
    {
        get
        {
            List<String> innerlist = new List<String>();

            wrapper.RunCommand("Fetch First From Water_Mains");
            for (int row = 0; row < tabinfo.GetNumberOfRows(); row++)
            {
                innerlist.Add(wrapper.Evaluate("Water_Mains.col1"));
                wrapper.RunCommand("Fetch Next From Water_Mains");
            }
            return innerlist;
        }
    }

Then I use a foreach loop over each collection:

        foreach (var item in MyYieldCollection) //Same thing for MyListCollection.
        {
            Console.WriteLine(item);
        }

The funny thing is that I seem to be able to loop over and print out the full MyListCollection faster than the MyYieldCollection.

Results:

  • MyYieldCollection -> 2062
  • MyListCollection -> 1847

I can't really see a reason for this. Am I missing something, or is this normal?

A: 

As far as I understand it, "yield return" will keep looping until it runs out of stuff to do and the function/property exits, returning a filled IEnumerable. In other words, instead of the function being called for each item in the foreach loop, it is called once, before anything inside the foreach loop is executed.

It could be down to the type of collection that is returned. Perhaps the List can be iterated over faster than whatever data structure the IEnumerable is.

Robert Wagner
No, you don't understand yield return properly. Read chapter 6 of C# in Depth, freely available from http://manning.com/skeet.
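
To make the point concrete, here is a minimal sketch of how an iterator block interleaves with its consumer. The names LazyDemo, Numbers, Log and Run are made up for illustration and are not from the question. The iterator body does not start until the foreach asks for the first item, and it pauses at each yield return:

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Records the order in which producer and consumer code runs.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical iterator: its body does not run until the first MoveNext().
    public static IEnumerable<int> Numbers()
    {
        for (int i = 0; i < 3; i++)
        {
            Log.Add("produce " + i);
            yield return i; // pauses here until the consumer asks for the next item
        }
    }

    public static List<string> Run()
    {
        Log.Clear();
        IEnumerable<int> sequence = Numbers(); // no iterator code has executed yet
        Log.Add("after call");
        foreach (int n in sequence)
        {
            Log.Add("consume " + n);
        }
        return Log;
    }
}
```

Calling LazyDemo.Run() should log "after call" first and then alternate between "produce" and "consume" entries, so nothing is collected up front.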
Jon Skeet
+1  A: 

What happens if one iteration of your loop is expensive and you only need to iterate over a few items in your collection?

With yield you only need to pay for what you get ;)

public IEnumerable<int> YieldInts()
{
    for (int i = 0; i < 1000; i++)
    {
     Thread.Sleep(1000); // or do some other work
     yield return i;
    }
}

public void Main()
{
    foreach(int i in YieldInts())
    {
     Console.WriteLine(i);
     if(i == 42)
     {
      break;
     }
    }
}
iik
A: 

My guess is that the JIT can better optimize the for loop in the version that returns the list. In the version that returns IEnumerable, the row variable used in the for loop is now actually a member of a generated class instead of a variable that is local only to the method.

The speed difference is only around 10%, so unless this is performance critical code I wouldn't worry about it.
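
As a rough illustration of what I mean by "a member of a generated class", here is a hand-written enumerator. It is only a sketch: the real compiler-generated class also tracks a state number and differs in detail, and RowEnumerator, RowEnumeratorDemo and the string[] data source are hypothetical stand-ins for the wrapper/tabinfo code in the question.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written stand-in for a compiler-generated iterator class.
public sealed class RowEnumerator : IEnumerator<string>
{
    private readonly string[] _rows; // hypothetical data source
    private int _row = -1;           // the loop counter is a field, not a method local

    public RowEnumerator(string[] rows) { _rows = rows; }

    // Only valid after MoveNext() has returned true.
    public string Current { get { return _rows[_row]; } }
    object IEnumerator.Current { get { return Current; } }

    // Each call advances the field; foreach calls this once per item.
    public bool MoveNext()
    {
        _row++;
        return _row < _rows.Length;
    }

    public void Reset() { _row = -1; }
    public void Dispose() { }
}

public static class RowEnumeratorDemo
{
    // Drives the enumerator the way a foreach loop would.
    public static List<string> ReadAll(string[] rows)
    {
        var result = new List<string>();
        using (var e = new RowEnumerator(rows))
        {
            while (e.MoveNext())
            {
                result.Add(e.Current);
            }
        }
        return result;
    }
}
```

Every item costs a MoveNext() call plus a Current read against a heap-allocated object, whereas the list version keeps row in a plain local while filling the list.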

Daniel Plaisted
+4  A: 

How have you done your timings? Are you in the debugger? In debug mode? It looks like you are using DataTable, so I used your code as the template for a test rig (creating 1000 rows each time), and used the harness as below, in release mode at the command line; the results were as follows (the number in brackets is a check to see they both did the same work):

Yield: 2000 (5000000)
List: 2100 (5000000)

Test harness:

static void Main()
{
    GC.Collect(GC.MaxGeneration,GCCollectionMode.Forced);
    int count1 = 0;
    var watch1 = Stopwatch.StartNew();        
    for(int i = 0 ; i < 5000 ; i++) {
        foreach (var row in MyYieldCollection)
        {
            count1++;
        }
    }
    watch1.Stop();

    GC.Collect(GC.MaxGeneration,GCCollectionMode.Forced);
    int count2 = 0;
    var watch2 = Stopwatch.StartNew();
    for (int i = 0; i < 5000; i++)
    {
        foreach (var row in MyListCollection)
        {
            count2++;
        }
    }
    watch2.Stop();

    Console.WriteLine("Yield: {0} ({1})", watch1.ElapsedMilliseconds, count1);
    Console.WriteLine("List: {0} ({1})", watch2.ElapsedMilliseconds, count2);
}

(Note: you shouldn't normally use GC.Collect, but it is useful for levelling the field in performance tests.)

The only other change I made was to the for loop, to avoid repetition:

        int rows = tabinfo.Rows.Count;
        for (int row = 0; row < rows; row++) {...}

So I can't reproduce your numbers...

Marc Gravell