
Say I have a method like this (stolen from a previous SO answer by Jon Skeet):

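using System;
using System.Collections.Generic;

// Extension methods must live in a non-generic static class; the class name here is illustrative.
public static class EnumerableExtensions
{
    public static IEnumerable<TSource> DuplicatesBy<TSource, TKey>
        (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            // Yield it if the key hasn't actually been added - i.e. it
            // was already in the set
            if (!seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}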

In this method, a HashSet holds the keys that have already been seen. Suppose I use the method like this:

List<string> strings = new List<string> { "1", "1", "2", "3" };
IEnumerable<string> somewhatUniques = strings.DuplicatesBy(s => s).Take(1);

When this is enumerated, it only reads the first 2 items of the strings list, because Take(1) stops as soon as the second "1" is yielded. But how does garbage collection collect the seenKeys HashSet? Since yield just pauses the execution of the method, if the method holds expensive state, how can I make sure I dispose of things properly?
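The premise can be checked with a small sketch. It uses a hypothetical, instrumented copy of DuplicatesBy written as a local function (so the example fits in one file) with an added itemsRead counter, and it assumes C# 9 or later for top-level statements. It shows that nothing runs until the query is enumerated, and that Take(1) stops pulling from the source after the second item:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int itemsRead = 0; // hypothetical counter: how many source elements the iterator has pulled

// Same logic as DuplicatesBy, written as a local function and instrumented with a counter.
IEnumerable<TSource> DuplicatesBy<TSource, TKey>(
    IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    var seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        itemsRead++;
        if (!seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}

var strings = new List<string> { "1", "1", "2", "3" };
IEnumerable<string> firstDuplicate = DuplicatesBy(strings, s => s).Take(1);

Console.WriteLine(itemsRead);                // 0: building the query runs nothing
List<string> result = firstDuplicate.ToList();
Console.WriteLine(itemsRead);                // 2: Take(1) stopped after the second item
Console.WriteLine(string.Join(",", result)); // 1
```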

A:

Well, garbage collection doesn't collect it right away. It can't, because the paused iterator still holds a reference to it.

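Internally, when you do something like a foreach over your method, it calls GetEnumerator() and then calls MoveNext() on the result repeatedly, reading Current after each call to get each item. Enumerators are disposable (IEnumerator<T> inherits IDisposable), and foreach disposes the enumerator for you at the end of the loop, even if you break out early. Disposing runs any pending finally blocks in your iterator, and once nothing references the enumerator any more, garbage collection is free to clean up the objects your iterator was holding.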
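A foreach over an IEnumerable<T> is expanded by the compiler into roughly a GetEnumerator() call, a MoveNext()/Current loop, and a Dispose() call in a finally block. The sketch below (the Numbers iterator and the log list are hypothetical, added only to make disposal visible; C# 9+ top-level statements assumed) runs a foreach that breaks early next to a hand-written equivalent, and both record that the iterator's finally block ran:

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Hypothetical iterator: its finally block runs when the enumerator is disposed mid-iteration.
IEnumerable<int> Numbers()
{
    try
    {
        for (int i = 1; i <= 5; i++)
        {
            log.Add($"yield {i}");
            yield return i;
        }
    }
    finally
    {
        log.Add("disposed");
    }
}

// A foreach that leaves the loop early...
foreach (int n in Numbers())
{
    if (n == 2) break;
}

// ...behaves like this hand-written expansion, where the using block calls Dispose().
using (IEnumerator<int> e = Numbers().GetEnumerator())
{
    while (e.MoveNext())
    {
        int n = e.Current;
        if (n == 2) break;
    }
}

Console.WriteLine(string.Join(", ", log));
// yield 1, yield 2, disposed, yield 1, yield 2, disposed
```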

So, if you have a lot of expensive state in your iterator and you're iterating over it for a long time, then you probably want either to avoid yield return or to evaluate the whole enumeration right away by calling something like ToArray() and then working with the result.
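As a sketch of the ToArray() option, assuming a variant of DuplicatesBy (again a local function, C# 9+) with a hypothetical finished flag set in a finally block: ToArray() drives the iterator to completion, after which only the returned array is referenced and the iterator's HashSet is no longer needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

bool finished = false; // hypothetical flag, set when the iterator's cleanup runs

IEnumerable<TSource> DuplicatesBy<TSource, TKey>(
    IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    var seenKeys = new HashSet<TKey>();
    try
    {
        foreach (TSource element in source)
        {
            if (!seenKeys.Add(keySelector(element)))
                yield return element;
        }
    }
    finally
    {
        finished = true; // runs when the loop completes or the enumerator is disposed
    }
}

// ToArray() enumerates everything up front and disposes the enumerator.
string[] duplicates = DuplicatesBy(new[] { "a", "b", "a", "c", "b" }, s => s).ToArray();

Console.WriteLine(finished);                     // True
Console.WriteLine(string.Join(",", duplicates)); // a,b
```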

EDIT: So, in response to your final question (how you can make sure it gets disposed): there's nothing special you need to do if you're using LINQ or foreach constructs on it, because they dispose the enumerator themselves. If you're getting the enumerator manually, make sure you call Dispose() on it when you're finished, or put it in a using block.
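A minimal sketch of the manual case, using a hypothetical infinite Counter iterator whose finally block counts cleanups (C# 9+ top-level statements assumed). An enumerator that is abandoned without Dispose() simply stays paused, and garbage collection never runs its finally block, while one in a using block is cleaned up deterministically:

```csharp
using System;
using System.Collections.Generic;

int cleanups = 0;

// Hypothetical iterator that counts how often its cleanup code runs.
IEnumerable<int> Counter()
{
    try
    {
        for (int i = 0; ; i++)
            yield return i;
    }
    finally
    {
        cleanups++;
    }
}

// Manual enumeration without Dispose(): the iterator stays paused and cleanup never runs.
IEnumerator<int> leaked = Counter().GetEnumerator();
leaked.MoveNext();
Console.WriteLine(cleanups); // 0

// Manual enumeration inside a using block: Dispose() runs the finally block.
using (IEnumerator<int> e = Counter().GetEnumerator())
{
    e.MoveNext();
    e.MoveNext();
}
Console.WriteLine(cleanups); // 1
```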

mquander
I can't believe that the framework will allow the HashSet to sit around until my AppDomain closes. Not that my iterator will sit around for a long time; it's a contrived example to ask the question.
Ray Booysen
Sorry, I might have been unclear. It doesn't let it sit around forever; it lets it sit around until the enumerator is gone.
mquander
A:

The compiler generates a hidden class to implement this code. It has a super-secret name: "<DuplicatesBy>d__0`2". Your seenKeys and source variables become fields of that class, ensuring that they can't be garbage collected unless the class object is collected.

The class implements the IEnumerator<> interface, and the client code that uses the iterator calls MoveNext() through that interface. It is that interface reference that keeps the class object alive, which in turn keeps its fields alive. As soon as the client code completes the foreach loop, the interface reference disappears, allowing the GC to clean everything up.
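To see the hidden class without a disassembler, the sketch below pokes at it with reflection. It relies on compiler implementation details (field names and layout vary between compiler versions, so the field is located by its type rather than its name) and uses a local-function copy of DuplicatesBy so it fits in one file (C# 9+ assumed). After one MoveNext() the paused enumerator still holds the HashSet in an ordinary field, with the first key in it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

IEnumerable<TSource> DuplicatesBy<TSource, TKey>(
    IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    var seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (!seenKeys.Add(keySelector(element)))
            yield return element;
    }
}

var strings = new List<string> { "1", "1", "2", "3" };
IEnumerator<string> e = DuplicatesBy(strings, s => s).GetEnumerator();
e.MoveNext(); // runs until the first duplicate ("1") is yielded, then pauses

// The compiler-generated enumerator keeps the local seenKeys as an instance field.
FieldInfo setField = e.GetType()
    .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
    .Single(f => f.FieldType == typeof(HashSet<string>));

var captured = (HashSet<string>)setField.GetValue(e)!;
Console.WriteLine(e.GetType().Name); // compiler-generated class name
Console.WriteLine(setField.Name);    // compiler-generated field name
Console.WriteLine(captured.Count);   // 1: only "1" has been added so far

e.Dispose();
```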

Use Ildasm.exe or Reflector to see this for yourself. It will also give you some insight into the hidden cost of syntactic sugar. Iterators aren't cheap.

Hans Passant