I have a question about how garbage collection is handled in a LINQ query. Suppose I am given a list of requests to process. Each request generates a very large set of data, but a filter is then applied to keep only the critical data from each load.

//Input data
List<request> requests;
IEnumerable<filteredData> results = requests.Select(request => Process(request)).Select(data => Filter(data));

I know that the query is deferred, so each data item is only processed when its filtered result is requested, which is good. But does the memory-intensive intermediate data persist until the enumeration is completed?

What I am hoping happens is that each data element can be garbage collected as soon as it has passed the filter stage, so that I have enough memory to process the whole list. Is this the case, or does the middle enumerable keep everything around until the entire query ends? If so, is there a LINQ way to deal with this?


Note: the Process() function generates the memory-intensive data; that's what I'm worried about.

+2  A: 

As long as the return values of Process(...) and Filter(...) do not hold any references to the "large data items" used internally, the memory used in that process should become unrooted and a candidate for GC after each element is processed.

This doesn't mean it will get collected, only that it will be a candidate. If memory pressure gets high, the GC will most likely collect it.
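
To make the "do not hold any references" condition concrete, here is a minimal sketch. The types RawData and FilteredData, the Source field, and the LeakyFilter method are hypothetical stand-ins invented for illustration; they are not taken from the question's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the large intermediate data produced by Process.
public sealed class RawData
{
    public double[] Samples = new double[100000];
}

// Hypothetical stand-in for the small filtered result.
public sealed class FilteredData
{
    public double Peak;     // small value extracted from the raw data
    public RawData Source;  // stays null in the safe variant
}

public static class Demo
{
    public static RawData Process(int request)
    {
        return new RawData();
    }

    // Safe: copies out only what is needed, so nothing in the result
    // keeps the RawData reachable after this call returns.
    public static FilteredData Filter(RawData data)
    {
        return new FilteredData { Peak = data.Samples.Max() };
    }

    // Leaky: the result references its input, so each RawData stays
    // alive for as long as its FilteredData does.
    public static FilteredData LeakyFilter(RawData data)
    {
        return new FilteredData { Peak = data.Samples.Max(), Source = data };
    }

    public static void Main()
    {
        var requests = new List<int> { 1, 2, 3 };

        List<FilteredData> safe =
            requests.Select(r => Process(r)).Select(d => Filter(d)).ToList();
        List<FilteredData> leaky =
            requests.Select(r => Process(r)).Select(d => LeakyFilter(d)).ToList();

        Console.WriteLine(safe.All(r => r.Source == null));   // True
        Console.WriteLine(leaky.All(r => r.Source != null));  // True
    }
}
```

With the first variant, each RawData is only a candidate for collection once its element has been filtered; when the GC actually reclaims it is still up to the runtime.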

Reed Copsey
Cool. My concern was that the intermediate enumerable from the first Select would hold the data until it was completed.
tbischel
@tbischel: Nope - LINQ won't hold references, other than ones you add to a collection, etc. As long as there are no refs, it can be GCed.
Reed Copsey
A: 

It's difficult to answer your question, as what you've posted won't actually compile (Select produces an IEnumerable<T>, but you're assigning it to a List<T>. Assuming the Filter(data) function returns a filteredData, you'd have to call ToList() on the query to store it in results).

requests is, I assume, already populated with data. This list will follow normal garbage collection rules. I'm assuming what you're worried about is the result of the Process function. I can't say specifically what will happen, because I don't know what your Filter function does. Unless the result of Filter holds on to a reference to its parameter (in other words, the result of Process), the objects created by Process will no longer be referenced once the query completes and will follow normal garbage collection rules.

Bear in mind that these rules govern eligibility for collection. No objects are ever guaranteed to be collected during the lifetime of your application. The results, however, will be eligible, so the GC will be able to collect them.
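
One way to check how long each Process result is actually needed is to log the order of calls. The sketch below uses hypothetical stand-ins for Process and Filter (an int array plays the role of the large data) and contrasts the streamed query with a variant that calls ToList() on the intermediate sequence, which is an assumption about how such a pipeline might be buffered by mistake.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderDemo
{
    static readonly List<string> Log = new List<string>();

    // Hypothetical Process: returns a "large" buffer tagged with its request id.
    static int[] Process(int request)
    {
        Log.Add("Process " + request);
        var data = new int[1000];
        data[0] = request;
        return data;
    }

    // Hypothetical Filter: keeps only the tag, not the buffer itself.
    static int Filter(int[] data)
    {
        Log.Add("Filter " + data[0]);
        return data[0];
    }

    // Runs the pipeline and returns the call order as a single string.
    public static string Trace(bool bufferIntermediate)
    {
        Log.Clear();
        var requests = new List<int> { 1, 2, 3 };

        IEnumerable<int[]> processed = requests.Select(r => Process(r));
        if (bufferIntermediate)
            processed = processed.ToList(); // materializes every large buffer at once

        List<int> results = processed.Select(d => Filter(d)).ToList();
        return string.Join(", ", Log);
    }

    public static void Main()
    {
        Console.WriteLine(Trace(false));
        // Process 1, Filter 1, Process 2, Filter 2, Process 3, Filter 3
        Console.WriteLine(Trace(true));
        // Process 1, Process 2, Process 3, Filter 1, Filter 2, Filter 3
    }
}
```

In the streamed case only one buffer is in flight at a time, so each one can become eligible for collection before the next is created. In the buffered case the intermediate list references all of them until it is itself no longer referenced.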

Adam Robinson
Fixed that... I wasn't copying from a file, so yeah, that was an oversight.
tbischel
+2  A: 

The garbage collector in .NET is quite aggressive and can clean up intermediate objects as soon as they are no longer referenced, even inside loops. In fact, in some cases it will clean up an object that is still referenced if it can see that the object will never be accessed again.

Running this code shows that objects are cleaned up quite quickly and do not hang about until the query completes (which it never does):

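using System;
using System.Collections.Generic;
using System.Linq;

// The finalizers log when the GC reclaims an instance.
public class MyClass1 { ~MyClass1() { Console.WriteLine("Cleaned up MyClass1"); } }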
public class MyClass2 { ~MyClass2() { Console.WriteLine("Cleaned up MyClass2"); } }

public class Program
{
    static IEnumerable<MyClass1> lotsOfObjects()
    {
        while (true)
            yield return new MyClass1();
    }

    static void Main()
    {
        var query = lotsOfObjects().Select(x => foo(x));
        foreach (MyClass2 x in query)
            query.ToString();
    }

    static MyClass2 foo(MyClass1 x)
    {
        return new MyClass2();
    }
}

Result:

Cleaned up MyClass1
Cleaned up MyClass1
Cleaned up MyClass1
Cleaned up MyClass2
Cleaned up MyClass2
Cleaned up MyClass1
Cleaned up MyClass2
etc...
Mark Byers