I was playing a bit with Eric Lippert's Ref<T> class from here. I noticed in the IL that it looked like both anonymous methods were using the same generated class, even though that meant the class had an extra variable.

While using only one new class definition seems somewhat reasonable, it strikes me as very odd that only one instance of <>c__DisplayClass2 is created. This seems to imply that both instances of Ref<T> are referencing the same <>c__DisplayClass2. Doesn't that mean that y cannot be collected until vart1 is collected, which may happen long after joik returns? After all, there is no guarantee that some idiot won't write a function (directly in IL) which directly accesses y through vart1 after joik returns. Maybe this could even be done with reflection instead of via crazy IL.

sealed class Ref<T>
{
    public delegate T Func<T>();
    private readonly Func<T> getter;
    public Ref(Func<T> getter)
    {
        this.getter = getter;
    }
    public T Value { get { return getter(); } }
}

static Ref<int> joik()
{
    int[] y = new int[50000];
    int x = 5;
    Ref<int> vart1 = new Ref<int>(delegate() { return x; });
    Ref<int[]> vart2 = new Ref<int[]>(delegate() { return y; });
    return vart1;
}

Running IL DASM confirmed that vart1 and vart2 both used <>c__DisplayClass2, which contained a public field for x and one for y. The IL of joik:

.method private hidebysig static class Program/Ref`1<int32> 
        joik() cil managed
{
  // Code size       72 (0x48)
  .maxstack  3
  .locals init ([0] class Program/Ref`1<int32> vart1,
           [1] class Program/Ref`1<int32[]> vart2,
           [2] class Program/'<>c__DisplayClass2' '<>8__locals3',
           [3] class Program/Ref`1<int32> CS$1$0000)
  IL_0000:  newobj     instance void Program/'<>c__DisplayClass2'::.ctor()
  IL_0005:  stloc.2
  IL_0006:  nop
  IL_0007:  ldloc.2
  IL_0008:  ldc.i4     0xc350
  IL_000d:  newarr     [mscorlib]System.Int32
  IL_0012:  stfld      int32[] Program/'<>c__DisplayClass2'::y
  IL_0017:  ldloc.2
  IL_0018:  ldc.i4.5
  IL_0019:  stfld      int32 Program/'<>c__DisplayClass2'::x
  IL_001e:  ldloc.2
  IL_001f:  ldftn      instance int32 Program/'<>c__DisplayClass2'::'<joik>b__0'()
  IL_0025:  newobj     instance void class Program/Ref`1/Func`1<int32,int32>::.ctor(object,
                                                                                    native int)
  IL_002a:  newobj     instance void class Program/Ref`1<int32>::.ctor(class Program/Ref`1/Func`1<!0,!0>)
  IL_002f:  stloc.0
  IL_0030:  ldloc.2
  IL_0031:  ldftn      instance int32[] Program/'<>c__DisplayClass2'::'<joik>b__1'()
  IL_0037:  newobj     instance void class Program/Ref`1/Func`1<int32[],int32[]>::.ctor(object,
                                                                                        native int)
  IL_003c:  newobj     instance void class Program/Ref`1<int32[]>::.ctor(class Program/Ref`1/Func`1<!0,!0>)
  IL_0041:  stloc.1
  IL_0042:  ldloc.0
  IL_0043:  stloc.3
  IL_0044:  br.s       IL_0046
  IL_0046:  ldloc.3
  IL_0047:  ret
} // end of method Program::joik
+8  A: 

Yes, the MS implementation of anonymous methods effectively creates one hidden class per level of scope that it needs to capture variables from, and captures all the relevant variables from that scope. I believe this is done for the sake of simplicity, but it can indeed increase the lifetime of some objects unnecessarily.
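To make the shape of that hidden class concrete, here is a rough hand-written equivalent of what the IL in the question does. This is only a sketch: the names JoikLocals, GetX, GetY and Lowered are made up for illustration (the real generated names cannot be written in C#), and it uses System.Func<T> rather than the nested delegate type from the question.

```csharp
using System;

// Hand-written stand-in for the compiler-generated <>c__DisplayClass2.
// JoikLocals is a hypothetical name chosen for this sketch.
sealed class JoikLocals
{
    public int x;
    public int[] y;

    public int GetX() { return x; }     // plays the role of <joik>b__0
    public int[] GetY() { return y; }   // plays the role of <joik>b__1
}

static class Lowered
{
    public static Func<int> Joik()
    {
        // One instance holds every captured local from this scope.
        JoikLocals locals = new JoikLocals();
        locals.y = new int[50000];
        locals.x = 5;

        Func<int> getX = locals.GetX;    // delegate target is 'locals'
        Func<int[]> getY = locals.GetY;  // same target object
        GC.KeepAlive(getY);              // only here to mark getY as used

        // Returning getX keeps 'locals' reachable, and with it the array in y.
        return getX;
    }

    static void Main()
    {
        Func<int> getX = Joik();
        JoikLocals locals = (JoikLocals)getX.Target;
        Console.WriteLine(getX());            // prints 5
        Console.WriteLine(locals.y.Length);   // prints 50000
    }
}
```

The array is still reachable through the delegate's Target after Joik returns, which is the lifetime extension being asked about.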

It would be more elegant for each anonymous method to only capture the variables it was actually interested in. However, this could make life considerably more complicated... if one anonymous method captured x and y, one captured x and one captured y, you'd need three classes: one for capturing x, one for capturing y, and one for composing the two (by referring to the other two classes, not by declaring two variables of its own). The tricky bit is that for any single variable instantiation, that variable needs to live in exactly one place so that everything which refers to it sees the same value, whatever changes it.

This doesn't violate the spec in any way, but it could be considered unfortunate - I don't know whether it's actually bitten people in real life, but it's certainly possible.

The good news is that if the C# team decide to improve this, they should be able to do so in an entirely backwardly compatible way, unless some muppets are relying on lifetimes being extended unnecessarily.

Jon Skeet
+1 for using "muppets" as a derogative
arootbeer
"The tricky bit" is an excellent justification for this behavior which I did not consider. I think this could be solved by wrapping every value type in a reference (e.g. *always* creating a separate class for each variable and composing them as needed), but that layer of indirection probably has a performance penalty which will impact users far more frequently than the lifetime issue. And informed users can easily enough change their code to prevent the lifetime issue from happening, if they discover it is causing a problem.
Brian
@Brian: Not just value types... if the *value* of the variable changes (e.g. a string variable changes from referring to "A" to referring to "B") then that has to be propagated too. Basically each variable instantiation has to have a single storage location. Icky.
Jon Skeet
Deliberately relying on lifetime extension would be hard to justify, but I could see someone accidentally doing something that only worked because of it, and then when the improvement comes... (also, @arootbeer, is "muppet" an unusual derogative elsewhere? It's one of the regular go-tos where I am)
Jon Hanna
@Jon Hanna: I'm reminded of http://blogs.msdn.com/b/oldnewthing/archive/2010/08/09/10047586.aspx , though the latter talks about the opposite problem.
Brian
@Jon Hanna - it's not something I'm used to hearing. I wouldn't mind hearing it more, though :P
arootbeer
@arootbeer, you should come to Ireland for a while then, it's popular here, though less so than in the last decade.
Jon Hanna
@Jon Hanna - it's very high on my list of places to go. Probably around 2018 :)
arootbeer
+7  A: 

Jon is of course right. The problem that this usually causes is:

void M()
{
    Expensive e = GetExpensive();
    Cheap c = GetCheap();
    D longLife = ()=>...c...;
    D shortLife = ()=>...e...;
    ...
}

So we have an expensive resource whose lifetime now depends on the lifetime of longLife, even though shortLife is collected early.
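One way to observe this is to compare the Target of the two delegates. The sketch below is a runnable stand-in for the outline above, with assumptions: a byte array plays the part of Expensive, an int plays the part of Cheap, and SharedClosureDemo and Make are hypothetical names. Whether the two targets are the same object depends on the compiler's code generation; with the Microsoft compiler behaviour described in this thread they would be expected to be identical.

```csharp
using System;

static class SharedClosureDemo
{
    // Returns both delegates so the caller can inspect them.
    public static Func<int>[] Make()
    {
        byte[] expensive = new byte[1000000];  // stands in for Expensive e
        int cheap = 42;                        // stands in for Cheap c

        Func<int> longLife = () => cheap;
        Func<int> shortLife = () => expensive.Length;

        return new[] { longLife, shortLife };
    }

    static void Main()
    {
        Func<int>[] d = Make();

        // If both lambdas were lowered onto one closure object, the
        // targets are the same instance, so longLife alone keeps
        // the expensive array reachable.
        Console.WriteLine(ReferenceEquals(d[0].Target, d[1].Target));
    }
}
```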

This is unfortunate, but common. The implementations of closures in JScript and VB have the same problem.

I'd like to solve it in a hypothetical future version of C# but I make no guarantees. The obvious way to do it is to identify equivalence classes of closed-over variables based on which lambdas they are captured by, and generate closure classes one per equivalence class, rather than a single closure class.
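As a rough illustration of that idea, here is what hand-written closure classes might look like for the example above if each equivalence class got its own class. This is not actual compiler output: ClosureC, ClosureE and SplitClosures are hypothetical names, and a string and a byte array stand in for Cheap and Expensive.

```csharp
using System;

// One hypothetical closure class per equivalence class of captured variables.
sealed class ClosureC
{
    public string c;
    public string Get() { return c; }
}

sealed class ClosureE
{
    public byte[] e;
    public int Get() { return e.Length; }
}

static class SplitClosures
{
    public static Func<string> M()
    {
        ClosureE ce = new ClosureE();
        ce.e = new byte[1000000];   // stands in for GetExpensive()
        ClosureC cc = new ClosureC();
        cc.c = "cheap";             // stands in for GetCheap()

        Func<string> longLife = cc.Get;
        Func<int> shortLife = ce.Get;
        shortLife();                // used briefly, then dropped

        // Only cc is reachable through the returned delegate, so the
        // array held by ce becomes eligible for collection.
        return longLife;
    }

    static void Main()
    {
        Func<string> longLife = M();
        Console.WriteLine(longLife());                   // prints cheap
        Console.WriteLine(longLife.Target is ClosureC);  // prints True
    }
}
```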

There also might be things we could do with analysis of what closed-over variables are written to. As Jon notes, we are restricted at present by our need to capture variables rather than values. We could be more flexible in our code generation strategy if we identified variables that are never written to after the closure is created, and make those into closed-over values rather than closed-over variables.
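The difference between capturing a variable and capturing a value shows up as soon as the variable is written after the lambda is created. A minimal sketch, with CaptureSemantics and its method names invented for illustration:

```csharp
using System;

static class CaptureSemantics
{
    public static int ReadAfterWrite()
    {
        int x = 1;
        Func<int> read = () => x;  // captures the variable x, not the value 1
        x = 2;                     // write after the closure is created
        return read();             // the lambda observes the new value
    }

    public static int ReadOnlyCapture()
    {
        int y = 7;
        Func<int> read = () => y;  // y is never written after this point
        return read();
    }

    static void Main()
    {
        Console.WriteLine(ReadAfterWrite());   // prints 2
        Console.WriteLine(ReadOnlyCapture());  // prints 7
    }
}
```

In ReadOnlyCapture, copying the value 7 into the closure would be indistinguishable from capturing y itself, which is the kind of case the analysis described above would be looking for.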

Eric Lippert