Let's examine the MSIL code generated for the following generic method:

public static U BoxValue<T, U>(T value)
  where T : struct, U
  where U : class
{
  return value;
}

The compiler produces this IL:

.method public hidebysig static !!U  BoxValue<valuetype .ctor ([mscorlib]System.ValueType, !!U) T,class U>(!!T 'value') cil managed
{
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  box        !!T
  IL_0006:  unbox.any  !!U
  IL_000b:  ret
}

For the generic code above, however, a more efficient IL sequence would be:

  IL_0000:  ldarg.0
  IL_0001:  box        !!T
  IL_0006:  ret

The constraints guarantee that boxing the value yields a reference type. The unbox.any opcode is completely redundant: after the box opcode, the value on the IL stack is already a valid reference to !!U and can be used without any unboxing.

Why doesn't the C# 3.0 compiler use the constraint metadata to emit more efficient generic code? unbox.any adds only a small overhead (it is just 4x-5x slower), but why not emit better code in this scenario?

+3  A: 

These constraints look strange:

where T : struct, U
where U : class

T is a value type but at the same time must inherit from U, which is a reference type. I wonder which types could satisfy the above constraints and allow us to call this method.

Darin Dimitrov
Any interface type implemented by T, `System.Object`, `System.ValueType` (and the `System.Enum` type, but C# doesn't support enum constraints)
ControlFlow
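To make the comment above concrete, here is a small sketch that calls the question's `BoxValue` with `U` set to `System.Object`, `System.ValueType` and an interface that `int` implements. The `Program` class and the particular type arguments are my own illustration, not something from the thread. An instantiation such as `BoxValue<int, string>` is rejected at compile time, because `int` has no boxing conversion to `string`.

```csharp
using System;

static class Program
{
    // Same generic method as in the question: T is a value type that must
    // convert to the reference type U, which is only possible via boxing.
    public static U BoxValue<T, U>(T value)
        where T : struct, U
        where U : class
    {
        return value;
    }

    static void Main()
    {
        object o = BoxValue<int, object>(42);            // U = System.Object
        ValueType v = BoxValue<int, ValueType>(42);      // U = System.ValueType
        IComparable c = BoxValue<int, IComparable>(42);  // U = an interface int implements

        Console.WriteLine(o);               // 42
        Console.WriteLine(v.GetType());     // System.Int32
        Console.WriteLine(c.CompareTo(41)); // 1

        // BoxValue<int, string>(42) would not compile: int does not convert to string.
    }
}
```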
These constraints look strange, but they state exactly that *type T has an implicit conversion to type U* and that this conversion is a *boxing conversion*.
ControlFlow
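The following sketch spells out in C# what the emitted box / unbox.any pair does at run time. It relies on the ECMA-335 rule that unbox.any applied to a reference type has the same effect as castclass. The helper names `BoxThenCast` and `BoxIsAlreadyU` are hypothetical. As far as I know, the C# compiler also compiles a cast to a type parameter, such as `(U)boxed`, to `unbox.any !!U`, so this is close to, though not guaranteed identical to, the IL shown in the question. `BoxIsAlreadyU` returns true for every valid instantiation, which is why the question calls the cast redundant.

```csharp
using System;

static class BoxingDemo
{
    // Mirrors the two steps in the question's IL:
    // 'box !!T' followed by 'unbox.any !!U' (castclass-like for a reference type U).
    public static U BoxThenCast<T, U>(T value)
        where T : struct, U
        where U : class
    {
        object boxed = value;  // box !!T
        return (U)boxed;       // cast to U; cannot fail given the constraints
    }

    // Checks that the freshly boxed object already is a U,
    // i.e. that the cast above never changes or rejects anything at run time.
    public static bool BoxIsAlreadyU<T, U>(T value)
        where T : struct, U
        where U : class
    {
        object boxed = value;
        return boxed is U;
    }

    static void Main()
    {
        Console.WriteLine(BoxIsAlreadyU<int, IComparable>(42));                        // True
        Console.WriteLine(BoxIsAlreadyU<DateTime, IFormattable>(new DateTime(2000, 1, 1))); // True
        Console.WriteLine(BoxThenCast<int, object>(7));                                // 7
    }
}
```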
+4  A: 

It looks like the compiler does this because of some issues with the verifier.

The IL that you would like the compiler to generate is not verifiable, and so the C# compiler can't generate it (all C# code outside of "unsafe" contexts should be verifiable).

The rules for "verification type compatibility" are given in Section 1.8.1.2.3, Partition III of the ECMA-335 spec.

They say that a type 'S' is verification-compatible with a type 'T' (written S := T) according to the following rules:

  1. [:= is reflexive] For all verification types S, S := S
  2. [:= is transitive] For all verification types S, T, and U if S := T and T := U, then S := U.
  3. S := T if S is the base class of T or an interface implemented by T and T is not a value type.
  4. object := T if T is an interface type.
  5. S := T if S and T are both interfaces and the implementation of T requires the implementation of S
  6. S := null if S is an object type or an interface
  7. S[] := T[] if S := T and the arrays are either both vectors (zero-based, rank one) or neither is a vector and both have the same rank. (This rule deals with array covariance.)
  8. If S and T are method pointers, then S := T if the signatures (return types, parameter types and calling convention) are the same.

Of these rules, the only one that might be applicable in this case is #3.

However, #3 does not apply to your code: 'U' is neither a base class of 'T' nor a base interface of 'T', so the 'or' check returns false.

This means that SOME instruction needs to be executed in order to convert a boxed T into a U in a way that will pass the verifier.

I would agree with you that the verification rules should be changed, so that generating the code you want is actually verifiable.

Technically, however, the compiler is doing the "correct" thing based on the ECMA spec.

You should file a bug with somebody at Microsoft.

Scott Wisniewski