Does anyone know if it is possible to define the equivalent of a "Java custom class loader" in .NET?

To give a little background:

I am in the process of developing a new programming language that targets the CLR, called "Liberty". One of the features of the language is its ability to define "type constructors", which are methods that are executed by the compiler at compile time and generate types as output. They are sort of a generalization of generics (the language does have normal generics in it), and allow code like this to be written (in "Liberty" syntax):

    var t as tuple<i as int, j as int, k as int>;
    t.i = 2;
    t.j = 4;
    t.k = 5;

Where "tuple" is defined like so:

    public type tuple(params variables as VariableDeclaration[]) as TypeDeclaration
    {
       //...
    }

In this particular example, the type constructor "tuple" provides something similar to anonymous types in VB and C#.

However, unlike anonymous types, "tuples" have names and can be used inside public method signatures.

This means that I need a way for the type that eventually ends up being emitted by the compiler to be shareable across multiple assemblies. For example, I want tuple<x as int> defined in Assembly A to end up being the same type as tuple<x as int> defined in Assembly B.

The problem with this, of course, is that Assembly A and Assembly B are going to be compiled at different times, which means they would both end up emitting their own incompatible versions of the tuple type.

I looked into using some sort of "type erasure" to do this, so that I would have a shared library with a bunch of types like this (this is "Liberty" syntax):

    class tuple<T>
    {
        public Field1 as T;
    }

    class tuple<T, R>
    {
        public Field1 as T;
        public Field2 as R;
    }

and then just redirect access from the i, j, and k tuple fields to "Field1", "Field2", and "Field3".

However, that is not really a viable option. It would mean that at compile time tuple<x as int> and tuple<y as int> would be different types, while at runtime they would be treated as the same type. That would cause many problems for things like equality and type identity. That is too leaky an abstraction for my tastes.
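The identity leak can be made concrete in C#. This is a minimal sketch that assumes a hypothetical erased class `Tuple1<T>` standing in for the shared-library `tuple<T>` above; both `tuple<x as int>` and `tuple<y as int>` would erase to `Tuple1<int>`, so at runtime nothing distinguishes them:

```csharp
using System;

// Hypothetical erased representation: every one-field tuple,
// whatever its field is called in source, becomes Tuple1<T>.
public class Tuple1<T>
{
    public T Field1;
}

public static class ErasureDemo
{
    public static bool SameRuntimeType()
    {
        // What a compiler might emit for `tuple<x as int>`...
        var x = new Tuple1<int> { Field1 = 1 };
        // ...and for `tuple<y as int>`, a *different* type at compile time.
        var y = new Tuple1<int> { Field1 = 1 };
        // At runtime both are the same closed generic type.
        return x.GetType() == y.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(SameRuntimeType()); // True
    }
}
```

Any runtime check (casts, `is`, reflection, equality by type) therefore sees the two tuples as interchangeable even though the compiler rejected mixing them.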

Another possible option would be to use "state bag" objects. However, using a state bag would defeat the whole purpose of having support for "type constructors" in the language. The idea there is to enable "custom language extensions" to generate new types at compile time that the compiler can do static type checking with.

In Java, this could be done using custom class loaders. Basically the code that uses tuple types could be emitted without actually defining the type on disk. A custom "class loader" could then be defined that would dynamically generate the tuple type at runtime. That would allow static type checking inside the compiler, and would unify the tuple types across compilation boundaries.

Unfortunately, however, the CLR does not provide support for custom class loading. All loading in the CLR is done at the assembly level. It would be possible to define a separate assembly for each "constructed type", but that would very quickly lead to performance problems (having many assemblies with only one type in them would use too many resources).

So, what I want to know is:

Is it possible to simulate something like Java class loaders in .NET, so that I can emit a reference to a type that does not exist yet and then dynamically generate that type at runtime, before the code that needs it runs?

NOTE:

*I actually already know the answer to the question, which I provide as an answer below. However, it took me about 3 days of research, and quite a bit of IL hacking, to come up with a solution. I figured it would be a good idea to document it here in case anyone else runs into the same problem.*

A: 

I think this is the type of thing the DLR is supposed to provide in C# 4.0. Kind of hard to come by information yet, but perhaps we'll learn more at PDC08. Eagerly waiting to see your C# 3 solution though... I'm guessing it uses anonymous types.

Kevin Dostalek
A:

The answer is yes, but the solution is a little tricky.

The System.Reflection.Emit namespace defines types that allow assemblies to be generated dynamically. They also allow the generated assemblies to be defined incrementally. In other words, it is possible to add types to the dynamic assembly, execute the generated code, and then later add more types to the assembly.
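A minimal sketch of that incremental behavior, assuming .NET Core / .NET 5+ (where the static `AssemblyBuilder.DefineDynamicAssembly` is used instead of `AppDomain.DefineDynamicAssembly`). The names `DemoRuntime`, `First`, and `Second` are made up for illustration:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class IncrementalEmitDemo
{
    public static (int value, bool sameAssembly) Run()
    {
        // One dynamic assembly and module that stay open for the life of the process.
        var assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DemoRuntime"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("DemoRuntime");

        // First type: define it, "bake" it with CreateType, and use it right away.
        TypeBuilder first = module.DefineType("First", TypeAttributes.Public);
        first.DefineField("Value", typeof(int), FieldAttributes.Public);
        Type firstType = first.CreateType(); // a public parameterless ctor is added automatically
        object instance = Activator.CreateInstance(firstType);
        firstType.GetField("Value").SetValue(instance, 42);
        int value = (int)firstType.GetField("Value").GetValue(instance);

        // Later, after generated code has already run, add another type to the same module.
        TypeBuilder second = module.DefineType("Second", TypeAttributes.Public);
        second.DefineField("Text", typeof(string), FieldAttributes.Public);
        Type secondType = second.CreateType();

        return (value, secondType.Assembly.FullName == firstType.Assembly.FullName);
    }

    public static void Main()
    {
        var (value, sameAssembly) = Run();
        Console.WriteLine(value);        // 42
        Console.WriteLine(sameAssembly); // True
    }
}
```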

The System.AppDomain class also defines an AssemblyResolve event that fires whenever the framework fails to load an assembly. By adding a handler for that event, it is possible to define a single "runtime" assembly into which all "constructed" types are placed. The code generated by the compiler that uses a constructed type would refer to a type in the runtime assembly. Because the runtime assembly doesn't actually exist on disk, the AssemblyResolve event would be fired the first time the compiled code tried to access a constructed type. The handler for the event would then generate the dynamic assembly and return it to the CLR.

Unfortunately, there are a few tricky points to getting this to work. The first problem is ensuring that the event handler will always be installed before the compiled code is run. With a console application this is easy: the code to hook up the event handler can just be added to the Main method before the other code runs. Class libraries, however, have no Main method. A DLL may be loaded as part of an application written in another language, so it's not really possible to assume there is always a Main method available to hook up the event handler.

The second problem is ensuring that the referenced types all get inserted into the dynamic assembly before any code that references them runs. The System.AppDomain class also defines a TypeResolve event that is raised whenever the CLR is unable to resolve a type in a dynamic assembly. It gives the event handler the opportunity to define the type inside the dynamic assembly before the code that uses it runs. However, that event will not work in this case: the CLR will not fire it for assemblies that are "statically referenced" by other assemblies, even if the referenced assembly is defined dynamically. This means that we need a way to run code before any other code in the compiled assembly runs, and have it inject the types it needs into the runtime assembly if they have not already been defined. Otherwise, when the CLR tries to load those types, it will find that the dynamic assembly does not contain them and will throw a TypeLoadException.

Fortunately, the CLR offers a solution to both problems: module initializers. A module initializer is the equivalent of a "static class constructor", except that it initializes an entire module, not just a single class. Basically, the CLR will:

  1. Run the module constructor before any types inside the module are accessed.
  2. Guarantee that only those types directly accessed by the module constructor will be loaded while it is executing
  3. Not allow code outside the module to access any of its members until after the constructor has finished.

It does this for all assemblies, including both class libraries and executables, and for EXEs will run the module constructor before executing the Main method.

See this blog post for more information about constructors.
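For comparison, later versions of C# (C# 9, targeting .NET 5 or later) can declare a module initializer in source through `System.Runtime.CompilerServices.ModuleInitializerAttribute`; the compiler emits the module-level `.cctor` that calls the marked method. The method must be static, parameterless, return void, be non-generic, live in a non-generic type, and be accessible from the module (internal or public). A minimal sketch, with `ModuleSetup` and `Initialize` as hypothetical names:

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ModuleSetup
{
    public static bool Ran { get; private set; }

    // Called from the compiler-generated module initializer,
    // before Main or any other code in this module runs.
    [ModuleInitializer]
    internal static void Initialize()
    {
        Ran = true;
        Console.WriteLine("module initializer ran");
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(ModuleSetup.Ran); // True
    }
}
```

In the scheme described here, the body of `Initialize` would be where a compiled assembly registers its constructed types with the shared loader.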

In any case, a complete solution to my problem requires several pieces:

  1. The following class definition, defined inside a "language runtime dll", that is referenced by all assemblies produced by the compiler (this is C# code).

    using System;
    using System.Collections.Generic;
    using System.Reflection;
    using System.Reflection.Emit;

    namespace SharedLib
    {
        public class Loader
        {
            private Loader(ModuleBuilder dynamicModule)
            {
                m_dynamicModule = dynamicModule;
                m_definedTypes = new HashSet<string>();
            }

            private static readonly Loader m_instance;
            private readonly ModuleBuilder m_dynamicModule;
            private readonly HashSet<string> m_definedTypes;

            static Loader()
            {
                // Create the single "$Runtime" assembly that will hold every constructed type.
                var name = new AssemblyName("$Runtime");
                var assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(name, AssemblyBuilderAccess.Run);
                var module = assemblyBuilder.DefineDynamicModule("$Runtime");
                m_instance = new Loader(module);
                // Hand back the dynamic assembly whenever the CLR fails to find "$Runtime" on disk.
                AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyResolve);
            }

            static Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args)
            {
                if (args.Name == Instance.m_dynamicModule.Assembly.FullName)
                {
                    return Instance.m_dynamicModule.Assembly;
                }
                else
                {
                    // Not our assembly; let other handlers (or the default failure) deal with it.
                    return null;
                }
            }

            public static Loader Instance
            {
                get
                {
                    return m_instance;
                }
            }

            // True if a constructed type with this name was already emitted by some assembly.
            public bool IsDefined(string name)
            {
                return m_definedTypes.Contains(name);
            }

            public TypeBuilder DefineType(string name)
            {
                //in a real system we would not expose the type builder.
                //instead an AST for the type would be passed in, and we would just create it.
                var type = m_dynamicModule.DefineType(name, TypeAttributes.Public);
                m_definedTypes.Add(name);
                return type;
            }
        }
    }

    The class defines a singleton that holds a reference to the dynamic assembly that the constructed types will be created in. It also holds a "hash set" that stores the set of types that have already been dynamically generated, and finally defines a member that can be used to define the type. This example just returns a System.Reflection.Emit.TypeBuilder instance that can then be used to define the class being generated. In a real system, the method would probably take in an AST representation of the class and just do the generation itself.

  2. Compiled assemblies that emit the following two references (shown in ILASM syntax):

    .assembly extern $Runtime
    {
        .ver 0:0:0:0
    }
    .assembly extern SharedLib
    {
        .ver 1:0:0:0
    }
    

    Here "SharedLib" is the language's predefined runtime library that includes the "Loader" class defined above, and "$Runtime" is the dynamic runtime assembly that the constructed types will be inserted into.

  3. A "module constructor" inside every assembly compiled in the language.

    As far as I know, there are no .NET languages that allow module constructors to be defined in source. The C++/CLI compiler is the only compiler I know of that generates them. In IL, they look like this, defined directly in the module and not inside any type definition:

    .method privatescope specialname rtspecialname static 
            void  .cctor() cil managed
    {
        //generate any constructed types dynamically here...
        ret
    }
    

    For me, it's not a problem that I have to write custom IL to get this to work. I'm writing a compiler, so code generation is not an issue.

    In the case of an assembly that used the types tuple<i as int, j as int> and tuple<x as double, y as double, z as double> the module constructor would need to generate types like the following (here in C# syntax):

    class Tuple_i_j<T, R>
    {
        public T i;
        public R j;
    }
    
    
    class Tuple_x_y_z<T, R, S>
    {
        public T x;
        public R y;
        public S z;
    }
    

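    The tuple classes are generated as generic types to get around accessibility issues. Because the generated class lives in the "$Runtime" assembly, it could not name a type that is non-public in the compiled assembly; making the field types generic parameters lets the compiled assembly supply such a type itself. That allows code in the compiled assembly to use tuple<x as Foo>, where Foo is some non-public type.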

    The body of the module constructor that did this (here only showing one type, and written in C# syntax) would look like this:

    var loader = SharedLib.Loader.Instance;
    lock (loader)
    {
        if (!loader.IsDefined("$Tuple_i_j"))
        {
            //create the type.
            var Tuple_i_j = loader.DefineType("$Tuple_i_j");
            //define the generic parameters <T,R>
            var genericParams = Tuple_i_j.DefineGenericParameters("T", "R");
            var T = genericParams[0];
            var R = genericParams[1];
            //define the field i
            var fieldI = Tuple_i_j.DefineField("i", T, FieldAttributes.Public);
            //define the field j
            var fieldJ = Tuple_i_j.DefineField("j", R, FieldAttributes.Public);
            //create the default constructor.
            var constructor = Tuple_i_j.DefineDefaultConstructor(MethodAttributes.Public);

            //"close" the type so that it can be used by executing code.
            Tuple_i_j.CreateType();
        }
    }
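To see what compiled code ends up doing with such a type, here is a self-contained sketch that builds `$Tuple_i_j<T, R>` the same way as the module constructor body, then closes it over `(int, int)` the way `tuple<i as int, j as int>` would be. It uses a private dynamic assembly named `$RuntimeDemo` (hypothetical) instead of the shared `Loader`, and reflection instead of compiled field access, and it assumes .NET Core / .NET 5+:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class TupleConsumerDemo
{
    public static (int i, int j) Run()
    {
        var assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("$RuntimeDemo"), AssemblyBuilderAccess.Run);
        var module = assembly.DefineDynamicModule("$RuntimeDemo");

        // Same shape as the module-constructor example: two generic fields, i and j.
        var builder = module.DefineType("$Tuple_i_j", TypeAttributes.Public);
        var gp = builder.DefineGenericParameters("T", "R");
        builder.DefineField("i", gp[0], FieldAttributes.Public);
        builder.DefineField("j", gp[1], FieldAttributes.Public);
        builder.DefineDefaultConstructor(MethodAttributes.Public);
        Type open = builder.CreateType();

        // tuple<i as int, j as int> corresponds to $Tuple_i_j<int, int>.
        Type closed = open.MakeGenericType(typeof(int), typeof(int));
        object t = Activator.CreateInstance(closed);
        closed.GetField("i").SetValue(t, 2);
        closed.GetField("j").SetValue(t, 4);

        return ((int)closed.GetField("i").GetValue(t), (int)closed.GetField("j").GetValue(t));
    }

    public static void Main()
    {
        var (i, j) = Run();
        Console.WriteLine($"{i} {j}"); // 2 4
    }
}
```

Because the type arguments are supplied by the consumer, a different assembly could instantiate `$Tuple_i_j<Foo, int>` with its own internal `Foo` without the runtime assembly ever needing access to `Foo`.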

So in any case, this was the mechanism I was able to come up with to enable the rough equivalent of custom class loaders in the CLR.

Does anyone know of an easier way to do this?

Scott Wisniewski
A:

Hi, Scott,

I'm interested in having a look at your "Liberty" language. Is it publicly available?

Thanks, Markus

MarkusSchaber
Unfortunately, no. I stopped working on it so that I could focus on Code Agent, my company's first product. When I started Transactor (long before it was called Transactor) I initially wanted to design my own programming language. After a few months, though, I realized it was going to take about 3 man-years before I could get something I could charge money for (I needed an IDE to charge money for). I didn't have enough money to do that, so I had to focus on something more realistic.
Scott Wisniewski