tags:

views:

394

answers:

8

The code below is extremely slow for tables of any significant size. (100, 1000, etc...) The culprit is instantiating my objects with new T(). Note that this isn't my finalized code, I've just broken parts of it out in order to more easily profile. Instantiation and initialization will happen together once I refactor the code back into shape.

Is there any way to speed this up? I'm probably forgetting something really simple, or maybe I'm boned. Hopefully, the former.

public static IList<T> ToList<T>(this DataTable table) where T : Model, new()
{
    T[] entities = new T[table.Rows.Count];

    // THIS LOOP IS VERY VERY SLOW
    for (int i = 0; i < table.Rows.Count; i++)
        entities[i] = new T();

    // THIS LOOP IS FAST
    for (int i = 0; i < table.Rows.Count; i++)
        entities[i].Init(table, table.Rows[i]);

    return new List<T>(entities);
}

edit for more info:

The constructor of any given ModelType will look like this:

public ModelType()
{
    _modelInfo = new ModelTypeInfo();
}

The constructor of any given ModelTypeInfo will simply set some string and string[] values, and that class' only job is to provide the values set.

edit for even more info:

Since it seems to be a hot topic, here is what my method looks like for reals before breaking out object construction and initialization:

public static IList<T> ToList<T>(this DataTable table, ModelInfo modelInfo) where T : Model, new()
{
    var tempRepository = new Repository<T>(modelInfo);

    var list = new List<T>();
    foreach (DataRow row in table.Rows)
        list.Add(tempRepository.FromData(table, row));

    return list;
}
+3  A: 

The title of your question suggests that this has to do with the fact that the method is generic. Is allocating the same number of objects without generics faster? If not, it must be to do with whatever work's going on in your constructor. Can you post the constructor code?

EDITED Here is something I wrote awhile ago to cache constructors in a DynamicMethod, which is very fast:

In your class:

delegate T ConstructorDelegate();

The method body:

DynamicMethod method = new DynamicMethod(string.Empty, typeof(T), null,
    MethodBase.GetCurrentMethod().DeclaringType.Module);
ILGenerator il = method.GetILGenerator();
il.Emit(OpCodes.Newobj, type.GetConstructor(Type.EmptyTypes));
il.Emit(OpCodes.Ret);
var constructor = (ConstructorDelegate)method.CreateDelegate(typeof(ConstructorDelegate));
Ben M
Fixed the title to be less specific.
Stuart Branham
Maybe use Func<T> instead of declaring a named ConstructorDelegate?
Jeffrey Hantin
That would be better. The snippet's a bit old -- written before I really got into using Func and Action with C#3.
Ben M
A: 

Are you testing a release build?
Is tables.loop.count a simple property, and can you compare to hoisting it out of the loop?
What's the cost of instantiating T?
Does creating T allocate lots of small objects, so that you run into a few garbage collections?

peterchen
+2  A: 

Do you really need a list, or would an IEnumerable be good enough? If so, you could do lazy/deferred creation of your objects:

public static IEnumerable<T> ToEnumerable<T>(this DataTable table) where T : Model, new()
{
    foreach (DataRow row in table.Rows)
    {
        T entity = new T();
        entity.Init(table, row);

        yield return entity;
    }
}

Unfortunately this is still likely to be slow, because most of the time is likely spent construction the object, but it might allow you to defer this load long enough to make the app appear faster, or until after you are able to filter some of the objects out entirely.

Also, you might think of implementing this using a Factory -like pattern:

public static IEnumerable<T> ToEnumerable<T>(this DataTable table, Func<DataRow, T> TFactory) 
{
    foreach (DataRow row in table.Rows)
    {
        yield return TFactory(row);
    }
}
Joel Coehoorn
The original code was more factory-like, and there is a method for generating a T based on a DataRow. The yield keyword I didn't know about! Thank you for showing this to me. Unfortunately I do I need an IList.
Stuart Branham
You can just call .ToList() on the results.
Joel Coehoorn
Right, I must have forgotten. Silly me. ;) I did end up implementing your IEnumerable<T> method anyway, and cleaned up ToList() to piggyback on ToEnumerable(). I'll probably need it, anyway.
Stuart Branham
+11  A: 

Under the covers, new T() generates a call to System.Activator.CreateInstance<T>(), which is (reflectively) slow:

L_0012: ldc.i4.0 
L_0013: stloc.1 
L_0014: br.s L_0026
L_0016: ldloc.0 
L_0017: ldloc.1 
L_0018: call !!0 [mscorlib]System.Activator::CreateInstance<!!T>()
L_001d: stelem.any !!T
L_0022: ldloc.1 
L_0023: ldc.i4.1 
L_0024: add 
L_0025: stloc.1

You may wish to consider passing in a construction delegate instead.

Jeffrey Hantin
+1. Didn't know that, and probably answers the question.
Ben M
Or, would it be possible to identify the correct constructor only once, rather than n times? I think that would be good enough without uglying up every call to this method.
Stuart Branham
CreateDelegate requires a MethodInfo which doesn't match a ConstructorInfo. At best, you'd have to Emit a wrapper method for the constructor and wrap that in a delegate to avoid the reflection hit inside the loop.
Jeffrey Hantin
Yes - you can call typeof(T).GetConstructor() once, and ConstructorInfo.Invoke() in your loop. That should speed it up considerably.
Ben M
Heaven help you if you're on Compact Framework -- all the heavy runtime cost is in the final Invoke.
Jeffrey Hantin
See my answer below for an example of wrapping the constructor in a delegate to a DynamicMethod.
Ben M
A call to Activator.CreateInstance is not *always* performed, only for reference types.
kek444
Still, it's ugly. I'm likely never using _where T : new()_ again. Asking for a factory method is just cleaner.
Joel Coehoorn
The call is probably performed for reference types so that only a single "any reference type" implementation is emitted and shared among all reference-typed instantiations of the generic; value types generate unique instantiations, probably because of run-time type erasure.
Jeffrey Hantin
A: 

To show by example, this method in C#:

public T Method<T>() where T : new()
{
    return new T();
}

is compiled to this MSIL code (from Reflector):

.method public hidebysig instance !!T Method<.ctor T>() cil managed
{
.maxstack 2
.locals init (
    [0] !!T CS$1$0000,
    [1] !!T CS$0$0001)
L_0000: nop 
L_0001: ldloca.s CS$0$0001
L_0003: initobj !!T
L_0009: ldloc.1 
L_000a: box !!T
L_000f: brfalse.s L_001c
L_0011: ldloca.s CS$0$0001
L_0013: initobj !!T
L_0019: ldloc.1 
L_001a: br.s L_0021
L_001c: call !!0 [mscorlib]System.Activator::CreateInstance<!!T>()
L_0021: stloc.0 
L_0022: br.s L_0024
L_0024: ldloc.0 
L_0025: ret 
}

To not go into too much internals, there are several steps here, several conditions being checked, the need to initialize data fields, etc, and finally a call to the Activator may be required. All this to instantiate a generic type object. And yes, this is used instead of a direct call to the type's constructor always.

kek444
A: 

Even if you need to use a list, why create the array first?

public static IList<T> ToList<T>(this DataTable table) where T : Model, new()
{
    var list = new List<T>();
    foreach (DataRow dr in table.Rows) {
        T entity = new T();
        entity.Init(table, dr);
        list.Add(entity);
    }
    return list;
}
John Saunders
This is temporary code to illustrate where the slowness occurs. Read the question more closely please.
Stuart Branham
+2  A: 

The problem is that the expression new T() actually uses reflection behind the scenes. (It calls Activator.CreateInstance) Therefore, each call to it will take time.


One solution would be to constrain T to implement ICloneable. Then, you can write new T() once, and clone it in the loop. Obviously, you can only do that if you have full control over the model.


Another option would be to make the method take a creator delegate, like this:

public static IList<T> ToList<T>(this DataTable table, Func<T> creator) where T : Model {
    T[] entities = new T[table.Rows.Count];
    for (int i = 0; i < table.Rows.Count; i++)
        entities[i] = creator();

    //...
}

You would then call it like this:

table.ToList(() => new MyModelType());

Because it's used in a parameter, you wouldn't need to specify the generic type explicitly when calling the method.


The least intrusive method would be to use LINQ expressions to make your own creator methods.

EDIT: Like this:

static class CreatorFactory<T> where T : new() {
    public static readonly Func<T> Method = 
        Expression.Lambda<Func<T>>(Expression.New(typeof(T)).Compile();
}

public static IList<T> ToList<T>(this DataTable table) where T : Model {
    var entities = table.Rows.Select(r => CreatorFactory<T>.Method()).ToList();

    for (int i = 0; i < table.Rows.Count; i++)
        entities[i].Init(table, table.Rows[i]);

    return entities;
}
SLaks
I was just going to suggest this. I use a similar method for constructing treenodes.
leppie
I bet this gets him right down to the next bottleneck (I can bet because I was just about to add pretty much the same code into my answer).
John Saunders
A: 

For anyone running into this issue later on, this blog post was extremely helpful for me: http://blogs.msdn.com/haibo_luo/archive/2005/11/17/494009.aspx

Here are the changes I ended up making to my "factory" method. (not really a proper factory, but serves the purpose)

public class Repository<T> : IRepository<T> where T : Model, new()
{
    // ...

    private delegate T CtorDelegate();
    private CtorDelegate _constructor = null;
    private CtorDelegate Constructor
    {
        get
        {
            if (_constructor == null)
            {
                Type type = typeof(T);
                DynamicMethod dm = new DynamicMethod(type.Name + "Constructor", type, new Type[] { }, typeof(Repository<T>).Module);
                ILGenerator ilgen = dm.GetILGenerator();
                ilgen.Emit(OpCodes.Nop);
                ilgen.Emit(OpCodes.Newobj, type.GetConstructor(Type.EmptyTypes));
                ilgen.Emit(OpCodes.Ret);
                _constructor = (CtorDelegate)dm.CreateDelegate(typeof(CtorDelegate));
            }
            return _constructor;
        }
    }

    public T FromData(DataTable table, DataRow row)
    {
        T model = Constructor(); // was previously = new T();
        model.Init(table, row);
        return model;
    }
}
Stuart Branham