views:

722

answers:

5

I'm working on a C# object copy constructor, part of which involves copying the contents of a KeyedCollection into a new KeyedCollection. This is what I have implemented currently:

class MyKeyedCollection : KeyedCollection<uint, DataObject>
{
    protected override uint GetKeyForItem( DataObject do )
    {
        return do.Key;
    }
}

class MyObject
{
    private MyKeyedCollection kc;

    // Copy constructor
    public MyObject( MyObject that )
    {
        this.kc = new MyKeyedCollection();
        foreach ( DataObject do in that.kc )
        {
            this.kc.Add( do );
        }
    }
}

This does the correct thing -- the collection is copied as expected. The problem is that it's also a bit slow. I'm guessing that the problem is that each .Add(do) requires a uniqueness check on the existing data, even though I know it's coming from a source that guarantees uniqueness.

How can I make this copy constructor as fast as possible?

A: 

You could try to serialize the object and then deserialize it into a new object - I cannot tell if this will gain any performance but it might.

Daniel Brückner
+2  A: 

I just ran a test adding 10,000,000 items and to various collections, and the KeyedCollection took about 7x as long as a list, but only about 50% longer than a Dictionary object. Considering that the KeyedCollection is a combination of these two, the performance of Add is perfectly reasonable, and the duplicate-key check it runs is clearly not taking that much time. You might want to run a similar test on your KeyedCollection, and if it's going significantly slower, you can start looking elsewhere (check your MyObject.Key getter to make sure you're not getting overhead from that).


Old Response

Have you tried:

this.kc = that.kc.MemberwiseClone() as MyKeyedCollection;

More info on MemberwiseClone here.

StriplingWarrior
MemberwiseClone() creates a shallow copy, but the KeyCollection will likely contain an array to hold the items. So MemberwiseClone() will just copy the reference to this array, but not the array itself. But, of course, you can use MemberwiseClone() to duplicate the array and all the other required objects, too.
Daniel Brückner
That's a good point. I also just realized that MemberwiseClone is a protected method.
StriplingWarrior
I wrote the memberwiseclone that copies exactly what he wants copied...check my post down below.
johnnycrash
A: 

If you are doing this a lot it suggests you should instead be using immutable collections.

These are structures which you do not modify directly but instead 'modifications' return a new object which may use the old objects state but reflect the changes you made.

Various immutable dictionaries/sets/tree based maps are available for .Net (many in f# however as it is more amenable to this style of development)

Eric Lippert has some excellent articles on this and the AVL Tree should be close to exactly what you want.

ShuggyCoUk
That's an interesting approach, but my understanding is that the main reason to use immutable objects is that they are much easier to deal with conceptually. That is, you don't need to remember (or look for) every bit of code that modifies the object to check for side effects and the like. My problem isn't the complexity but the copy speed. Immutable objects seem like they would do *more* copies than the mutable kind. I think immutable objects solve a lot of programming problems, but I don't think they solve *this* problem.
Whatsit
It reduces the copying since the shared state is *shared* rather than copied. The idea is to not copy unless you really have to. Do you really need to copy...
ShuggyCoUk
+2  A: 

Ok, how about a solution with a little unsafe code? Just for fun?

WARNINGS! This is coded for windows OS and 32 bit, but there is no reason this technique can't be modified to work for 64 bit or other OS's. Finally, I tested this on 3.5 framework. I think it will work on 2.0 and 3.0 but I didn't test. If Redmond changes the number, type, or order of instance variables between the revisions or patches, then this won't work.

But this is fast!!!

This hacks into the KeyedCollection, its underlying List<> and Dictionary<> and copies all the internal data and properties. Its a hack because to do this you have to access private internal variables. I basicly made some structures for KeyedCollection, List, and Dictionary that are those classes' private variables in the right order. I simply point an these structures to where the classes are and voila...you can mess with the private variables!! I used the RedGate reflector to see what all the code was doing so I could figure out what to copy. Then its just a matter of copying some value types and using Array.Copy in a couple places.

The result is CopyKeyedCollection<,>, CopyDict<> and CopyList<>. You get a function that can quick copy a Dictionary<> and one that can quick copy a List<> for free!

One thing I noticed when working this all out was that KeyedCollection contains a list and a dictionary all pointing to the same data! I thought this was wasteful at first, but commentors pointed out KeyedCollection is expressly for the case where you need an ordered list and a dictionary at the same time.

Anyway, i'm an assembly/c programmer who was forced to use vb for awhile, so I am not afraid of doing hacks like this. I'm new to C#, so tell me if I have violated any rules or if you think this is cool.

By the way, I researched the garbage collection, and this should work just fine with the GC. I think it would be prudent if I added a little code to fix some memory for for the ms we spend copying. You guys tell me. I'll add some comments if anyone requests em.

using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;
using System.Collections.ObjectModel;
using System.Reflection;

namespace CopyCollection {

  class CFoo {
    public int Key;
    public string Name;
  }

  class MyKeyedCollection : KeyedCollection<int, CFoo> {
    public MyKeyedCollection() : base(null, 10) { }
    protected override int GetKeyForItem(CFoo foo) {
      return foo.Key;
    }
  }

  class MyObject {
    public MyKeyedCollection kc;

    // Copy constructor
    public MyObject(MyObject that) {
      this.kc = new MyKeyedCollection();
      if (that != null) {
        CollectionTools.CopyKeyedCollection<int, CFoo>(that.kc, this.kc);
      }
    }
  }

  class Program {

    static void Main(string[] args) {

      MyObject mobj1 = new MyObject(null);
      for (int i = 0; i < 7; ++i)
        mobj1.kc.Add(new CFoo() { Key = i, Name = i.ToString() });
      // Copy mobj1
      MyObject mobj2 = new MyObject(mobj1);
      // add a bunch more items to mobj2
      for (int i = 8; i < 712324; ++i)
        mobj2.kc.Add(new CFoo() { Key = i, Name = i.ToString() });
      // copy mobj2
      MyObject mobj3 = new MyObject(mobj2);
      // put a breakpoint after here, and look at mobj's and see that it worked!
      // you can delete stuff out of mobj1 or mobj2 and see the items still in mobj3,
    }
  }

  public static class CollectionTools {

    public unsafe static KeyedCollection<TKey, TValue> CopyKeyedCollection<TKey, TValue>(
     KeyedCollection<TKey, TValue> src, 
     KeyedCollection<TKey, TValue> dst) {

      object osrc = src;
      // pointer to a structure that is a template for the instance variables 
      // of KeyedCollection<TKey, TValue>
      TKeyedCollection* psrc = (TKeyedCollection*)(*((int*)&psrc + 1));  
      object odst = dst;
      TKeyedCollection* pdst = (TKeyedCollection*)(*((int*)&pdst + 1));
      object srcObj = null;
      object dstObj = null;
      int* i = (int*)&i;  // helps me find the stack

      i[2] = (int)psrc->_01_items;
      dstObj = CopyList<TValue>(srcObj as List<TValue>);
      pdst->_01_items = (uint)i[1];

      // there is no dictionary if the # items < threshold
      if (psrc->_04_dict != 0) {
        i[2] = (int)psrc->_04_dict;
        dstObj = CopyDict<TKey, TValue>(srcObj as Dictionary<TKey, TValue>);
        pdst->_04_dict = (uint)i[1];
      }

      pdst->_03_comparer = psrc->_03_comparer;
      pdst->_05_keyCount = psrc->_05_keyCount;
      pdst->_06_threshold = psrc->_06_threshold;
      return dst;
    }

    public unsafe static List<TValue> CopyList<TValue>(
     List<TValue> src) {

      object osrc = src;
      // pointer to a structure that is a template for 
      // the instance variables of List<>
      TList* psrc = (TList*)(*((int*)&psrc + 1));  
      object srcArray = null;
      object dstArray = null;
      int* i = (int*)&i;  // helps me find things on stack

      i[2] = (int)psrc->_01_items;
      int capacity = (srcArray as Array).Length;
      List<TValue> dst = new List<TValue>(capacity);
      TList* pdst = (TList*)(*((int*)&pdst + 1));
      i[1] = (int)pdst->_01_items;
      Array.Copy(srcArray as Array, dstArray as Array, capacity);

      pdst->_03_size = psrc->_03_size;

      return dst;
    }

    public unsafe static Dictionary<TKey, TValue> CopyDict<TKey, TValue>(
     Dictionary<TKey, TValue> src) {

      object osrc = src;
      // pointer to a structure that is a template for the instance 
      // variables of Dictionary<TKey, TValue>
      TDictionary* psrc = (TDictionary*)(*((int*)&psrc + 1)); 
      object srcArray = null;
      object dstArray = null;
      int* i = (int*)&i;  // helps me find the stack

      i[2] = (int)psrc->_01_buckets;
      int capacity = (srcArray as Array).Length;
      Dictionary<TKey, TValue> dst = new Dictionary<TKey, TValue>(capacity);
      TDictionary* pdst = (TDictionary*)(*((int*)&pdst + 1));
      i[1] = (int)pdst->_01_buckets;
      Array.Copy(srcArray as Array, dstArray as Array, capacity);

      i[2] = (int)psrc->_02_entries;
      i[1] = (int)pdst->_02_entries;
      Array.Copy(srcArray as Array, dstArray as Array, capacity);

      pdst->_03_comparer = psrc->_03_comparer;
      pdst->_04_m_siInfo = psrc->_04_m_siInfo;
      pdst->_08_count = psrc->_08_count;
      pdst->_10_freeList = psrc->_10_freeList;
      pdst->_11_freeCount = psrc->_11_freeCount;

      return dst;
    }

    // these are the structs that map to the private variables in the classes
    // i use uint for classes, since they are just pointers
    // statics and constants are not in the instance data.
    // I used the memory dump of visual studio to get these mapped right.
    // everything with a * I copy.  I Used RedGate reflector to look through all
    // the code to decide what needed to be copied.
    struct TKeyedCollection {
      public uint _00_MethodInfo;                  // pointer to cool type info
      // Collection
      public uint _01_items;                       // * IList<T>
      public uint _02_syncRoot;                    //   object
      // KeyedCollection
      public uint _03_comparer;                    //   IEqualityComparer<TKey> 
      public uint _04_dict;                        // * Dictionary<TKey, TItem> 
      public int _05_keyCount;                     // *
      public int _06_threshold;                    // *
      // const int defaultThreshold = 0;
    }

    struct TList {
      public uint _00_MethodInfo;                   //
      public uint _01_items;                        // * T[] 
      public uint _02_syncRoot;                     //   object
      public int _03_size;                          // *
      public int _04_version;                       //
    }

    struct TDictionary {
      // Fields
      public uint _00_MethodInfo;                   //
      public uint _01_buckets;                     // * int[] 
      public uint _02_entries;                     // * Entry<TKey, TValue>[] 
      public uint _03_comparer;                    //   IEqualityComparer<TKey> 
      public uint _04_m_siInfo;                    //   SerializationInfo
      public uint _05__syncRoot;                   //   object 
      public uint _06_keys;                        //   KeyCollection<TKey, TValue> 
      public uint _07_values;                      //   ValueCollection<TKey, TValue> 
      public int _08_count;                        // *
      public int _09_version;
      public int _10_freeList;                     // * 
      public int _11_freeCount;                    // *
    }

  }


}
johnnycrash
Impressive. It's rare to get such complete solutions around here. Looks good at first glance, I'll give it a run through my tests and post the results back here. Regardless, thanks for putting so much effort into this.
Whatsit
Oh, and I'm using the class to store an ordered list with very fast random retrieval, which is what a KeyedCollection is intended to provide.
Whatsit
Way to think outside the box. Most of us .NET/Java folks are stuck in our "safe" little worlds, and don't think that deep.
StriplingWarrior
I will add a comment that this is: fragile (it doesn't even sanity check that the fields are in the right place on a framework change, note that 32/64/mono could all do things differently. Also it isn't pinning the objects that are being mem copied so it will break horribly under any sort of load. Anyone that needs to do this should know that they should do it better.
ShuggyCoUk
@johnnycrash If you are going to be doing stuff like this I suggest you look at some C++/CLI articles regarding interop with unmanaged code as they will focus on the pinning required in some detail
ShuggyCoUk
Oh, I get it an ordered list plus a hash table at the same time. That makes sense. Also, please test this! I am afraid of only two things when using this. 1) I did this for net framework 3.5. It might not work for versions above and below because they could change the layout of the structs. 2) If the GC collects while you are copying using my wacky int[], its possible that it could relocate things. I starting reading a bit about fixing memory or allocating fixed memeory, but thought I should get this posted soon enough to help. I figure someone else can pipe in about fixing the memory.
johnnycrash
Yes fragile (but it shouldnt change unless upgradeing to a new framework...right?). YES!!! This is for 32 bit. I Forgot to add the 32 bit warning. I find it easier to walk thorugh the memory dump in 32 bit. I like the idea of a sanity check for mem layout. And I think the memory layout is good for windows OS only. Thanks for all the comments.
johnnycrash
A: 
SwDevMan81