views:

314

answers:

5

A few days ago, I answered an interesting question on SO about HashSet<T>. A possible solution involved cloning the hashset, and in my answer I suggested to do something like this:

HashSet<int> original = ...
HashSet<int> clone = new HashSet<int>(original);

Although this approach is quite straightforward, I suspect it's very inefficient: the constructor of the new HashSet<T> needs to separately add each item from the original hashset, and check if it isn't already present. This is clearly a waste of time: since the source collection is a ISet<T>, it is guaranteed not to contain duplicates. There should be a way to take advantage of that knowledge...

Ideally, HashSet<T> should implement ICloneable, but unfortunately it's not the case. I also checked with Reflector to see if the HashSet<T> constructor did something specific if the source collection was a hashset, but it doesn't. It could probably be done by using reflection on private fields, but that would be an ugly hack...

So, did someone come up with a clever solution to clone a hashset more efficiently ?

(Note that this question is purely theoretical, I don't need to do that in a real program)

A: 

Easy pattern which should won't work for many collections:

Class cloneableDictionary(Of T, U)
    Inherits Dictionary(Of T, U)
    Function clone() As Dictionary(Of T, U)
        Return CType(Me.MemberwiseClone, cloneableDict(Of T, U))
    End Function
End Class

Unfortunately, I don't know that Microsoft did anything to prevent calling MemberwiseClone in places where it shouldn't be called (e.g. declaring something other than a method--like maybe a class--with the name MemberwiseClone) so I don't know how one can tell whether such an approach is likely to work.

I think there's a fair reason for a standard collection not to support a public cloning method but only a protected one: it's possible that a class which derives from a collection might break severely if cloned, and if base class' cloning method is public there's no way to prevent an object of a derived class from being given to code that expects to clone it.

That having been said, it would have been nice if .net included cloneableDictionary and other such classes as standard types (though obviously not implemented essentially as above).

supercat
This won't work... It does a shallow copy, which is what I (kind of) want, but it's *too* shallow: most collections internally use arrays to store the items and/or buckets, and MemberwiseClone will create a copy of the collection *with the same array instance*. So the clones won't be independant copies: if I modify one of the collection, the other one will be affected too, and will become corrupted, which is worse !
Thomas Levesque
Here's an example that illustrates the problem: http://pastebin.com/70cTdr6a
Thomas Levesque
Note edits above. Probably worth keeping as an answer, to dissuade anyone else who might come up with the same "solution". BTW, it's too bad Microsoft didn't make "BaseClone" a protected method whose default implementation would be a memberwise clone, and define a standard means for disabling it (e.g. shadowing it with something else called BaseClone that isn't a method).
supercat
@Thomas Levesque: Really an embarrassing mistake, especially since I was just trying to figure out the right pattern for clonable objects. As soon as I saw your first post, I knew immediately that I'd oopsed. It seems a lot of people seem to prefer the notion of a copy constructor, but in general a copy constructor is a poor substitute for a clone method, since the type of the object created by a copy constructor will be the type of the constructor, rather than the type of the object being copied. Maybe I'll post my proposed cloning pattern to a blog and link to it.
supercat
@Thomas Levesque: How do you like the cloning pattern at http://supercatnet.blogspot.com/2010/10/thoughts-about-object-cloning.html ? The method for sealing methods descendant classes shouldn't call seems a little icky, but workable; is there a better method? Is there any way for a derived class to mess things up without using Reflection? Should I post that pattern as an "is this a good pattern" question?
supercat