views:

1273

answers:

5

What is the best way to deep clone an interconnected set of objects? Example:

class A {
    B theB; // optional
    // ...
}

class B {
    A theA; // optional
    // ...
}

class Container {
    A[] a;
    B[] b;
}

The obvious thing to do is walk the objects and deep clone everything as I come to it. This creates a problem however -- if I clone an A that contains a B, and that B is also in the Container, that B will be cloned twice after I clone the Container.

The next logical step is to create a Dictionary and look up every object before I clone it. This seems like it could be a slow and ungraceful solution, however.

Any thoughts?

+2  A: 

The dictionary solution you suggested is the best I know of. To optimize further, you could use object.GetHashCode() to get a hash for the object, and use that as the dictionary key. Should be fast unless you're talking about huge object trees (10s to 100s of thousands of objects).

Chris Hynes
+2  A: 

Not that I am familiar with C#, but typically any type of crawling of a graph for some sort of processing will require a lookup table to stop processing an object due to cyclic references. So I would think you will need to do the same here.

Robin
+2  A: 

Its not an elegant solution for sure, but it isn't uncommon to use a dictionary (or hashmap). One of the benefits is that a hashmap has a constant lookup time, so speed does not really suffer here.

JoshJordan
A: 

maybe create a bit flag to indicate whether this object has been cloned before.

J.W.
A: 

Another possible solution you could investigate is serializing the objects into a stream, and then reconstructing them from that same stream into new instances. This often works wonders when everything else seems awfully convoluted and messy.

Marc

marc_s
Can you elaborate? How could I prevent serialiizing the same object twice? Or if I just serialize "Container", how could I store *references* to A and B instead of the entire object, to avoid duplication? I can't just use a simple index like you would believe from the simple example.
zildjohn01
WCF supports this with some tweaking (http://blogs.msdn.com/sowmy/archive/2006/03/26/561188.aspx). But serialization, while quick to code, is going to perform much worse than an explicit deep copy.
Pontus Gagge
True - serializing and deserializing again might not be top in terms of performance. This comes down to the eternal "it depends" scenario :-) Do you need top performance and do you do this hundred of thousands of times in a loop --> then serializing might not be the best route to take.
marc_s