I want to answer 'yes' to your last question, just for fun :-) But I know you would like some arguments, so let me try :
If a concept (here, copying) has a single implementation for an object, then you can consider merging the two in the same class (that is, implementing the method in the class itself). For example:
- if, for some objects, there are several ways to make copies, depending on ..., then the copying really deserves to live outside the original object.
- if, for all objects, there is a unique way of copying them, is there really much value in creating a different object to do the copying? (There is when, for example, the combined class would become too complex and hard to read, so it is good to break it down.)
I have always had trouble creating copies by calling an explicit constructor. The reason is that constructors are the only methods that cannot be inherited (excluding statics...), so they cannot be generic: it is impossible to have a single interface for all your copyable objects. This means that no generic code, anywhere in your application, is able to copy your objects. Every time I tried, there came a time when I really needed to make a copy in a general way.
Explicitly calling constructors also means it will be impossible, in the future, to substitute a subclass. Say you have an algorithm A that works on a variable of type B. If you give A an instance of C, a subclass of B, then when A copies its B variable (whose actual type is C), the copy is created with the B constructor, so it is not of the same class and will probably behave differently. So copying by calling a constructor is extremely limited.
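To make the substitution problem concrete, here is a minimal sketch. The classes Shape and Circle and the copy() method are hypothetical names chosen for illustration; they are not from any library.

```java
// Hypothetical classes, for illustration only.
class Shape {
    protected final String name;

    Shape(String name) { this.name = name; }

    // Copy constructor: always builds a plain Shape.
    Shape(Shape other) { this(other.name); }

    // Polymorphic copy: subclasses override it to return their own type.
    Shape copy() { return new Shape(this); }
}

class Circle extends Shape {
    private final double radius;

    Circle(String name, double radius) { super(name); this.radius = radius; }

    Circle(Circle other) { super(other); this.radius = other.radius; }

    @Override
    Circle copy() { return new Circle(this); }
}

class CopyDemo {
    // The algorithm only knows the static type Shape.
    static Shape copyWithConstructor(Shape s) { return new Shape(s); }

    static Shape copyPolymorphically(Shape s) { return s.copy(); }

    public static void main(String[] args) {
        Shape original = new Circle("c1", 2.0);
        // The constructor call silently drops the subclass: prints "Shape"
        System.out.println(copyWithConstructor(original).getClass().getSimpleName());
        // The overridable method keeps it: prints "Circle"
        System.out.println(copyPolymorphically(original).getClass().getSimpleName());
    }
}
```

The caller that uses the constructor has to name the concrete class, so it cannot follow the runtime type; the caller that goes through a method can.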
Explicitly calling constructors means it is impossible to work with interfaces. You can read in so many places about the value of interfaces... So for example, in our application, many objects are not instantiated directly in our code; instead, a Locator/Factory is asked for an interface (or class), with many possible advantages (if your application comes to need this one day):
- If I want to substitute a B subclass for every A object in a specific context, for example to measure the performance of a costly operation during some automated testing, it's very easy. We also needed to substitute a subclass for HashMaps, to find a non-serializable object that was inserted in the Map and later caused errors during serialization.
- If I have an interface, creating an object only involves the interface in my regular code (Factory excepted). So I have no dependency at all on the concrete class, which is so good as you know (a dependency on an interface has far fewer transitive dependencies, and is much easier to mock for testing).
- this factory is actually Spring-backed in our case, so the instantiation is done via Spring. Many additional steps are taken as needed (proxying, interception, initialization methods ...).
In our application, we usually end up creating one (or a few) cloners. Given a top object, they know how to make a deep copy of it. The advantage with a generic cloner is that the code is written only once, it is generic for the whole application. Often, it is also reused between applications...
Implementation: using reflection for example, you get every member recursively. There are many traps to avoid however:
- loops: A references B that references C that references A. So I keep a Map of the objects that have been copied already, each referencing its copy. When I am about to copy an object but find it is already in the map, I don't copy it again; I substitute its already-made copy.
- special types: enums should not be copied (also some other static objects). Some library classes could have problems also, so you can keep a Set or Map of special classes that you don't want to copy, or copy in a special way.
- you can get in trouble with final fields ...
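A minimal sketch of such a reflective cloner could look like the following. It is not our production code. The names GraphCloner and Node are hypothetical, and the sketch assumes that every copied class is a plain application class with a no-argument constructor. As a simplification, enums and all classes in java.* packages (String, boxed numbers, but also collections) are shared rather than copied, and final fields are not treated specially.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical generic cloner, for illustration only.
class GraphCloner {
    // original -> copy, compared by identity, so loops and shared references survive
    private final Map<Object, Object> copies = new IdentityHashMap<>();

    @SuppressWarnings("unchecked")
    <T> T deepCopy(T root) {
        try {
            return (T) copy(root);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot copy " + root.getClass(), e);
        }
    }

    private Object copy(Object o) throws ReflectiveOperationException {
        if (o == null) return null;
        if (isShared(o)) return o;                 // special types: reuse the same instance
        Object known = copies.get(o);
        if (known != null) return known;           // already copied: substitute the copy

        Class<?> type = o.getClass();
        if (type.isArray()) {
            int length = Array.getLength(o);
            Object array = Array.newInstance(type.getComponentType(), length);
            copies.put(o, array);                  // register before recursing
            for (int i = 0; i < length; i++) {
                Array.set(array, i, copy(Array.get(o, i)));
            }
            return array;
        }

        Constructor<?> ctor = type.getDeclaredConstructor(); // assumption: no-arg constructor
        ctor.setAccessible(true);
        Object result = ctor.newInstance();
        copies.put(o, result);                     // register before recursing (handles loops)
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                f.set(result, copy(f.get(o)));
            }
        }
        return result;
    }

    // Simplification: enums and JDK classes are never copied.
    private static boolean isShared(Object o) {
        return o instanceof Enum || o.getClass().getName().startsWith("java.");
    }
}

// Hypothetical class used to exercise a loop: a -> b -> a.
class Node {
    String label;
    Node next;

    Node() {}

    Node(String label) { this.label = label; }
}
```

With two Node instances pointing at each other, new GraphCloner().deepCopy(a) returns a new pair of nodes that point at each other, not at the originals. The copy is put in the map before its fields are visited, which is what stops the recursion on a loop.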
Specific cases
There are often specific objects for which the default way is not correct. We want both a generic implementation and the possibility to override it as needed. For them, we use this:
- if we can modify the objects, we let them implement a specific CustomizedCopier interface, and their code in that method is responsible for doing the copying as they want. The generic code does no copying of its own when it sees this interface.
- if we can't modify the objects (JRE, third-party code ...), we have a registry Map that stores the classes that need special treatment, along with the specific copier that we want for each. Note that this trick is also sometimes used to customize the copying not in general, but only for some special use-cases, as it can override the way objects are copied.
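The two override mechanisms above can be sketched as follows. Only the interface name CustomizedCopier comes from our code; its method name customCopy(), the CopyDispatcher class and the Token class are hypothetical, and the generic copier is passed in as a plain function so that the sketch stays independent of any particular cloner.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Implemented by classes we own that want to control their own copy.
// The method name is an assumption made for this sketch.
interface CustomizedCopier {
    Object customCopy();
}

// Hypothetical dispatcher: custom interface first, then registry, then generic copy.
class CopyDispatcher {
    private final Map<Class<?>, UnaryOperator<Object>> registry = new HashMap<>();
    private final UnaryOperator<Object> genericCopier;

    CopyDispatcher(UnaryOperator<Object> genericCopier) {
        this.genericCopier = genericCopier;
    }

    // For classes we cannot modify (JRE, third-party code).
    <T> void register(Class<T> type, UnaryOperator<T> copier) {
        registry.put(type, o -> copier.apply(type.cast(o)));
    }

    Object copy(Object o) {
        if (o == null) return null;
        if (o instanceof CustomizedCopier) {
            return ((CustomizedCopier) o).customCopy();  // the object copies itself
        }
        UnaryOperator<Object> special = registry.get(o.getClass());
        if (special != null) {
            return special.apply(o);                     // registered copier wins
        }
        return genericCopier.apply(o);                   // default, generic path
    }
}

// Hypothetical class that handles its own copying.
class Token implements CustomizedCopier {
    final String value;
    boolean copied;

    Token(String value) { this.value = value; }

    @Override
    public Object customCopy() {
        Token t = new Token(value);
        t.copied = true;
        return t;
    }
}

class DispatcherDemo {
    static CopyDispatcher newDispatcher() {
        // The generic copier is a placeholder here: it returns the same instance.
        CopyDispatcher dispatcher = new CopyDispatcher(o -> o);
        dispatcher.register(Date.class, d -> new Date(d.getTime()));
        return dispatcher;
    }
}
```

This lookup matches the exact class only. Whether a registered copier should also apply to subclasses is a design choice that the sketch leaves open.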
In fact, I usually ended up with several cloners. For example, a cloner for persistent data entities typically uses its knowledge of them to clone a bit differently (for example, ids and audit fields could be set to null).
I usually also have a class that does the same traversal of the object graph, but for other needs:
- toString() a complex object to create a debug String of it.
- equals() and hashCode() implementations if needed.
- reinitialise a graph of objects to its default values for all properties (think of the implementation of a 'reset' button in a multi-tab huge form).
- check for existence of an object somewhere in an object graph
- check the serializability of a graph of objects (typical use-case: the HttpSession is serialized in some conditions; in development, we check the objects explicitly, to detect a non-serializable object and provide the best error message to the developer).
- ...
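As an illustration of reusing the traversal for another need, here is a sketch of the serializability check. The names SerializabilityChecker, DemoSession and DemoCart are hypothetical. As a simplification, the walk does not look inside JDK classes (so the contents of collections are not inspected), and it skips static and transient fields because Java serialization ignores them.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical checker, for illustration only.
class SerializabilityChecker {
    // Returns one "path (class name)" entry per non-serializable object found.
    static List<String> findNonSerializable(Object root) {
        List<String> problems = new ArrayList<>();
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, "root", visited, problems);
        return problems;
    }

    private static void walk(Object o, String path, Set<Object> visited, List<String> problems) {
        if (o == null || !visited.add(o)) return;       // null, or already seen (loops)
        Class<?> type = o.getClass();
        if (!(o instanceof Serializable)) {
            problems.add(path + " (" + type.getName() + ")");
            return;
        }
        if (o instanceof Object[]) {                    // check each element of object arrays
            Object[] array = (Object[]) o;
            for (int i = 0; i < array.length; i++) {
                walk(array[i], path + "[" + i + "]", visited, problems);
            }
            return;
        }
        if (type.isArray()) return;                     // primitive arrays are fine
        // Simplification: stop at JDK classes instead of reflecting into them.
        for (Class<?> c = type; c != null && !c.getName().startsWith("java."); c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int m = f.getModifiers();
                if (Modifier.isStatic(m) || Modifier.isTransient(m) || f.getType().isPrimitive()) continue;
                try {
                    f.setAccessible(true);
                    walk(f.get(o), path + "." + f.getName(), visited, problems);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }
}

// Hypothetical session content with one offending field.
class DemoSession implements Serializable {
    String user = "bob";
    DemoCart cart = new DemoCart();
}

class DemoCart implements Serializable {
    Object lock = new Object();   // java.lang.Object is not Serializable
    transient Thread worker;      // transient, so ignored
}
```

For a new DemoSession, the returned list contains a single entry, "root.cart.lock (java.lang.Object)", which is the kind of message that points the developer directly at the offending field.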
Please note that copying is often needed for multi-threading. Ideally, objects shared in multi-threaded environments are immutable. If not, cloning is typically advised so that each thread works on its own copy and the program's overall state stays consistent...
Performance
Using reflection is not always so fast. Typically, for copying that is big in volume and used often, we would implement the copying in the objects themselves. But we found out that only a few classes are both copied and present in high volume, so this is just an exception to the general mechanism, which we plug in afterwards (using the registry, as I described earlier in the post) only when it becomes useful.