Does it make sense to implement a copy method on an immutable type, returning a new instance? Or should it just be the current instance?

I thought the type doesn't change anyway so why copy? Like no one copies the number 5, right?

A: 

One of the advantages of immutable types is that they can be interned (e.g., Java strings). Certainly you shouldn't make extra copies if you can avoid it.
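
To make the interning point concrete, here is a minimal Java sketch. The class and method names (InternDemo, sameInstance) are made up for illustration; the only library call it relies on is String.intern().

```java
// Minimal sketch: interning lets equal immutable values share one instance.
// InternDemo and sameInstance are hypothetical names used only for this example.
final class InternDemo {
    // True only when both arguments are the very same object.
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hello";          // string literals are interned by the JVM
        String copy = new String("hello"); // forces a distinct object, same characters

        System.out.println(sameInstance(literal, copy));          // false
        System.out.println(literal.equals(copy));                 // true
        System.out.println(sameInstance(literal, copy.intern())); // true
    }
}
```

Because the value can never change, sharing the canonical instance is safe, which is why an extra copy usually buys nothing.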

Dave
+4  A: 

Technically, an integer is a value type so it is copied constantly. :)

That said, making a copy of an immutable object doesn't make sense. The string examples provided by others seem to be a band-aid over abstraction leakage in those classes.

Colin Burnett
+2  A: 

Well Java's String class has this:

String(String original)
      Initializes a newly created String object so that it represents the same 
      sequence of characters as the argument; in other words, the newly created 
      string is a copy of the argument string.

and .NET's String has the Copy() method, which does the same.

Both frameworks were designed by people who are smarter than I am, so there must be a good reason - someone at some time needs strings that are different reference-wise, but have the same value.

I'm just not sure when that would be...

Michael Burr
Ironically, you wrote this answer just as I was writing an answer explaining the Java String case. (The .NET one is slightly subtler, but has the same underlying reason. You just can't use Substring to explain it.)
Jon Skeet
+9  A: 

There are certain cases where it makes sense. Java strings are a good example. When a string is created in Java, it has a reference to a backing array of characters (a char[]). It knows the offset into the char array, and the length of the string. When you create a substring, that refers to the same backing array. Now consider this code:

String x = buildVeryLongString();
String y = x.substring(0, 5);
// Use y a lot, but x is garbage collected

The fact that y is still in the system means that the original char[] used by x is still required. In other words, you're using more memory than you have to. If you change the code to:

String x = buildVeryLongString();
String y = new String(x.substring(0, 5));

then you'll end up copying the data to a new char[]. When x and y have roughly the same lifetimes this approach wastes memory (by having two copies) but in the case where x is garbage collected before y, it can make a big difference.

I've run into a similar example with strings in real life, when reading words from a dictionary. By default, BufferedReader.readLine() will use a buffer of 80 characters for a line to start with - so any (non-empty) string returned by readLine() will refer to a char[] array of at least 80 characters. If you're reading a dictionary file with one word per line, that's a lot of wasted space!

This is just an example, but it shows the difference between two immutable objects which are semantically equivalent in terms of what you do with them, but have different characteristics in other ways. That is usually at the heart of why you'd want to copy an immutable type - but it's still a pretty rare thing to want to do.
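
The general idea can be sketched without relying on String internals. Since Java 7 update 6, substring copies the characters itself, so the sharing described above can no longer be observed on current JDKs. The sketch below therefore uses a hypothetical CharSlice class (not a JDK type) whose slice() shares the backing array and whose copy() returns an equal value with right-sized storage. All names here are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical immutable view over a shared char[]; not a real JDK class.
final class CharSlice {
    private final char[] backing; // may be much larger than the slice itself
    private final int offset;
    private final int length;

    private CharSlice(char[] backing, int offset, int length) {
        this.backing = backing;
        this.offset = offset;
        this.length = length;
    }

    static CharSlice of(String s) {
        return new CharSlice(s.toCharArray(), 0, s.length());
    }

    // Cheap: shares the backing array, so the whole array stays reachable.
    CharSlice slice(int from, int to) {
        if (from < 0 || to > length || from > to) {
            throw new IndexOutOfBoundsException();
        }
        return new CharSlice(backing, offset + from, to - from);
    }

    // Same value, but backed by a right-sized array of its own.
    CharSlice copy() {
        char[] trimmed = Arrays.copyOfRange(backing, offset, offset + length);
        return new CharSlice(trimmed, 0, length);
    }

    // Number of chars this instance keeps alive.
    int retainedChars() {
        return backing.length;
    }

    @Override
    public String toString() {
        return new String(backing, offset, length);
    }
}
```

A slice and its copy print the same text, but only the copy lets the large original array be collected once nothing else refers to it.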

In .NET, strings are stored somewhat differently - the character data is held within the string object itself instead of in a separate array. (Arrays, strings and IntPtr are the only variable-size types in .NET, as far as I'm aware.) However, the "buffer" in the string can still be larger than it needs to be. For example:

StringBuilder builder = new StringBuilder(10000);
builder.Append("not a lot");
string x = builder.ToString();

The string object referred to by x will have a huge buffer. Changing the last line to use string.Copy(builder.ToString()) (Copy is a static method) would make the large buffer eligible for garbage collection immediately, leaving a small string instead. Again, doing this unconditionally is a bad idea, but it can be helpful in some cases.

Jon Skeet
This example of strings seems to be a prime example of abstraction leakage and a copy method is a band-aid to the leakage.
Colin Burnett
Jon - do you know if this is a reason why they would put a copy method into the interface for strings or would the copy capability be there anyway, and this is just one use that happened along for it?
Michael Burr
@Michael: Not sure, to be honest. @Colin: Yes, it's a leaky abstraction - but it's *very* rarely relevant. I view abstraction leakage as a necessary evil in many ways. I've certainly rarely found an abstraction which doesn't leak *anywhere* :)
Jon Skeet
Thanks Jon. So all in all it only makes sense if one instance is referencing the other one in some way (like its internals, etc)?
Joan Venge
@Joan - that type of optimization is precisely one of the advantages an immutable design buys you.
Michael Burr
Thanks Michael.
Joan Venge
great answer!!!!!!
Paul Hollingsworth
+2  A: 

Does it make sense to provide a 'Copy' operation on an immutable object?

No.

(There is lots of other interesting discussion in the other answers, but I thought I'd provide the short answer.)

If said object needs to implement an interface that requires a Clone() method (or moral equivalent), it is fine to 'return this'.
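
A minimal Java sketch of the 'return this' approach. Copyable is a hypothetical interface standing in for whatever clone-style contract you have to satisfy, and Point3 is just an illustrative immutable type.

```java
// Hypothetical interface standing in for a Clone()-style contract.
interface Copyable<T> {
    T copy();
}

// Immutable value: all fields are final and there are no mutators.
final class Point3 implements Copyable<Point3> {
    final double x;
    final double y;
    final double z;

    Point3(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override
    public Point3 copy() {
        // Callers cannot mutate the shared instance, so no new object is needed.
        return this;
    }
}
```

The only way a caller could tell the difference is by comparing references, which code working with immutable values normally has no reason to do.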

Brian
Thanks Brian. It doesn't matter if the type is a value type or a reference type if you were to use "return this;"?
Joan Venge
@Joan - with a value-type you are immediately cloning it when you "return this"... with a reference-type you just give them the same reference - but the effect (and answer) is the same.
Marc Gravell
Thanks Marc. I assumed the same, just wanted to be sure.
Joan Venge
Brian, could this kind of abstraction leakage occur with any of the FSharp.Collections built in members?
gradbot
@gradbot, in at least one sense, the answer is 'yes of course'; consider Seq.filter, or equivalently IEnumerable.Where, in a case like "result = someHugeArrayOfInt.Where(x => IsOddPerfectNumber(x))". 'result' is going to leak the huge array it closes over, even though it's "just" an IEnumerable<int> with zero elements.
Brian
+3  A: 

I'll assume we mean objects (classes), since it is a moot point for structs.

There are a few dubious reasons for cloning an immutable object:

  • if the object is remoted, and you want a local copy (although in this case you presumably couldn't use an instance method on the object itself, as that would also return a remoted instance - you'd have to have the clone method local (non-remoted))
  • if you are hugely worried about reflection (even readonly fields can be changed by reflection) - perhaps for some super-security conscious code
  • if some external API (that you can't control) uses reference equality, and you want to use the same "value" as two separate keys - OK, I'm stretching things now...
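
The third bullet can be illustrated in Java with java.util.IdentityHashMap, a real JDK map that compares keys by reference rather than by equals(). The demo class and method names are made up for this sketch, and this is the Java analogue of the situation, not the .NET API the answer has in mind.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo: a container that uses reference equality for keys.
final class IdentityKeysDemo {
    // Inserts both keys and reports how many distinct entries the map holds.
    static int distinctKeys(String first, String second) {
        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        byIdentity.put(first, 1);
        byIdentity.put(second, 2);
        return byIdentity.size();
    }

    public static void main(String[] args) {
        String key = "k";
        System.out.println(distinctKeys(key, key));             // 1: same reference
        System.out.println(distinctKeys(key, new String(key))); // 2: equal value, separate copies
    }
}
```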

If we extend the discussion to consider deep cloning, then it becomes more reasonable, since a regular immutable object doesn't imply that any associated objects are also immutable. A deep clone would fix this, but is a separate consideration.
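
As a rough Java sketch of why deep cloning is a different question: the hypothetical Order class below has only a final field, yet the list it refers to is still mutable, so a shallow copy shares state and a deep copy does not. The names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Order type: the reference is final, but the list it points to is mutable.
final class Order {
    private final List<String> lines;

    Order(List<String> lines) {
        this.lines = lines; // keeps the caller's list: shallow immutability only
    }

    // Shallow copy: a new Order that shares the same list.
    Order shallowCopy() {
        return new Order(lines);
    }

    // Deep copy: a new Order with its own list. The String elements are
    // immutable, so copying the list is enough in this sketch.
    Order deepCopy() {
        return new Order(new ArrayList<>(lines));
    }

    int lineCount() {
        return lines.size();
    }
}
```

If something adds a line to the original list afterwards, the original and the shallow copy both see it, while the deep copy keeps its old contents.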

I think maybe the remoting scenario is the best I can do...

Marc Gravell
Thanks Marc. By deep cloning you mean deep copy? If so, actually that was what I meant. With separate copies of the exact same thing, you don't get any advantage other than what you listed, right?
Joan Venge
No - by deep clone/copy, I mean that not only is *that* object copied, but so are any related objects - i.e. a typical Order / OrderLines setup; a shallow copy of an Order has the same collection of (the same) order lines - edit any of the lines, or add/remove lines, and it shows up in both Order objects. A *deep* copy creates a *new* collection of order lines, and each order line is *itself* cloned, all the way down. Then the copies are completely unrelated. This is (relatively) expensive though.
Marc Gravell
Thanks Marc. So in that sense, does it make sense to do it for immutable types like Point3, Ray, Quats, etc?
Joan Venge
Well, a: they don't sound like they would have references to other objects, so there is no real meaning of "deep"; b: they sound like they are probably structs, so they'll already copy themselves if you blink at them... If they are non-remoted (local), shallow (no reference fields), immutable classes, then there isn't really any benefit cloning them. And structs will take care of themselves.
Marc Gravell
Thanks Marc. That's what I thought. Also can you please tell me or send me a link about remoting? I am not sure what it is in .NET.
Joan Venge
MSDN: http://msdn.microsoft.com/en-us/library/kwdt6w2k(VS.85).aspx - but note (as per the page) that this is largely deprecated now... the runtime pretends you have an object locally, and every time you call methods (including property access), it intercepts the call, does an RPC call to wherever the *actual* object is (often a different machine), and brings the result back.
Marc Gravell
Thanks Marc. In this case as the article suggests, if you use WCF instead, would your #1 point still apply for cloning an immutable object? I wonder if remoting (deprecated) and WCF are the same thing, but different technologies?
Joan Venge
No - they are very different things. WCF objects are local to each end, with the data serialized during transit. Old-style "remoting" objects **simply don't exist** at one end - they only pretend to. The #1 point above does not apply to WCF.
Marc Gravell
Thanks Marc for your answer.
Joan Venge