answers:

13

I've just seen this in the MS Visual Studio docs and the part in bold doesn't make sense to me. Is it wrong or am I not understanding it properly? If you run this, b appears to hold "hello" (as I would expect) and not "h".

Strings are immutable--the contents of a string object cannot be changed after the object is created, although the syntax makes it appear as if you can do this. For example, when you write this code, the compiler actually creates a new string object to hold the new sequence of characters, and the variable b continues to hold "h".

string b = "h";

b += "ello";

+7  A: 

You've done an addition AND an assignment in one step. Strings are immutable, but also a reference type.

string b = "h";
b = b + "ello";

We can look at the pseudo-memory like this:

string b = "h";         // b    := 0x00001000 ["h"]
string tmp1 = "ello";   // tmp1 := 0x00002000 ["ello"]
string tmp2 = b + tmp1; // tmp2 := 0x00003000 ["hello"]
b = tmp2;               // b    := 0x00003000 ["hello"]

I'm not entirely sure where you're getting that text, because as I read the documentation for the string class I find (not that I think "h" actually gets garbage collected):

Strings are immutable--the contents of a string object cannot be changed after the object is created, although the syntax makes it appear as if you can do this. For example, when you write this code, the compiler actually creates a new string object to hold the new sequence of characters, and that new object is assigned to b. The string "h" is then eligible for garbage collection.

@Jon Skeet brings up that "h" will never be garbage collected due to string interning, and I agree with him. More importantly, the C# Standard agrees with him; otherwise the following from §2.4.4.5 String literals could not be true:

Each string literal does not necessarily result in a new string instance. When two or more string literals that are equivalent according to the string equality operator (§7.9.7) appear in the same program, these string literals refer to the same string instance.
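
The spec passage above can be checked with a small sketch. The class and method names here (InternDemo, LiteralsShareInstance, RuntimeStringIsDistinct) are made up for illustration, and the first check relies on the default behaviour where equal literals in one program refer to the same instance, as the quoted section describes.

```csharp
using System;

static class InternDemo
{
    // True when two equal literals in the same program share one instance,
    // as the quoted spec section describes (assumes default interning).
    public static bool LiteralsShareInstance()
    {
        string x = "h";
        string y = "h";
        return object.ReferenceEquals(x, y);
    }

    // True when a string built at run time equals a literal in value
    // but is a separate object on the heap.
    public static bool RuntimeStringIsDistinct()
    {
        string literal = "hello";
        string built = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
        return literal == built && !object.ReferenceEquals(literal, built);
    }

    static void Main()
    {
        Console.WriteLine(LiteralsShareInstance());
        Console.WriteLine(RuntimeStringIsDistinct());
    }
}
```

Only the literal case is covered by the spec text; a string assembled at run time is a separate object unless it is explicitly interned.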

sixlettervariables
Original text from: http://msdn.microsoft.com/en-us/library/362314fe(VS.80).aspx There's a newer (correct) version here: http://msdn.microsoft.com/en-us/library/362314fe.aspx Thanks to the poster who found these.
+1  A: 

There are now three strings. One is the original "h", one is "ello" and the third is "hello". Your b variable points to the "hello" string. The other two strings have no references to them and can be cleaned up by the garbage collector.

Mark Heath
I don't *think* the GC will collect interned strings (specified as literals in the IL, effectively) while the AppDomain lives on. I could be wrong, of course. I just can't imagine why the JIT would create code which would need to recreate "h" and "ello" each time it ran...
Jon Skeet
Sure. I probably shouldn't have mentioned the GC, but the essence of the answer is that the b reference is pointing to a completely different object rather than having changed an existing one.
Mark Heath
Yes - I agree completely on that part :)
Jon Skeet
+4  A: 

Yes, the docs are wrong. (The docs for a number of string methods imply mutability too. They're basically poorly written.)

Heck, even the use of "the compiler" creating the new string object is off. Basically it's doing:

string b = "h";
b = string.Concat(b, "ello");

At that point the compiler's job is done - it's the framework which creates the new string object.

Jon Skeet
Sorry, no, the docs are right, you just did not understand what "immutability" means (just like the OP, but he at least asked)
Sam
Since the framework "creates the new string object" and doesn't modify the original instance, immutability is preserved. The docs are right.
David B
@Sam: I understand mutability, thanks very much. The docs really are wrong. The variable b does *not* "continue to hold h". The docs aren't wrong when they claim strings are immutable, but that doesn't mean they're right in the rest of what they say!
Jon Skeet
@David B: So would you agree with the docs that 'variable b continues to hold "h"'?
Jon Skeet
@Jon Skeet: "h" is some place in memory...I don't believe the framework touches that place. String.Concat will make a NEW place in memory containing "hello", the concat never touches "h".
sixlettervariables
b is a string "reference"; the object "h" does not mutate. At the end, b holds a new reference.
OscarRyz
@sixlettervariables: Where did I claim that Concat touched "h"? All I said was that the docs are wrong: the final value of b is a reference to the string "hello", contrary to the docs.
Jon Skeet
@Jon Skeet: gotcha, I'm trying to figure out where he got his text from, because I am missing something here.
sixlettervariables
@sixlettervariables: Here's the MSDN page: http://msdn.microsoft.com/en-us/library/362314fe(VS.80).aspx Ironically, here's the new version: http://msdn.microsoft.com/en-us/library/362314fe.aspx That's better, but it claims "h" can be GC'd, which I don't believe it can due to interning!
Jon Skeet
I myself don't believe it'll be GC'd in practice, but it *could* be if the framework at some point didn't intern the strings...
sixlettervariables
String interning is effectively guaranteed by the C# spec. From section 2.4.4.5: "When two or more string literals that are equivalent according to the string equality operator (§7.9.7) appear in the same program, these string literals refer to the same string instance."
Jon Skeet
@Jon Skeet: Take a look at the section marked "Performance Considerations" here: http://msdn.microsoft.com/en-us/library/system.string.intern.aspx That does imply that the actual memory used to create the initial "h" will eventually be eligible for GC.
Scott Dorman
@Scott: I believe the initial "h" is placed directly into the intern table as it's present in the metadata. However, it looks like string interning is actually a bit more complicated in CLR v2 as it can be "potentially disabled" with an attribute. I suspect the C# spec is then violated...
Jon Skeet
+2  A: 

A string cannot change, but a string variable can be assigned a different value. What you are doing is closer to:

string b = "h";
string temp = b + "ello";
b = temp;

To show the actual immutability of string, this will fail to compile, because the string indexer is read-only:

   string b="hello";
   if(b[0] == 'h')  // we can read via indexer
      b[0] = 'H';   // but this will fail.
James Curran
A: 

string b = "h"; b += "ello";

b is just a reference to an object on the heap. After the "+=" operation, b no longer references the original "h". It now references a new string object "hello", which is the concatenation of "h" and "ello". The "h" string will be collected by the GC.

Morgan Cheng
See my response to Mark Heath about garbage collection.
Jon Skeet
A: 

What's happening is that you're creating a new string that holds 'hello' and then changing b to reference it. The memory for the 'old' b still contains 'h', but that's no longer needed, so the garbage collector will clean it up. This is why it's so good to use StringBuilders when iterating and sticking strings together - see this for more info.
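
As a rough sketch of the StringBuilder point, here are two ways to join the same pieces. The class and method names (JoinDemo, JoinWithConcat, JoinWithBuilder) are invented for this illustration; only += and System.Text.StringBuilder come from the discussion.

```csharp
using System;
using System.Text;

static class JoinDemo
{
    // Joins with +=. For each non-empty part, result is reassigned
    // to a newly allocated string; the previous one is left unchanged.
    public static string JoinWithConcat(string[] parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part;
        }
        return result;
    }

    // Joins with a StringBuilder, which appends into its own buffer
    // and produces the final string once, in ToString().
    public static string JoinWithBuilder(string[] parts)
    {
        var builder = new StringBuilder();
        foreach (string part in parts)
        {
            builder.Append(part);
        }
        return builder.ToString();
    }

    static void Main()
    {
        string[] parts = { "He", "llo", " ", "wo", "rld" };
        Console.WriteLine(JoinWithConcat(parts));
        Console.WriteLine(JoinWithBuilder(parts));
    }
}
```

Both methods return the same text; the difference is how many intermediate string objects get created along the way, which matters mostly inside long loops.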

Whisk
Actually, it almost certainly *won't* get collected; the string "h" is a literal from code, which means it is almost certainly interned. As such, it will not get collected. .NET assumes that such strings (i.e. literals from your code) are likely to be used repeatedly.
Marc Gravell
(just reading the other comments, it looks like Jon's already covered this...)
Marc Gravell
+4  A: 

The docs are wrong. The variable b now holds "hello". The string is immutable but the variable can be reassigned.

hwiechers
+4  A: 

The misunderstanding here is about reference types:
String is a reference type, not a value type. This means, your variable b is not an object of type string, it is a reference to an object of type string in memory.
What the doc says, is that the object in memory is immutable.
Still, your reference to the object can be changed to point to some other (immutable) object in memory.
For you it might look like the content of the object has changed, but in the memory it has not, and this is all that immutable thingy is about.

The string itself is immutable. What your example changed was not the string object in memory, but the object your variable refers to.

See this slightly modified code:

string b = "h";
string m1 = b;
b += "ello";
// now b == "hello", m1 == "h"

In the end b will point to "hello", while m1 will point to "h". For you it might seem like "h" has changed to "hello", but it has not. b += "ello" created a new string object containing "hello" and assigned it to b, while the old string is still present in memory and still contains "h".

If string were not immutable, m1 would contain "hello" too, instead of just "h", because both b and m1 pointed to the same object.
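
The same experiment can be run as a small program. This is only a sketch; the AliasDemo class and its Run method are names made up here, and the tuple return needs C# 7 or later.

```csharp
using System;

static class AliasDemo
{
    // Returns the final values of b and m1, plus whether they still
    // refer to the same object after the += on b.
    public static (string b, string m1, bool sameObject) Run()
    {
        string b = "h";
        string m1 = b;   // m1 and b refer to the same string object
        b += "ello";     // b is reassigned to a new string; "h" is untouched
        return (b, m1, object.ReferenceEquals(b, m1));
    }

    static void Main()
    {
        var result = Run();
        Console.WriteLine(result.b);          // prints "hello"
        Console.WriteLine(result.m1);         // prints "h"
        Console.WriteLine(result.sameObject); // prints "False"
    }
}
```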

Sam
You wrote: 'In the end b will point to "hello"' Exactly. That contradicts what the docs say - the docs are wrong.
Jon Skeet
Oh come on, that's nitpicking. The doc says "the string object is immutable". It did not say "the reference pointing to the string object is immutable". The object itself is still unchanged on the heap.
Sam
@Sam: sure, but the docs state that b continues to hold "h". plainly, it doesn't, right?
Ah, so you are on about a logical typo in the docs? Sorry, my misunderstanding in that case - I thought this was about immutability.
Sam
He specifically said he was asking about the bolded portion of the text
Dave Costa
He's "nitpicking" documentation? Was there ever a more appropriate thing to nitpick outside of the legal profession?
Jeffrey L Whitledge
No, as originally stated it was about the part in bold.
(thanks for all the replies, though - first time I posted a question and wow!)
Sorry, in that case I was mistaken! I didn't think it was about the choice of words being wrong in the docs, but about the intent. I apologize!
Sam
A: 

So does the variable b still continue to hold "h" or not?

That's the part that I can't work out.

No, it doesn't. It holds a reference to a newly created (by the framework, not compiler) string "hello".
Jon Skeet
b is not a variable, b is a reference to a memory location.
Sam
re: b being a variable - that's verbatim from the docs
@Sam: b is a variable. The *value* of b is a reference.
Jon Skeet
@Jon, that's funny, I'd say it exactly the other way round: b is a reference - but English is not my first language, so I might be mistaken if it's about exact words.
Sam
b is definitely a variable - it's got a name, for a start ("b") which references don't. If b isn't the variable, what is?
Jon Skeet
I understand the part in bold to be suggestive that b doesn't become "ello", that the "h" is used as part of the "+=" operator so that it acts as a short cut instead of saying b = b + "ello" which should be equivalent to b += "ello"
JB King
+4  A: 

People don't seem to be understanding the question. No one is arguing that string objects aren't immutable. The point of contention is what he bolded:

and the variable b continues to hold "h"

I agree with the OP that this portion of the doc is incorrect on two counts:

(1) In the obvious intuitive sense that if you print(b) (or whatever the correct statement is in this language) after his two sample lines you will get "hello" as the result.
(2) In the strict sense that the variable b doesn't hold "h", "hello", or any string value. It holds a reference to a string object.

The contents of the variable b do change as a result of the assignment -- it changes from a pointer to string object "h" to a pointer to string object "hello".

When they say "hold" what they really mean is "points to". And they are wrong: after the assignment, b no longer points to "h".

I think the example they really wanted to give is this:

string a = "h";
string b = a;
b += "ello";

The point being that a would, I believe, still point to "h"; i.e., the assignment to b doesn't modify the object it was pointing to, it creates a new object and changes b to point to it.

(I don't actually write C# but this is my understanding.)

Dave Costa
Bingo - that's what I thought and I agree with your suggested example.
A: 

I don't know what C# does, but I did read about this in Java, and an implementation based on Java would be more like this:

string b = "h" ;

b = (new StringBuilder(b)).Append("ello").ToString() ;

The point is that the "+" or "Append" does not exist for string because string is immutable.

SeaDrive
+ exists for Strings in Java. Unless they removed it after 1.5. What it does is use a StringBuilder to append the strings, but the operator does exist.
Dave Costa
"+" creates a new object each time it is used, while StringBuilder creates only one. String a = "He"; a += "llo"; a += " "; a += "wo"; a += "rld"; creates 9 different Strings. And builder.append("He").append("llo").append(" ").append("wo").append("rld"); creates only 6. Put that on a for
OscarRyz
Create a question and I'll expand.
OscarRyz
A: 

Try this:

string b = "h";
string c = b + "ello";    // b still == "h", c = "hello"
string d = string.Concat(b, "ello"); // d == "hello", b still "h"

Why is b still "h"? Because "b" is not an object; it is an object reference. There is nothing you can do to the object referenced by b to change it. If strings were mutable, then using:

string b = "ello";
string f = b.Insert(0, "h");

would modify b to "hello" (because h was inserted at position 0), but as it is immutable, b remains "ello".

If you change the reference to point to another object, that's a different thing.

b = "ello";
b = "Some other string";
// b now references "Some other string", but the object "ello" remains unchanged.
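
To make the Insert case concrete, here is a minimal sketch. InsertDemo and Run are illustrative names only; Insert and ToUpperInvariant are existing System.String methods, and both return a new string rather than changing the one they are called on.

```csharp
using System;

static class InsertDemo
{
    // Calls two string methods on b and returns b alongside their results.
    public static (string original, string inserted, string upper) Run()
    {
        string b = "ello";
        string f = b.Insert(0, "h");     // new string "hello"; b is not modified
        string u = b.ToUpperInvariant(); // new string "ELLO"; b is not modified
        return (b, f, u);
    }

    static void Main()
    {
        var r = Run();
        Console.WriteLine(r.original); // prints "ello"
        Console.WriteLine(r.inserted); // prints "hello"
        Console.WriteLine(r.upper);    // prints "ELLO"
    }
}
```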

I hope it helps ( and works :S )

OscarRyz
A: 

Simply put, strings cannot be modified in place (if string is an array of characters)

shahkalpesh