I was perplexed after running this piece of code, in which strings seem to behave as if they were value types. I am wondering whether the assignment operator operates on values for strings, the way the equality operator does.

Here is the code I wrote to test this behavior.

using System;

namespace RefTypeDelimma
{
    class Program
    {
        static void Main(string[] args)
        {
            string a1, a2;

            a1 = "ABC";
            a2 = a1; //This should assign a1 reference to a2
            a2 = "XYZ";  //I expect this should change the a1 value to "XYZ"

            Console.WriteLine("a1:" + a1 + ", a2:" + a2);//Outputs a1:ABC, a2:XYZ
            //Expected: a1:XYZ, a2:XYZ (as string being a ref type)

            Proc(a2); //Altering values of ref types inside a procedure
                      //should be reflected in the variable that's passed in

            Console.WriteLine("a1: " + a1 + ", a2: " + a2); //Outputs a1: ABC, a2: XYZ
            //Expected: a1: NEW_VAL, a2: NEW_VAL (as string being a ref type)
        }

        static void Proc(string Val)
        {
            Val = "NEW_VAL";
        }
    }
}

In the above code, if I use a custom class instead of strings, I get the expected behavior. Does this have something to do with string immutability?

I'd welcome expert views on this.

+7  A: 

They don't. You changed the pointer of a2, not the object it pointed to.
When you are using classes and getting your expected behavior, you must be setting a property of the object, not its reference.

Any other class will behave the same:

Foo a = new Foo(1);
Foo b = a; // a and b point to the same object

b.Value = 4; // change a property of the shared object
Assert.AreEqual(4, a.Value); // passes - the change is visible through a

b = new Foo(600); // b now refers to a new object
Assert.AreEqual(4, a.Value); // passes - a is unaffected
Assert.AreEqual(600, b.Value); // passes
Kobi
+8  A: 

Whenever you see

variableName = someValue;

that's changing the value of the variable - it's not changing the contents of the object that variable's value refers to.

This behaviour of string is entirely consistent with other reference types, and has nothing to do with immutability. For example:

StringBuilder b1 = new StringBuilder("first");
StringBuilder b2 = b1;
b2 = new StringBuilder("second");

That last line doesn't change anything about b1 - it doesn't change which object it refers to, or the contents of the object it refers to. It just makes b2 refer to a new StringBuilder.
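To make the difference concrete, here is a minimal sketch that contrasts reassigning b2 with mutating the object both variables refer to. The class and method names (ReassignDemo, B1AfterReassign, B1AfterAppend) are made up for illustration only.

```csharp
using System;
using System.Text;

static class ReassignDemo
{
    // Reassigning b2 leaves b1 and the object it refers to untouched.
    public static string B1AfterReassign()
    {
        StringBuilder b1 = new StringBuilder("first");
        StringBuilder b2 = b1;                // both variables refer to one object
        b2 = new StringBuilder("second");     // only b2 changes: it refers to a new object
        return b1.ToString();
    }

    // Mutating the shared object through b2 is visible through b1.
    public static string B1AfterAppend()
    {
        StringBuilder b1 = new StringBuilder("first");
        StringBuilder b2 = b1;                // both variables refer to one object
        b2.Append("!");                       // changes the object itself
        return b1.ToString();
    }

    static void Main()
    {
        Console.WriteLine(B1AfterReassign()); // first
        Console.WriteLine(B1AfterAppend());   // first!
    }
}
```

String has no equivalent of Append that modifies the object in place, so only the first case can ever arise with strings.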

The only "surprise" here is that strings have special support in the language in the form of literals. While there are important details such as string interning (such that the same string constant appearing in multiple places within the same assembly will always yield references to the same object) this doesn't affect the meaning of the assignment operator.

Jon Skeet
+16  A: 

You're not changing anything about the object a1 points to, but instead changing which object a2 points to.

a = new Person(); b = a; b = new Person();

Your example replaces "new Person { … }" with a string literal, but the principle is the same.

The difference comes when you're changing properties of the object. Change the property of a value type, and it's not reflected in the original.

a = new Person(); b = a; b.Name = …;

Change the property of a reference type, and it is reflected in the original.

a = new Person(); b = a; b.Name = …;
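Since the pictures may not display, the three cases can be sketched in code. This is only an illustration: PersonStruct, PersonClass and the names "Alice" and "Bob" are assumptions, not taken from the original images.

```csharp
using System;

struct PersonStruct { public string Name; }  // value type (hypothetical)
class PersonClass { public string Name; }    // reference type (hypothetical)

static class CopyDemo
{
    // Reassigning b to a new object does not affect a.
    public static string AfterReassignment()
    {
        PersonClass a = new PersonClass { Name = "Alice" };
        PersonClass b = a;
        b = new PersonClass { Name = "Bob" }; // b now refers to a different object
        return a.Name;
    }

    // Value type: b is a full copy, so changing b.Name leaves a alone.
    public static string AfterValueTypeChange()
    {
        PersonStruct a = new PersonStruct { Name = "Alice" };
        PersonStruct b = a;                   // copies the whole value
        b.Name = "Bob";
        return a.Name;
    }

    // Reference type: a and b share one object, so the change shows through a.
    public static string AfterReferenceTypeChange()
    {
        PersonClass a = new PersonClass { Name = "Alice" };
        PersonClass b = a;                    // copies only the reference
        b.Name = "Bob";
        return a.Name;
    }

    static void Main()
    {
        Console.WriteLine(AfterReassignment());        // Alice
        Console.WriteLine(AfterValueTypeChange());     // Alice
        Console.WriteLine(AfterReferenceTypeChange()); // Bob
    }
}
```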

p.s. Sorry about the size of the images, they're just from something I had lying around. You can see the full set at http://dev.morethannothing.co.uk/valuevsreference/, which covers value types, reference types, and passing value types by value and by reference, and passing reference types by value and by reference.
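For the Proc call in the question, the relevant case is passing a reference type by value versus by reference. A rough sketch, with ProcByValue and ProcByRef as made-up names standing in for the question's Proc:

```csharp
using System;

static class PassingDemo
{
    // The parameter is a copy of the caller's reference; reassigning it is local.
    static void ProcByValue(string val)
    {
        val = "NEW_VAL";
    }

    // With ref, the parameter is an alias for the caller's variable itself.
    static void ProcByRef(ref string val)
    {
        val = "NEW_VAL";
    }

    public static string AfterByValue()
    {
        string a2 = "XYZ";
        ProcByValue(a2);
        return a2;
    }

    public static string AfterByRef()
    {
        string a2 = "XYZ";
        ProcByRef(ref a2);
        return a2;
    }

    static void Main()
    {
        Console.WriteLine(AfterByValue()); // XYZ
        Console.WriteLine(AfterByRef());   // NEW_VAL
    }
}
```

The same holds for any class, not just string: without ref, assigning to the parameter never changes which object the caller's variable refers to.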

ICR
Good answer, nicely done :)
Russ C
+1 for visual aids. :)
cHao
+1 for picture :)
PoweRoy
As I've seen lots of people get confused by this, I've used my pictures to create a blog post that, hopefully, makes things a bit clearer. It can be found at http://www.morethannothing.co.uk/2010/05/value-ref-types-pass-by-value-ref/
ICR
+3  A: 
   a2 = "XYZ";

That's syntax sugar, provided by the compiler. A more accurate representation of this statement would be:

   a2 = CreateStringObjectFromLiteral("XYZ")

which explains how a2 simply gets a reference to a new string object and answers your question. The actual code is highly optimized because it is so common. There's a dedicated opcode available for it in IL:

   IL_0000:  ldstr      "XYZ"

String literals are collected into a table inside the assembly. Which allows the JIT compiler to implement the assignment statement very efficiently:

   00000004  mov         esi,dword ptr ds:[02A02088h] 

A single machine code instruction, can't beat that. More so: one very notable consequence is that the string object doesn't live on the heap. The garbage collector doesn't bother with it since it recognizes that the address of the string reference isn't located in the heap. So you don't even pay for collection overhead. Can't beat that.

Also note that this scheme easily allows for string interning. The compiler simply generates the same LDSTR argument for an identical literal.
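The interning of identical literals can be observed with object.ReferenceEquals. This is a small sketch of what I'd expect on the usual JIT-based runtimes; the InternDemo class and its method names are hypothetical.

```csharp
using System;

static class InternDemo
{
    // Two occurrences of the same literal yield the same string object.
    public static bool SameLiteralIsSameObject()
    {
        string s1 = "XYZ";
        string s2 = "XYZ";
        return object.ReferenceEquals(s1, s2);
    }

    // A string built at run time is a separate object, even with equal contents.
    public static bool BuiltStringIsEqualButDistinct()
    {
        string literal = "XYZ";
        string built = new string(new[] { 'X', 'Y', 'Z' });
        return literal == built && !object.ReferenceEquals(literal, built);
    }

    static void Main()
    {
        Console.WriteLine(SameLiteralIsSameObject());       // True
        Console.WriteLine(BuiltStringIsEqualButDistinct()); // True
    }
}
```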

Hans Passant
How does this answer the question? The OP simply hasn’t understood how reference types work.
Konrad Rudolph
@Konrad, the OP looks happy to me. I'd guess he completely understands reference types and couldn't figure out how a reference is generated when assigning a literal. It isn't obvious.
Hans Passant
+1000. This is the first answer I've seen to this common question which appears to really comprehend that the source of the confusion is the semantics of strings, and explain it from that angle. And I've seen answers from some pretty significant people! Thank you.
Charles
Awesome, thank you. 997 to go :)
Hans Passant
@Hans Passant: Nothing against your answer but this has literally got nothing to do with the OP’s question. I think the fact that this answer is accepted shows how little the OP has actually understood the issue.
Konrad Rudolph
@Konrad: I have no idea how to make you happy. And don't see the point in establishing the OP is a fool. I really don't think he is. Sorry. Maybe you can post a better answer?
Hans Passant
@Hans Passant: ICR has posted a good answer. So has Jon. And once again, I think your answer is a good technical explanation of an aspect of string handling, it just doesn’t pertain to this question (which actually has got nothing to do with strings). And I didn’t call the OP a fool. He just doesn’t understand how references work. How does that make him a fool?
Konrad Rudolph
@Konrad Rudolph: I believe Hans has truly understood the source of the OP's surprise, while Jon hasn't. For string variables, the assignment operator provides a certain syntax sugar, but people (such as the OP) tend to assume it provides a *different* syntax sugar. To wit: the OP assumed that `a2 = "XYZ"` was syntax sugar for `a2.Value = "XYZ"`. Hans explained that it's actually syntax sugar for `a2 = CreateStringObjectFromLiteral("XYZ")`. These are two very different statements and their difference is the exact reason for the confusion.
Charles
This IS a failure to understand how strings work. This is NOT a failure to understand how reference types work. Arriving at that interpretation of the question is the reason Jon's answer missed the point, and numerous other brilliant minds have made the same mistake (on similar questions).
Charles
Folks, thank you for your expert input on my dilemma about string assignment. What made me pick Hans Passant's response was the statement "That's syntax sugar, provided by the compiler. A more accurate representation of this statement would be: a2 = CreateStringObjectFromLiteral("XYZ")". After going through Passant's response, it was easier to comprehend ICR's and Jon Skeet's responses. Thank you once again for your valuable input.
AJP
A: 

If I remember correctly, Eric Lippert once wrote on SO that this behavior was chosen so that multithreading is easier and safer. This way, when you store a string in a1, you know that only you can change it. It cannot be changed from other threads, for example.

Petar Repac
A: 

The first reason is that String is an immutable class.

An object is called immutable if its value cannot be modified once it has been created. For example, methods that appear to modify a String actually return a new String containing the modification. Developers modify strings all the time in their code, so strings may appear mutable, but they are not. What actually happens is that your string variable is changed to reference a new string object containing the result of the operation. For this very reason .NET has the System.Text.StringBuilder class. If you need to modify the contents of a string-like object heavily, such as in a for or foreach loop, use System.Text.StringBuilder.


For example:

string x = "123";

If you do x = x + "abc", a new string object is allocated to hold the concatenated result "123abc", and x is pointed at it. The original "123" object is not modified.

If you use a StringBuilder instead:

System.Text.StringBuilder sb = new System.Text.StringBuilder("123"); sb.Append("abc"); x = sb.ToString();

StringBuilder is a mutable class. Append adds the text to its internal buffer instead of creating a new string each time, so repeated string manipulation is faster.
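A minimal sketch of that difference, using made-up names (ImmutabilityDemo and its two methods) and the same "123" and "abc" values as above:

```csharp
using System;
using System.Text;

static class ImmutabilityDemo
{
    // Concatenation produces a new string; the original object is untouched.
    public static bool ConcatCreatesNewObject()
    {
        string x = "123";
        string original = x;          // second reference to the "123" object
        string suffix = "abc";
        x = x + suffix;               // x now refers to a newly created "123abc"
        return original == "123"
            && x == "123abc"
            && !object.ReferenceEquals(original, x);
    }

    // Append changes the StringBuilder in place; it is still the same object.
    public static bool AppendKeepsSameObject()
    {
        StringBuilder sb = new StringBuilder("123");
        StringBuilder same = sb;      // second reference to the same builder
        sb.Append("abc");
        return object.ReferenceEquals(sb, same)
            && same.ToString() == "123abc";
    }

    static void Main()
    {
        Console.WriteLine(ConcatCreatesNewObject()); // True
        Console.WriteLine(AppendKeepsSameObject());  // True
    }
}
```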


A string is an object of type String whose value is text. Internally, the text is stored as a readonly collection of Char objects, each of which represents one Unicode character encoded in UTF-16.

kbm4100