+5  Q: 

String interning?

The second ReferenceEquals call returns false. Why isn't the string in s4 interned?

string s1 = "tom";
string s2 = "tom";


Console.Write(object.ReferenceEquals(s2, s1)); //true

string s3 = "tom";
string s4 = "to";
s4 += "m";

Console.Write(object.ReferenceEquals(s3, s4)); //false

--edit--

To add to the question: this discussion is about string interning and references, not about the advantages of StringBuilder over string concatenation. Thanks.

--edit2-- Just noticed that when I do String.Intern(s4);, I still get false.

--edit3-- Getting weird now. Both s3 and s4 are interned but their references are not equal? Help!

string s3 = "tom";
string s4 = "to";
s4 += "m";
String.Intern(s4);

Console.WriteLine(s3 == s4); //true
Console.WriteLine(object.ReferenceEquals(s3, s4)); //false
Console.WriteLine(string.IsInterned(s3) != null);  //true (s3 is interned)
Console.WriteLine(string.IsInterned(s4) != null);  //true (s4 is interned)
+3  A: 

Strings are immutable. This means their contents can't be changed.

When you do s4 += "m";, the CLR does not modify the existing string. It allocates a new string at another location in memory, containing the original characters plus the appended part, and s4 then refers to that new string.

Edit: see MSDN string reference

Jim Schubert
I understand strings are immutable. But the whole point of interning is to save memory, right? Why can't the CLR say, hey, I have this same value in my intern pool, I am just going to point to it?
rkrauter
@rkrauter: it's quite expensive to check whether any string in the intern pool is equal to the result of an operation, and to do so after every operation! So the CLR sacrifices memory efficiency for execution speed. String calculation at compile time can afford to be slow, so its results can be interned. Calculations at runtime must be fast, so checking each result against a series of other strings seems impractical.
Vlad
So string interning is primarily done at compile time? Just noticed that when I do String.Intern(s4);, I still get false. Please explain.
rkrauter
+11  A: 

The string literal initially assigned to s4 ("to") is interned. However, when you execute s4 += "m";, you create a new string that is not interned, because its value is not a string literal but the result of a runtime string concatenation. As a result, s3 and s4 are two different string instances in two different memory locations.
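
To make the literal-versus-runtime distinction concrete, here is a minimal sketch. It assumes the standard CLR behaviour discussed in this thread (literals in one assembly share a single interned instance); the class and variable names are made up for illustration.

```csharp
using System;

class LiteralVersusRuntimeDemo
{
    static void Main()
    {
        string literal = "tom";

        // Both operands are compile-time constants, so the C# compiler
        // folds the expression into the single literal "tom".
        string folded = "to" + "m";
        Console.WriteLine(object.ReferenceEquals(literal, folded)); // True

        // Built at run time from a char array: a separate heap instance
        // that is never placed in the intern pool automatically.
        string built = new string(new[] { 't', 'o', 'm' });
        Console.WriteLine(literal == built);                        // True (same value)
        Console.WriteLine(object.ReferenceEquals(literal, built));  // False
    }
}
```

The point of the contrast: what matters is whether the compiler can see the final value as a constant, not whether a + operator appears in the source.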

For more information on string interning, look here, specifically at the last example. When you call String.Intern(s4), the value is indeed looked up in the intern pool, but the result is discarded: s4 still refers to the non-interned instance, so you are still not comparing two interned references. The String.Intern method returns the interned string, so you would need to do this:

string s1 = "tom";
string s2 = "tom";

Console.Write(object.ReferenceEquals(s2, s1)); //true 

string s3 = "tom";
string s4 = "to";
s4 += "m";

Console.Write(object.ReferenceEquals(s3, s4)); //false

string s5 = String.Intern(s4);

Console.Write(object.ReferenceEquals(s3, s5)); //true
Scott Dorman
Marked as answer, thanks. Still weird stuff. I tell it to intern s4 and it returns a reference to an already interned string in the pool, while poor s4 simply hangs out as a non-interned string on the heap.
rkrauter
Thanks. It returns a reference, but if the string value passed as the argument isn't already interned, it will be interned and then the reference is returned. String interning is really an optimization technique predominantly used by the compiler to reduce the number of string instances across the application. Unless you are creating **a lot** of strings you probably won't see much benefit from doing it yourself.
Scott Dorman
+1  A: 

In C#, each string is a distinct object, and cannot be edited. You are creating references to them, but each string is distinct. The behaviour is consistent and easy to understand.

Might I suggest examining the StringBuilder class for manipulating strings without creating new instances? It should be sufficient for anything you want to do with strings.

SLC
Only use a StringBuilder if you need to concatenate large strings together. In all other cases the time saved is so minimal it doesn't matter. Also, string literals are put into an internal table of already created strings. This means that if you write the literal "Hello", any further literal "Hello" will point to the same reference in memory.
Femaref
+1  A: 

First of all, everything already written about immutable strings is correct. But there are some important things which have not been mentioned. The code

string s1 = "tom";
string s2 = "tom";
Console.Write(object.ReferenceEquals(s2, s1)); //true

really does display "True", but only because of a small compiler optimization or, as here, because the CLR ignores the C# compiler's attributes (see the book "CLR via C#") and places only one "tom" string on the heap.

Second, you can fix the situation with the following lines:

s3 = String.Intern(s3);
s4 = String.Intern(s4);
Console.Write (object.ReferenceEquals (s3, s4)); //true

The String.Intern function calculates a hash code for the string and searches for a matching string in an internal hash table. Because it finds one here, it returns the reference to the already existing String object. If the string doesn't exist in the internal hash table, a copy of the string is made and its hash is computed and stored. The garbage collector doesn't free the memory for that string, because it is referenced by the hash table.
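
The lookup is by value, which also explains the IsInterned results in the question: a non-null result means an equal string is in the pool, not that the instance you passed in is the pooled one. A small sketch of that, assuming the usual CLR behaviour described above (the class and variable names are illustrative only):

```csharp
using System;

class IsInternedDemo
{
    static void Main()
    {
        string s3 = "tom";                                // literal, in the pool
        string s4 = new string(new[] { 't', 'o', 'm' });  // separate heap instance

        // IsInterned searches by value and returns the pooled instance, or null.
        string pooled = string.IsInterned(s4);
        Console.WriteLine(pooled != null);                     // True
        Console.WriteLine(object.ReferenceEquals(pooled, s3)); // True
        Console.WriteLine(object.ReferenceEquals(pooled, s4)); // False

        // A value that has never been interned is not found.
        string unique = Guid.NewGuid().ToString();
        Console.WriteLine(string.IsInterned(unique) == null);  // True
    }
}
```

So string.IsInterned(s4) != null only shows that the value "tom" is pooled; s4 itself remains an ordinary heap string until you assign the result of String.Intern back to it.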

Oleg