Hi all,

Why does object.ReferenceEquals return true for the two strings below? Is it because of string pooling by the CLR, or because GetHashCode() returns the same value for both strings?

string s1 = "xyz";
string s2 = "xyz";
Console.WriteLine("s1 reference equals s2 : {0}", object.ReferenceEquals(s1, s2));

Console writes : "s1 reference equals s2 : True"

I don't believe it is because GetHashCode() returns the same value for both string instances. I tested with a custom class whose overridden GetHashCode() method returns the same constant every time, and two separate instances of that class are still not reference-equal.
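
A minimal sketch of that kind of test, assuming a hypothetical class named ConstantHash (the class, the method names and the constant 42 are illustrative, not the original test code):

```csharp
using System;

// Hypothetical class for illustration: every instance reports the same hash code.
class ConstantHash
{
    public override int GetHashCode()
    {
        return 42; // arbitrary constant
    }
}

class HashCodeDemo
{
    // True if the two separate instances report equal hash codes.
    public static bool SameHashCode()
    {
        ConstantHash a = new ConstantHash();
        ConstantHash b = new ConstantHash();
        return a.GetHashCode() == b.GetHashCode();
    }

    // True only if both variables point at the very same object.
    public static bool SameReference()
    {
        ConstantHash a = new ConstantHash();
        ConstantHash b = new ConstantHash();
        return object.ReferenceEquals(a, b);
    }

    static void Main()
    {
        Console.WriteLine("same hash code : {0}", SameHashCode());  // True
        Console.WriteLine("same reference : {0}", SameReference()); // False
    }
}
```

Each `new` expression allocates a distinct object, so equal hash codes alone cannot make the references equal.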

Please let me know what is happening behind the scenes.

thanks 123Developer

+12  A: 

It sounds like string interning, a method of storing only one copy of each distinct string. It requires strings to be an immutable type in the language you are dealing with; .NET satisfies that and uses string interning.

In string interning, the string "xyz" is stored in the intern pool, and whenever you write "xyz" the code internally references that entry in the pool. This can save space by storing the string only once. So a reference comparison of "xyz" with "xyz" gets interpreted as [pointer to 34576] == [pointer to 34576], which is true.

Tom Ritter
Beat me by 32 seconds.
Joel Coehoorn
+1 Perhaps it would be helpful to quote the article a bit in your answer.
Andrew Hare
+6  A: 

This is definitely due to string interning. Hash codes are never calculated when comparing references with object.ReferenceEquals.

From the C# spec, section 2.4.4.5:

Each string literal does not necessarily result in a new string instance. When two or more string literals that are equivalent according to the string equality operator (§7.9.7) appear in the same program, these string literals refer to the same string instance.

Note that string constant expressions count as literals in this case, so:

string x = "a" + "b";
string y = "ab";

It's guaranteed that x and y refer to the same object too (i.e. they are the same references).
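
As a rough sketch of the difference between a constant expression and a runtime concatenation (the class name, the Prefix constant and the helper methods are made up for this illustration):

```csharp
using System;

class ConstantFoldingDemo
{
    // Hypothetical constant; "a" could equally come from another class.
    const string Prefix = "a";

    // Constant expression: the compiler folds Prefix + "b" into the literal "ab".
    public static bool FoldedIsSameInstance()
    {
        string x = Prefix + "b";
        string y = "ab";
        return object.ReferenceEquals(x, y);
    }

    // Non-constant operand: the concatenation runs at execution time
    // and produces a new string object.
    public static bool RuntimeConcatIsSameInstance()
    {
        string part = "a";      // ordinary local variable, not a constant
        string z = part + "b";
        string y = "ab";
        return object.ReferenceEquals(z, y);
    }

    static void Main()
    {
        Console.WriteLine(FoldedIsSameInstance());        // True
        Console.WriteLine(RuntimeConcatIsSameInstance()); // False
    }
}
```

Both z and y have the value "ab", so z == y is still true; only the reference comparison differs.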

When the spec says "program" by the way, it really means "assembly". The behaviour of equal strings in different assemblies depends on things like CompilationRelaxations.NoStringInterning and the precise CLR implementation and execution time situation (e.g. whether the assembly is ngen'd or not).

Jon Skeet
@Jon: string x = "a" + "b"; is nothing but string x = "ab"; and that happens at compile time, right? Can we say that the former syntax exists just for readability?
123Developer
@123Developer: It's more than just readability - the "a" might actually be SomeClass.StringConstant which would be pretty poor to copy in source code. But yes, they are exactly equivalent. The concatenation is performed at compile-time.
Jon Skeet
A: 

Totally agree with Tom's answer...

Excerpt from CIL Specification (page 126):

The CLI guarantees that the result of two ldstr instructions referring to two metadata tokens that have the same sequence of characters, return precisely the same string object (a process known as “string interning”).

CMS
+4  A: 

It's similar to string pooling, but it's done at compile time rather than at runtime.

Any string literal in an assembly only exists once. The compiler uses the same constant string for all occurrences of the string literal "xyz". As strings are immutable (you can never change the value of a string instance), the compiler can safely use the same string instance for separate string references.

If you instead create a string at runtime, you get a separate instance:

string s1 = "xyz";

string s2 = "xy";
s2 += "z";

Console.WriteLine("s1 ref = s2 : {0}", object.ReferenceEquals(s1, s2));

Output:

s1 ref = s2 : False
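
If you want a string built at runtime to share the pooled instance, string.Intern returns the pooled reference for a given value. A small sketch (the class and helper names are only for illustration):

```csharp
using System;

class InternAtRuntimeDemo
{
    // Builds the same value twice at runtime, then interns both copies.
    // Returns true if the copies start out distinct and the interned
    // results are one and the same object.
    public static bool InternedCopiesMatch()
    {
        string part = "xy";
        string built1 = part + "z";   // new instance
        string built2 = part + "z";   // another new instance
        bool distinctBefore = !object.ReferenceEquals(built1, built2);
        bool sameAfter = object.ReferenceEquals(string.Intern(built1), string.Intern(built2));
        return distinctBefore && sameAfter;
    }

    static void Main()
    {
        string s1 = "xyz";

        string s2 = "xy";
        s2 += "z";

        string s3 = string.Intern(s2); // pooled reference for the value "xyz"

        Console.WriteLine("s1 ref = s2 : {0}", object.ReferenceEquals(s1, s2));
        Console.WriteLine("s1 ref = s3 : {0}", object.ReferenceEquals(s1, s3));
        Console.WriteLine("interned copies match : {0}", InternedCopiesMatch());
    }
}
```

The second line should normally print True, since the literal "xyz" is the pooled instance; whether literals always end up in the pool can depend on settings such as CompilationRelaxations.NoStringInterning and on whether the assembly is ngen'd.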
Guffa
A: 

String interning has nothing to do with it.

I would be very surprised to find out that the .NET/C# compiler calls Intern implicitly. Checking for a matching string at runtime would put too much load on the CPU.

Shay Erlichmen
You are mistaken. Operators are not virtual. The object.ReferenceEquals method will not call operator==(string,string) but operator==(object,object).
Guffa