Does string immutability work by statement, or by strings within a statement?

For example, I understand that the following code will allocate two strings on the heap.

string s = "hello ";
s += "world!";

"hello " will remain on the heap until garbage collected, and s now references "hello world!" on the heap. However, how many strings does the following line allocate on the heap: 1 or 2? Also, is there a tool or way to verify the result?

string s = "goodbye " + "cruel world!";
+18  A: 

The compiler has special treatment for string concatenation, which is why the second example is only ever one string. And "interning" means that even if you run this line 20000 times there is still only 1 string.

Re testing the results... the easiest way (in this case) is probably to look at the generated IL in Reflector:

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 1
    .locals init (
        [0] string s)
    L_0000: ldstr "goodbye cruel world!"
    L_0005: stloc.0 
    L_0006: ldloc.0 
    L_0007: call void [mscorlib]System.Console::WriteLine(string)
    L_000c: ret 
}

As you can see (ldstr), the compiler has done this for you already.
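If you'd rather check from code than read IL, here is a minimal sketch. The names InternCheck and FoldedLiteralIsShared are made up for illustration. It leans on the C# rule that equal string literals in the same assembly refer to the same instance, so if the compiler folded the concatenation, both variables should point at one object.

```csharp
using System;

public static class InternCheck
{
    // Returns true when the concatenation of two literals ends up as the
    // same object as the equivalent single literal.
    public static bool FoldedLiteralIsShared()
    {
        string s = "goodbye " + "cruel world!"; // both operands are compile-time constants
        string t = "goodbye cruel world!";      // the equivalent single literal
        return object.ReferenceEquals(s, t);    // compares references, not contents
    }

    public static void Main()
    {
        Console.WriteLine(FoldedLiteralIsShared());
    }
}
```

A reference comparison is used on purpose: == on strings compares contents and would be true even for two separate allocations.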

Marc Gravell
To be fair: in this case both parts of the string are known at compile time. If any part of that had to wait until run-time you'd see some very different IL.
Joel Coehoorn
@Joel - yes, but that was the question.
Marc Gravell
A: 

If the compiler is "intelligent", it will only be one string with "goodbye cruel world!"

Burkhard
It is, and it does. See IL in my reply.
Marc Gravell
Also do a Google for Intern Pool
KiwiBastard
A: 

Actually, probably 3: a const string for "goodbye", a const string for "cruel world", and then a new string for the result.

You can find out for sure by looking at the generated code. It depends on the compiler (and, in fact, on the language, which isn't obvious), but with g++ you can read the output by using the -S flag (check the man page) to get the generated assembly.

Charlie Martin
It is .NET he's asking about.
A: 

Don't trust what you "know" about strings. You might look through the source code for the implementation of string. For instance, take your example:

string s = "goodbye " + "cruel world!";

In Java this would allocate a single string. Java plays some pretty cute tricks and is hard to outsmart, so never optimize until you need to!

Currently, however, as far as I know, using this:

String s="";
for(int i=0;i<1000;i++)
    s+=" ";

to create a 1000-space string still tends to be extremely inefficient.

Appending in a loop is pretty bad, but otherwise it's probably as efficient as StringBuilder.
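Since the question is about .NET, here is roughly how the same loop looks in C# next to a StringBuilder version. Padding, WithConcat and WithBuilder are names invented for this sketch; StringBuilder with Append and ToString is the real System.Text type. Treat it as an illustration of the two approaches, not a benchmark.

```csharp
using System;
using System.Text;

public static class Padding
{
    // Appends with +=. Strings are immutable, so every pass allocates a
    // new string and copies the previous contents into it.
    public static string WithConcat(int count)
    {
        string s = "";
        for (int i = 0; i < count; i++)
            s += " ";
        return s;
    }

    // Appends into a StringBuilder's internal buffer and creates the
    // final string once, in ToString().
    public static string WithBuilder(int count)
    {
        var sb = new StringBuilder(count); // capacity hint so the buffer need not grow
        for (int i = 0; i < count; i++)
            sb.Append(' ');
        return sb.ToString();
    }

    public static void Main()
    {
        string a = WithConcat(1000);
        string b = WithBuilder(1000);
        Console.WriteLine(a == b);   // compares contents
        Console.WriteLine(b.Length);
    }
}
```

For this particular case, new string(' ', 1000) builds the result directly with no loop at all.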

Bill K
that is a pretty big "otherwise"... StringBuilder will use doubling, so < 10 resizes, rather than 1000 copies (telescoping).
Marc Gravell
Right, so for now, avoid appending to strings in large loops, but otherwise don't stress about it. Even at that, for most code I wouldn't worry about it until it started to affect performance.
Bill K
+2  A: 

Literal strings are interned. This means that "hello " does not reside on the heap but in the data segment [see comment] of the program (and is thus not eligible for garbage collection). The same goes for "world". As for "hello world", it may also be interned, if the compiler is smart enough.

"goodbye cruel world" will be interned, since string literal concatenation is handled by the compiler.


Edit: I'm not sure about the data segment statement, please see this question for more information.

Motti
Interned strings are actually on the heap just as every other reference type in .NET.
Brian Rasmussen
A: 

Be careful here, because the compiler can make some very different optimizations when the string values are known at compile time. If the strings you're using aren't known until runtime (pulled from a config file, database, or user input) you'll see some very different IL.
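A rough sketch of the difference, with hypothetical names (RuntimeConcat, Build, SharesLiteralInstance). Here the prefix arrives as a method argument, so the compiler cannot fold the expression and the concatenation happens at run time. I would expect the reference check to come back false even though the contents are equal, and string.Intern to map the new string back to the pooled one, but run it on your own runtime before relying on that.

```csharp
using System;

public static class RuntimeConcat
{
    // prefix is not a compile-time constant, so the compiler emits a
    // run-time concatenation call instead of one folded literal.
    public static string Build(string prefix)
    {
        return prefix + "cruel world!";
    }

    // Does the run-time result share an instance with the equal literal?
    public static bool SharesLiteralInstance(string prefix)
    {
        return object.ReferenceEquals(Build(prefix), "goodbye cruel world!");
    }

    public static void Main()
    {
        string s = Build("goodbye ");
        Console.WriteLine(s == "goodbye cruel world!");        // content comparison
        Console.WriteLine(SharesLiteralInstance("goodbye "));  // reference comparison
        // string.Intern returns the pooled instance with the same contents.
        Console.WriteLine(object.ReferenceEquals(string.Intern(s), "goodbye cruel world!"));
    }
}
```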

Joel Coehoorn
A: 

If you're just going to do one or two string concatenations I wouldn't worry about it.

However, if you have a lot of concatenations, or you have a loop, then you definitely want to take precautions. In the Java world that means you use StringBuffer instead of concatenating strings.

Stephane Grenier
It's called a StringBuilder in .NET
Mark Cidade
Thanks marxidad. I figured there was something similar; they're so close on the core libraries.
Stephane Grenier
A: 

If it's not just in one line, the concatenation of two strings may be accomplished by making the first string into a StringBuffer, doing the concatenation, and returning the result string.

Creating the StringBuffer yourself may seem like overkill, but that's what is going to happen anyway.

SeaDrive
I think you mean StringBuilder
Richard Szalay
A: 

By all means don't prematurely optimise, but don't discount how badly string concatenations can perform. It's not the object creation, but the GC work that it causes.

There is a lab on (ASP.NET Escalation Engineer) Tess Ferrandez's blog that shows a (rather extreme, granted) example of how string concatenation can bring a server to its knees.

Richard Szalay