I have often wondered: is there a performance cost to splitting a string over multiple lines to improve readability when first assigning a value to it? I know that strings are immutable, so a new string has to be created every time one is concatenated. I also realize the cost is probably irrelevant on today's really fast hardware (unless you are in some diabolical loop). For example:

String newString = "This is a really long long long long long" +
    " long long long long long long long long long long long long " +
    " long long long long long long long long long string for example.";

How do the JVM or .NET's compiler (and their other optimizations) handle this? Is a single string created? Or is one string created, then a new one by concatenating the next value, and then another by concatenating again?

This is for my own curiosity.

+3  A: 

As far as I can remember, this will not create multiple strings, just the one.

ck
+25  A: 

This is guaranteed by the C# spec to be identical to creating the string in a single literal, because it's a compile-time constant. From section 7.18 of the C# 3 spec:

Whenever an expression fulfills the requirements listed above, the expression is evaluated at compile-time. This is true even if the expression is a sub-expression of a larger expression that contains non-constant constructs.

(See the spec for the exact details of "the requirements listed above" :)

The Java Language Specification specifies it near the bottom of section 3.10.5:

Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.
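
To illustrate what counts as a constant expression in Java, here is a minimal sketch (the class and method names are hypothetical, not from the spec). It assumes the rule that a final String variable initialized with a constant expression is itself a constant, while a non-final variable is not, so only the first concatenation should be folded at compile time:

```java
class ConstantExpressionDemo
{
    // A final String initialized with a literal is a constant variable.
    static final String PREFIX = "Hello ";

    static boolean foldedFromFinals()
    {
        final String suffix = "there";
        String joined = PREFIX + suffix;  // constant expression: folded by the compiler
        return joined == "Hello there";   // same object as the literal
    }

    static boolean foldedFromNonFinal()
    {
        String suffix = "there";          // not final, so not a constant variable
        String joined = PREFIX + suffix;  // concatenated at run time into a new String
        return joined == "Hello there";   // different object from the literal
    }

    public static void main(String[] args)
    {
        System.out.println(foldedFromFinals());    // true
        System.out.println(foldedFromNonFinal());  // false
    }
}
```

The reference comparison with == is used here only to observe object identity; ordinary code should compare strings with equals.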

Jon Skeet
Thanks Jon. I thought as much and my curiosity was piqued. Thanks again.
uriDium
+5  A: 

There is no performance trade-off. The compiler merges the pieces into a single string (at least in Java).

david a.
Same for the C# compiler.
Drew Noakes
+2  A: 

As long as all the strings are constant (as they are in your example), in Java (and I imagine C#) the compiler converts this to a single string.

You only get performance issues with + if you concatenate a lot of dynamic strings, such as in a loop. In this case use a StringBuilder or StringBuffer.
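
A minimal sketch of the loop case, assuming you want to join a run of numbers with commas (the class and method names are made up for illustration). One StringBuilder is reused across iterations instead of creating a new String on every +:

```java
class JoinDemo
{
    // Builds "0,1,2,...,n-1" with a single StringBuilder.
    static String joinNumbers(int n)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++)
        {
            if (i > 0)
            {
                sb.append(',');  // separator between elements only
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(joinNumbers(5));  // prints 0,1,2,3,4
    }
}
```

StringBuffer offers the same methods with synchronization, which is usually unnecessary for a local variable like this one.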

cletus
+13  A: 

Indeed, in Java, the compiler will turn the String into a constant.

class LongLongString
{
    public LongLongString()
    {
        String newString = "This is a really long long long long long" +
            " long long long long long long long long long long long long " +
            " long long long long long long long long long string for example.";
    }

    public static void main(String[] args)
    {
        new LongLongString();
    }
}

Is compiled into:

Compiled from "LongLongString.java"
class LongLongString extends java.lang.Object{
public LongLongString();
  Code:
   0:   aload_0
   1:   invokespecial #1; //Method java/lang/Object."<init>":()V
   4:   ldc #2; //String This is a really long long long long long long long long long long long long long long long long long  long long long long long long long long long string for example.
   6:   astore_1
   7:   return

public static void main(java.lang.String[]);
  Code:
   0:   new #3; //class LongLongString
   3:   dup
   4:   invokespecial #4; //Method "<init>":()V
   7:   pop
   8:   return

}

As can be seen, a single String is loaded by the ldc instruction at offset 4, rather than multiple String instances being loaded and concatenated.

Edit: The source file was compiled using javac version 1.6.0_06. Looking at The Java Language Specification, Third Edition, (and the same section mentioned in Jon Skeet's answer), I was not able to find any reference for whether a compiler should concatenate a multi-line String into a single String, so this behavior is probably compiler implementation-specific.

coobird
You can see what a particular version of a compiler does, but what about the spec? How do I *know*?
Tom Hawtin - tackline
+6  A: 

Test this for yourself. In C# code (equivalent Java would work too):

string x = "A" + "B" + "C";
string y = "ABC";

bool same = object.ReferenceEquals(x, y); // true

You will see that the result is true.

As an aside, you will see that the string is also interned in the runtime's string pool:

bool interned = object.ReferenceEquals(x, string.Intern(x)); // true
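
A rough Java equivalent of the same test might look like the sketch below (the class and method names are hypothetical). In Java, == on String references compares object identity, so it plays the role of object.ReferenceEquals, and String.intern() returns the pooled instance:

```java
class ReferenceTest
{
    static boolean sameLiteral()
    {
        String x = "A" + "B" + "C";  // constant expression, folded to "ABC"
        String y = "ABC";
        return x == y;               // same object
    }

    static boolean internedAtRuntime()
    {
        // Built at run time, so this is a new object, distinct from the literal.
        String built = new StringBuilder("AB").append('C').toString();
        return built != "ABC" && built.intern() == "ABC";
    }

    public static void main(String[] args)
    {
        System.out.println(sameLiteral());        // true
        System.out.println(internedAtRuntime());  // true
    }
}
```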
Drew Noakes
You can test, but what about the spec? How do I *know*?
Tom Hawtin - tackline
The first part (string reference equality) is governed by the C# compiler and is specified behaviour. However, the second part (interned string literal) is controlled by the CLR and is not guaranteed behaviour across all implementations and platforms.
Drew Noakes
+3  A: 

The equivalent .NET IL to complement coobird's answer:

For C# code:

string s = "This is a really long long long long long" +
    " long long long long long long long long long long long long " +
    " long long long long long long long long long string for example.";
Console.WriteLine(s);

A debug compilation produces:

.method public hidebysig static void Main(string[] args) cil managed
{
  .custom instance void [mscorlib]System.STAThreadAttribute::.ctor()
  .maxstack 1
  .locals init (
      [0] string str)
  L_0000: ldstr "This is a really long long long long long long long long long long long long long long long long long  long long long long long long long long long string for example."
  L_0005: stloc.0 
  L_0006: ldloc.0 
  L_0007: call void [mscorlib]System.Console::WriteLine(string)
  L_000c: ret 
}

So, as you can see, it's one string.

Drew Noakes
I can see it, but do I always know that is going to happen? (Well, I do if I look at someone else's answer.)
Tom Hawtin - tackline
Fair point. In this case though, this is specified behaviour so all implementations of the C# compiler are *supposed* to work this way.
Drew Noakes
A: 

Disclaimer: This is true for Java. I would assume it's true for C#.

Not only will javac create a single String, but the JVM will also use one String object for every other constant String that contains the same text.

String a = "He" + "llo th"+ "ere";
String b = "Hell" + "o the"+ "re";
String c = "Hello" +" "+"there";
assert a == b; // these are the same String object.
assert a == c; // these are the same String object.

Note: they will be the same String object at runtime even if they are in different classes in different JARs, compiled by different compilers.
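
A small sketch of the cross-class case, using two nested classes as stand-ins for classes from different JARs (all names here are hypothetical). Nested classes are compiled to separate class files with their own constant pools, yet both constants should resolve to the same object:

```java
class SharedLiteralDemo
{
    static class First
    {
        static String text()
        {
            return "Hello" + " " + "there";  // folded at compile time
        }
    }

    static class Second
    {
        static String text()
        {
            return "Hello there";            // plain literal in another class file
        }
    }

    static boolean sameObject()
    {
        return First.text() == Second.text();
    }

    public static void main(String[] args)
    {
        System.out.println(sameObject());  // true
    }
}
```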

Peter Lawrey