We have to build Strings all the time for log output and so on. Over the JDK versions we have learned when to use StringBuffer (many appends, thread safe) and StringBuilder (many appends, non-thread-safe).

What's the advice on using String.format? Is it efficient, or are we forced to stick with concatenation for one-liners where performance is important?

e.g. ugly old style,

String s = "What do you get if you multiply " + varSix + " by " + varNine + "?";

vs. tidy new style (and possibly slow),

String s = String.format("What do you get if you multiply %d by %d?", varSix, varNine);

Note: my specific use case is the hundreds of 'one-liner' log strings throughout my code. They don't involve a loop, so StringBuilder is too heavyweight. I'm interested in String.format specifically.

A: 

According to Jeff's recent blog post, the performance isn't too drastically different. Most people have commented that readability is more important for deciding the method of string concatenation.

Edit: However, as pointed out by others, you should benchmark the performance in Java to see if there is a true difference.

HardCode
That's .NET string formatting. You'd have to benchmark the Java version to know what the performance of *that* is like.
Jon Skeet
This is advice for the wrong language. Actually very misleading.
Air
-1 Why are you posting a .Net related article? You should at least mention that it is a cautionary tale or a general principle rather than implying it addresses the topic.
cletus
And apart from the language issue, fundamentally wrong and dangerous advice. The chance of a 20x performance issue being a user-visible defect is at least 1%. Code that works only 99% of the time is buggy, and such erratic issues are the worst type of bugs, the ones that will get through your unit tests and hurt your bottom line.
soru
+2  A: 

Generally you should use String.format because it's relatively fast and it supports globalization (assuming you're actually trying to write something that is read by the user). It also makes translation easier: you translate one string per statement instead of three or more fragments (especially important for languages with drastically different grammatical structures).

Now if you never plan on translating anything, then either rely on Java's built-in conversion of + operators into StringBuilder, or use Java's StringBuilder explicitly.
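
As a rough illustration of the two options just mentioned, the sketch below builds the question's message once with the + operator and once with an explicit StringBuilder; both yield the same String. How javac translates + depends on the JDK version (older compilers emit StringBuilder calls, newer ones may use a different strategy), so inspect the bytecode with javap if it matters to you. The class and method names are made up for this example.

```java
public class ConcatDemo {
    // Hypothetical helper: builds the message with the + operator.
    public static String withPlus(int varSix, int varNine) {
        return "What do you get if you multiply " + varSix + " by " + varNine + "?";
    }

    // Hypothetical helper: the same message built with an explicit StringBuilder.
    public static String withBuilder(int varSix, int varNine) {
        return new StringBuilder("What do you get if you multiply ")
                .append(varSix)
                .append(" by ")
                .append(varNine)
                .append("?")
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(6, 9));
        // Both approaches produce equal strings.
        System.out.println(withPlus(6, 9).equals(withBuilder(6, 9)));
    }
}
```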

Orion Adrian
A: 

The answer to this depends very much on how your specific Java compiler optimizes the bytecode it generates. Strings are immutable and, theoretically, each "+" operation can create a new one. But, your compiler almost certainly optimizes away interim steps in building long strings. It's entirely possible that both lines of code above generate the exact same bytecode.

The only real way to know is to test the code iteratively in your current environment. Write a quick-and-dirty app that concatenates strings both ways in a loop and compare the timings.

Jekke
The bytecode for the second example *surely* calls String.format, but I'd be horrified if a simple concatenation did. Why would the compiler use a format string which would then have to be parsed?
Jon Skeet
I used "bytecode" where I should have said "binary code." When it all comes down to jmps and movs, it may well be the exact same code.
Jekke
+2  A: 

In your example, performance probably isn't too different, but there are other issues to consider, namely memory fragmentation. Every concatenation operation creates a new string, even if it's temporary (it takes time to GC it and it's more work). String.format() is just more readable and it involves less fragmentation.

Also, if you're using a particular format a lot, don't forget you can use the Formatter() class directly (all String.format() does is instantiate a one use Formatter instance).
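
A minimal sketch of what using Formatter directly could look like, assuming one Formatter is kept around and writes into a StringBuilder that is cleared before each use. The wrapper class ReusableFormatter is hypothetical, and it is not thread-safe. The format string is still parsed on every call, so any saving is limited to object allocation; measure before relying on it.

```java
import java.util.Formatter;

public class ReusableFormatter {
    // One buffer and one Formatter shared across calls (not thread-safe).
    private final StringBuilder buffer = new StringBuilder();
    private final Formatter formatter = new Formatter(buffer);

    // Clears the shared buffer, formats into it, and returns a copy of the result.
    public String format(String pattern, Object... args) {
        buffer.setLength(0);
        formatter.format(pattern, args);
        return buffer.toString();
    }

    public static void main(String[] args) {
        ReusableFormatter f = new ReusableFormatter();
        System.out.println(f.format("What do you get if you multiply %d by %d?", 6, 9));
        System.out.println(f.format("Blah %d Blah", 42));
    }
}
```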

See this related question: "Is String.Format as efficient as StringBuilder?"

Also, something else you should be aware of: be careful of using substring(). For example:

String getSmallString() {
  String largeString = // load from file; say 2M in size
  return largeString.substring(100, 300);
}

That large string stays in memory because, in the JDKs this was written against (before Java 7 update 6), substring() returns a String that shares the original's backing char array. A better version is:

  return new String(largeString.substring(100, 300));

or

  return String.format("%s", largeString.substring(100, 300));

The second form is probably more useful if you're doing other stuff at the same time.

cletus
Worth pointing out the "related question" is actually C# and hence not applicable.
Air
+3  A: 

I wrote a small class to test which of the two performs better, and + comes out ahead of format by a factor of 5 to 6. Try it yourself:

public class StringTest {

    public static void main(String[] args) {
        int i = 0;
        long prev_time = System.currentTimeMillis();
        long time;

        // Time 100,000 concatenations with the + operator.
        for (i = 0; i < 100000; i++) {
            String s = "Blah" + i + "Blah";
        }
        time = System.currentTimeMillis() - prev_time;
        System.out.println("Time for + loop " + time);

        // Time 100,000 calls to String.format.
        prev_time = System.currentTimeMillis();
        for (i = 0; i < 100000; i++) {
            String s = String.format("Blah %d Blah", i);
        }
        time = System.currentTimeMillis() - prev_time;
        System.out.println("Time for String.format loop " + time);
    }
}
hhafez
There's one flaw with this test in that it's not entirely a good representation of all string formatting. Often there's logic involved in what to include and logic to format specific values into strings. Any real test should look at real-world scenarios.
Orion Adrian
+1  A: 

I just modified hhafez's test to include StringBuilder. StringBuilder is 33 times faster than String.format using jdk 1.6.0_10 client on XP. Using the -server switch lowers the factor to 20.

public class StringTest {

   public static void main( String[] args ) {
      test();
      test();
   }

   private static void test() {
      int i = 0;
      long prev_time = System.currentTimeMillis();
      long time;

      for ( i = 0; i < 1000000; i++ ) {
         String s = "Blah" + i + "Blah";
      }
      time = System.currentTimeMillis() - prev_time;

      System.out.println("Time after for loop " + time);

      prev_time = System.currentTimeMillis();
      for ( i = 0; i < 1000000; i++ ) {
         String s = String.format("Blah %d Blah", i);
      }
      time = System.currentTimeMillis() - prev_time;
      System.out.println("Time after for loop " + time);

      prev_time = System.currentTimeMillis();
      for ( i = 0; i < 1000000; i++ ) {
         new StringBuilder("Blah").append(i).append("Blah");
      }
      time = System.currentTimeMillis() - prev_time;
      System.out.println("Time after for loop " + time);
   }
}

While this might sound drastic, I consider it to be relevant only in rare cases, because the absolute numbers are pretty low: 4 s for 1 million simple String.format calls is sort of ok - as long as I use them for logging or the like.

Update: As pointed out by sjbotha in the comments, the StringBuilder test is invalid, since it is missing a final .toString().

The correct speed-up factor from String.format(.) to StringBuilder is 23 on my machine (16 with the -server switch).
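
A minimal sketch of the corrected comparison, assuming the same "Blah" + i + "Blah" workload: the StringBuilder variant calls toString(), and every variant adds up the lengths of the strings it builds so the JIT cannot discard the work. The class name, the warm-up pass and the "Blah%dBlah" pattern (no spaces, so all three build identical strings) are choices made for this example. For numbers you intend to rely on, a dedicated harness such as JMH is the safer option.

```java
public class StringBenchmark {
    // Each method returns the total length built, so the work has a visible result.
    public static long plus(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += ("Blah" + i + "Blah").length();
        }
        return total;
    }

    public static long format(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += String.format("Blah%dBlah", i).length();
        }
        return total;
    }

    public static long builder(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            // toString() is included: without it no usable String is produced.
            total += new StringBuilder("Blah").append(i).append("Blah").toString().length();
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1000000;
        // Warm-up pass so the timed pass is more likely to run JIT-compiled code.
        plus(n);
        format(n);
        builder(n);

        long t0 = System.nanoTime();
        long a = plus(n);
        long t1 = System.nanoTime();
        long b = format(n);
        long t2 = System.nanoTime();
        long c = builder(n);
        long t3 = System.nanoTime();

        System.out.println("+ operator:    " + (t1 - t0) / 1000000 + " ms, total length " + a);
        System.out.println("String.format: " + (t2 - t1) / 1000000 + " ms, total length " + b);
        System.out.println("StringBuilder: " + (t3 - t2) / 1000000 + " ms, total length " + c);
    }
}
```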

the.duckman
Your test is invalid because it fails to take into account the time eaten up by just having a loop. You should include that and subtract it from all the other results, at a minimum (yes it can be a significant percentage).
cletus
I did that, the for loop takes 0 ms. But even if it did take time, this would only increase the factor.
the.duckman
The StringBuilder test is invalid because it does not call toString() at the end to actually give you a String you can use. I added this and the result is that StringBuilder takes about the same amount of time as +. I'm sure as you increase the number of appends it will eventually become cheaper.
sjbotha
+5  A: 

To expand/correct on the first answer above, it's not translation that String.format would help with, actually.
What String.format will help with is printing a date/time (or a number, etc.) where there are localization (l10n) differences (i.e., some countries print 04Feb2009 and others print Feb042009).
With translation, you're just talking about moving any externalizable strings (like error messages and what-not) into a property bundle so that you can use the right bundle for the right language, using ResourceBundle and MessageFormat.
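
To make the distinction concrete, here is a small sketch under stated assumptions: String.format with an explicit Locale handles the number-formatting side, and MessageFormat handles the message pattern. In a real application the pattern would be looked up from a ResourceBundle; it is inlined here to keep the example self-contained. The class and method names are invented for illustration.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class LocaleDemo {
    // Locale-sensitive number formatting: grouping and decimal separators follow the locale.
    public static String amount(Locale locale, double value) {
        return String.format(locale, "%,.2f", value);
    }

    // Message pattern with positional arguments; normally loaded from a ResourceBundle.
    public static String question(int a, int b) {
        String pattern = "What do you get if you multiply {0} by {1}?";
        return MessageFormat.format(pattern, a, b);
    }

    public static void main(String[] args) {
        System.out.println(amount(Locale.US, 1234.5));      // 1,234.50
        System.out.println(amount(Locale.GERMANY, 1234.5)); // 1.234,50
        System.out.println(question(6, 9));
    }
}
```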

Looking at all the above, I'd say that performance-wise, String.format vs. plain concatenation comes down to what you prefer. If you prefer looking at calls to .format over concatenation, then by all means, go with that.
After all, code is read a lot more than it's written.

dw.mackie
+5  A: 

Hi, I took hhafez's code and added a memory test:

private static void test() {
    Runtime runtime = Runtime.getRuntime();
    long memory;
    ...
    memory = runtime.freeMemory();
    // for loop code
    memory = memory-runtime.freeMemory();

I ran this separately for each approach (the '+' operator, String.format and StringBuilder with a final toString()), so the memory used by one approach is not affected by the others. I also added more concatenations, making the string "Blah" + i + "Blah" + i + "Blah" + i + "Blah".

The results are as follows (average of 5 runs each):

Approach        Time (ms)   Memory allocated (bytes)
'+' operator          747                    320,504
String.format       16484                    373,312
StringBuilder         769                     57,344

We can see that the '+' operator and StringBuilder are practically identical time-wise, but StringBuilder is much more efficient in memory use. This is very important when we have many log calls (or any other statements involving strings) in a time interval short enough that the garbage collector won't get to clean up the many string instances resulting from the '+' operator.

And a note, BTW, don't forget to check the logging level before constructing the message.
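
A minimal sketch of that level check, assuming java.util.logging (the thread does not say which logging framework is in use). The counter messagesBuilt and the method names are hypothetical; they exist only to show that the message is not constructed when the level is disabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    // Counts how often the message string is actually constructed.
    static int messagesBuilt = 0;

    static String buildMessage(int varSix, int varNine) {
        messagesBuilt++;
        return String.format("What do you get if you multiply %d by %d?", varSix, varNine);
    }

    static void logQuestion(int varSix, int varNine) {
        // Skip the formatting cost entirely when FINE is not enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(buildMessage(varSix, varNine));
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);
        logQuestion(6, 9);
        System.out.println("built at INFO: " + messagesBuilt);

        LOG.setLevel(Level.FINE);
        logQuestion(6, 9);
        System.out.println("built at FINE: " + messagesBuilt);
    }
}
```

With the guard in place, the choice between String.format and concatenation only costs anything on the code paths where the message is actually emitted.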

Conclusions:

  1. I'll keep on using StringBuilder.
  2. I have too much time or too little life.

Itamar