I am reading a CSV file that has about 50,000 lines and is 1.1 MiB in size (and it can grow larger).

In Code1, I use String to process the CSV, while in Code2 I use StringBuilder (only one thread executes the code, so there are no concurrency issues).

Using StringBuilder makes the code a little harder to read than using the normal String class.

Am I prematurely optimizing things with StringBuilder in Code2 to save a bit of heap space and memory?

Code1

    fr = new FileReader(file);
    BufferedReader reader = new BufferedReader(fr);

    String line = reader.readLine();
    while (line != null)
    {
        int separator = line.indexOf(',');
        String symbol = line.substring(0, separator);
        int begin = separator;
        separator = line.indexOf(',', begin + 1);
        String price = line.substring(begin + 1, separator);

        // Publish this update
        publisher.publishQuote(symbol, price);

        // Read the next line of fake update data
        line = reader.readLine();
    }

Code2

    fr = new FileReader(file);
    BufferedReader reader = new BufferedReader(fr);
    StringBuilder stringBuilder = new StringBuilder(reader.readLine());

    // toString() never returns null, so this condition is always true; the loop
    // only ends when replace() throws a NullPointerException at end of file.
    while (stringBuilder.toString() != null) {
        int separator = stringBuilder.toString().indexOf(',');
        String symbol = stringBuilder.toString().substring(0, separator);
        int begin = separator;
        separator = stringBuilder.toString().indexOf(',', begin + 1);
        String price = stringBuilder.toString().substring(begin + 1, separator);
        publisher.publishQuote(symbol, price);

        stringBuilder.replace(0, stringBuilder.length(), reader.readLine());
    }

Edit

I eliminated the toString() calls, so fewer String objects are created.

Code3

    while (stringBuilder.length() > 0) {
        int separator = stringBuilder.indexOf(",");
        String symbol = stringBuilder.substring(0, separator);
        int begin = separator;
        separator = stringBuilder.indexOf(",", begin + 1);
        String price = stringBuilder.substring(begin + 1, separator);
        publisher.publishQuote(symbol, price);
        Thread.sleep(10); // pause 10 ms between updates
        // readLine() returns null at end of file, which makes replace() throw NullPointerException
        stringBuilder.replace(0, stringBuilder.length(), reader.readLine());
    }

Also, the original code was downloaded from http://www.devx.com/Java/Article/35246/0/page/1

+1  A: 

Code2 is actually less efficient than Code1 because every time you call stringBuilder.toString() you create a new java.lang.String instance and copy the builder's characters into it (in addition to the existing StringBuilder object). This costs both space and time due to the object creation and copying overhead.
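The extra allocation is easy to observe. The Javadoc for StringBuilder.toString() says that a new String is allocated on each call, so two consecutive calls on the same builder return equal but distinct objects. A minimal sketch; the class name and the sample line "ACME,12.50," are made up for illustration:

```java
// Shows that each StringBuilder.toString() call allocates a fresh String copy.
class ToStringCopies {
    // Returns true if two consecutive toString() calls hand back the same object.
    static boolean toStringReturnsSameObject(StringBuilder sb) {
        return sb.toString() == sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("ACME,12.50,");
        String first = sb.toString();
        String second = sb.toString();
        System.out.println(first.equals(second)); // true: same characters
        System.out.println(first == second);      // false: two distinct String objects
    }
}
```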

Assigning the contents of readLine() directly to a String and then splitting that String will typically be performant enough. You could also consider using the Scanner class.
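A minimal sketch of the readLine()-then-split approach, assuming lines of the form symbol,price,... as in the question. The class and method names are hypothetical, and the quotes are collected into a list rather than passed to the question's publisher.publishQuote(), which is not defined here:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SplitParse {
    // Reads "symbol,price,..." lines and returns "symbol=price" strings.
    static List<String> parse(BufferedReader reader) throws IOException {
        List<String> quotes = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(",");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            quotes.add(fields[0] + "=" + fields[1]);
        }
        return quotes;
    }

    // Convenience wrapper for in-memory text (a FileReader would be used for a real file).
    static List<String> parseText(String csv) {
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            return parse(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseText("ACME,12.50,100\nGLOBEX,7.25,40\n")); // [ACME=12.50, GLOBEX=7.25]
    }
}
```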

Memory Saving Tip

If you encounter many repeating tokens in your input, consider using String.intern() to ensure that each identical token references the same String object, e.g.

String[] tokens = parseTokens(line);
for (String token : tokens) {
  // Construct business object referencing interned version of token.
  BusinessObject bo = new BusinessObject(token.intern());
  // Add business object to collection, etc.
}
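A small self-contained illustration of what intern() buys you. The class name and sample lines are hypothetical, and parseTokens() and BusinessObject from the snippet above are not reproduced. Repeated symbols end up referencing one pooled String, while plain substring() results stay distinct objects:

```java
import java.util.ArrayList;
import java.util.List;

class InternDemo {
    // Extracts the symbol (text before the first comma) and interns it.
    static List<String> symbols(String[] lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String symbol = line.substring(0, line.indexOf(','));
            result.add(symbol.intern());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {"ACME,12.50,", "GLOBEX,7.25,", "ACME,12.75,"};
        List<String> syms = symbols(lines);
        // Both "ACME" entries now reference the same pooled object
        System.out.println(syms.get(0) == syms.get(2)); // true

        // Without intern(), each substring() call creates its own object
        String a = lines[0].substring(0, 4);
        String b = lines[2].substring(0, 4);
        System.out.println(a == b); // false
    }
}
```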
Adamski
+1 Do note, though, @portoalet, that string interning is not guaranteed to yield any additional performance. Do some profiling (at least with a stopwatch) to check whether it helps in your case :)
Jørn Schou-Rode
+2  A: 

Am I prematurely optimizing things with StringBuilder in Code2 to save a bit of heap space and memory?

Most probably, yes. But there is only one way to find out: profile your code.

Also, I'd use a proper CSV parser instead of what you're doing now: http://ostermiller.org/utils/CSV.html
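To illustrate why a real parser matters, here is a deliberately minimal quote-aware splitter next to the question's indexOf() approach. It is a hand-written sketch, not the Ostermiller library's API: it handles double-quoted fields and "" escapes within a single line, but not quoted fields that span several lines. The class name and sample line are made up:

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsv {
    // Naive approach from the question: everything before the first comma.
    static String naiveFirstField(String line) {
        return line.substring(0, line.indexOf(','));
    }

    // Minimal quote-aware splitter: commas inside double quotes are kept,
    // and a doubled quote ("") inside a quoted field becomes one quote character.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0); // reuse the buffer for the next field
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"ACME, Inc.\",12.50,";
        System.out.println(naiveFirstField(line));     // prints: "ACME
        System.out.println(splitLine(line).get(0));    // prints: ACME, Inc.
        System.out.println(splitLine(line).get(1));    // prints: 12.50
    }
}
```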

Bart Kiers
+1 for recommending a real parser. The given code fails when the actual values contain quotes or commas.
BalusC
+1 I am using the Ostermiller CSV/ExcelCSV parser and printer in a production environment; it works very well.
Michael Konietzka
+3  A: 

Will the optimized code increase the performance of the app? - my question

The second code sample will not save you any memory nor any computation time. I am afraid you might have misunderstood the purpose of StringBuilder, which is really meant for building strings - not reading them.

Within the loop of your second code sample, every single line contains the expression stringBuilder.toString(), essentially turning the buffered string into a new String object over and over again. Your actual string operations are done against these objects. Not only is the first code sample easier to read, but it is most certainly the more performant of the two.

Am I prematurely optimizing things with StringBuilder? - your question

Unless you have profiled your application and have come to the conclusion that these very lines cause a notable slowdown in execution speed, yes. Unless you are really sure that something will be slow (e.g. if you recognize high computational complexity), you definitely want to do some profiling before you start making optimizations that hurt the readability of your code.

What kind of optimizations could be done to this code? - my question

If you have profiled the application and decided this is the right place for an optimization, you should consider looking into the features offered by the Scanner class. This might give you both better performance (profiling will tell you if this is true) and simpler code.
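A minimal Scanner-based sketch, assuming the same symbol,price,... line format. It reads from a String so it runs on its own; for a file you would construct the outer Scanner from the File instead (new Scanner(file), which throws FileNotFoundException). Scanner works with regular expressions, and this sketch creates one Scanner per line, so it allocates more objects than the indexOf() version; whether it is faster is exactly what profiling has to show. The class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerParse {
    // Parses "symbol,price,..." lines into "symbol=price" strings.
    static List<String> parse(String csv) {
        List<String> quotes = new ArrayList<>();
        try (Scanner lines = new Scanner(csv)) {
            while (lines.hasNextLine()) {
                // One comma-delimited Scanner per line: simpler code, more objects.
                try (Scanner fields = new Scanner(lines.nextLine()).useDelimiter(",")) {
                    if (!fields.hasNext()) {
                        continue; // blank line
                    }
                    String symbol = fields.next();
                    if (!fields.hasNext()) {
                        continue; // no price field
                    }
                    quotes.add(symbol + "=" + fields.next());
                }
            }
        }
        return quotes;
    }

    public static void main(String[] args) {
        System.out.println(parse("ACME,12.50,100\nGLOBEX,7.25,40\n")); // [ACME=12.50, GLOBEX=7.25]
    }
}
```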

Jørn Schou-Rode
+1, the code seems to be misusing `StringBuilder`. Implementation (1) is cleaner and possibly better.
ring bearer
@Jørn In Code3 above I have removed the stringBuilder.toString() calls, which should reduce the number of String objects created. The reason Code1 uses the methods of the String class instead of a CSV parser is to minimize the number of objects on the heap, as this code runs on a Java real-time VM (http://www.devx.com/Java/Article/35246/0/page/1). Based on your experience, will the Scanner class perform better?
portoalet
@por: My experience with performance-critical string operations in Java is not very thorough. If you really think it matters, you could try both in a tight loop and measure the throughput.
Jørn Schou-Rode
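A rough stopwatch-style comparison in the spirit of that suggestion, timing the question's indexOf()/substring() parsing against a per-line Scanner. This is a sketch, not a rigorous benchmark: the warm-up round only gives the JIT a chance to compile both paths, the iteration count is arbitrary, and a dedicated harness such as JMH will give more trustworthy numbers. The class and method names are hypothetical:

```java
import java.util.Scanner;
import java.util.function.Function;

class ParseTiming {
    // Written to a static field so the JIT cannot discard the timed work.
    static long sink;

    // Code1-style parsing: two indexOf() calls and two substring() calls per line.
    static String indexOfParse(String line) {
        int separator = line.indexOf(',');
        String symbol = line.substring(0, separator);
        int begin = separator;
        separator = line.indexOf(',', begin + 1);
        String price = line.substring(begin + 1, separator);
        return symbol + "=" + price;
    }

    // Alternative: a comma-delimited Scanner over the same line.
    static String scannerParse(String line) {
        try (Scanner fields = new Scanner(line).useDelimiter(",")) {
            return fields.next() + "=" + fields.next();
        }
    }

    // Runs the parser repeatedly over one line and returns elapsed nanoseconds.
    static long timeNanos(Function<String, String> parser, String line, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += parser.apply(line).length();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String line = "ACME,12.50,100";
        int iterations = 100_000;
        // Warm-up round so both paths get JIT-compiled before being timed.
        timeNanos(ParseTiming::indexOfParse, line, iterations);
        timeNanos(ParseTiming::scannerParse, line, iterations);

        long indexOfNs = timeNanos(ParseTiming::indexOfParse, line, iterations);
        long scannerNs = timeNanos(ParseTiming::scannerParse, line, iterations);
        System.out.printf("indexOf: %d ms, Scanner: %d ms%n",
                indexOfNs / 1_000_000, scannerNs / 1_000_000);
    }
}
```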
A: 

StringBuilder is usually used like this:

StringBuilder sb = new StringBuilder();
sb.append("You").append(" can chain")
  .append(" your").append(" strings")
  .append(" for better readability.");

String myString = sb.toString(); // call toString() only once, when you are done
System.out.println(myString);    // println(sb) would call sb.toString() again
armandino
A: 

StringBuilder has several advantages:

  • StringBuffer's operations are synchronized but StringBuilder's are not, so using StringBuilder can improve performance in single-threaded scenarios.
  • Once the buffer has grown, it can be reused by invoking setLength(0) on the object. Interestingly, if you step into the debugger and examine the StringBuilder, you will see that the old characters still exist in its backing array even after invoking setLength(0). StringBuilder simply resets its length counter to the beginning of the buffer; the next time you append characters, they overwrite the array from the start and the counter moves forward again.
  • If you are not sure about the length of the string, it is better to use StringBuilder, because once the buffer has grown you can reuse the same buffer for strings of smaller or equal size.
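A minimal sketch of the reuse pattern described in the list: one StringBuilder is reset with setLength(0) for each line, and its capacity survives the reset. The class name and sample lines are hypothetical, and the capacity check reflects how current OpenJDK implementations behave rather than a documented guarantee:

```java
class ReuseBuilder {
    // Builds "symbol@price" for each "symbol,price,..." line, reusing one buffer.
    static String[] format(String[] lines) {
        StringBuilder sb = new StringBuilder();
        String[] out = new String[lines.length];
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int first = line.indexOf(',');
            int second = line.indexOf(',', first + 1);
            sb.setLength(0);                      // reset length; backing array is kept
            sb.append(line, 0, first)             // symbol
              .append('@')
              .append(line, first + 1, second);   // price
            out[i] = sb.toString();
        }
        return out;
    }

    // Grows a builder, resets it, and reports whether the capacity survived the reset.
    static boolean capacityKeptAfterReset() {
        StringBuilder sb = new StringBuilder();
        sb.append("a fairly long line that forces the buffer to grow");
        int grown = sb.capacity();
        sb.setLength(0);
        return sb.length() == 0 && sb.capacity() == grown;
    }

    public static void main(String[] args) {
        String[] lines = {"GLOBEX,7.25,40", "ACME,12.50,100"};
        System.out.println(String.join(" ", format(lines))); // GLOBEX@7.25 ACME@12.50
        System.out.println(capacityKeptAfterReset());        // true
    }
}
```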

StringBuffer and StringBuilder support almost the same operations; the difference is that StringBuffer is synchronized and StringBuilder is not.

If you don't need multithreaded access, it is better to use StringBuilder.

webjockey