views:

194

answers:

5

Am trying to create a well-optimised bit of code to create number of X-digits in length (where X is read from a runtime properties file), based on a DB-generated sequence number (Y), which is then used a folder-name when saving a file.

I've come up with three ideas so far, the fastest of which is the last one, but I'd appreciate any advice people may have on this...

1) Instantiate a StringBuilder with initial capacity X. Append Y. While length < X, insert a zero at pos zero.

2) Instantiate a StringBuilder with initial capacity X. While length < X, append a zero. Create a DecimalFormat based on StringBuilder value, and then format the number when it's needed.

3) Create a new int of Math.pow( 10, X ) and add Y. Use String.valueOf() on the new number and then substring(1) it.

The second one can obviously be split into outside-loop and inside-loop sections.

So, any tips? Using a for-loop of 10,000 iterations, I'm getting similar timings from the first two, and the third method is approximately ten-times faster. Does this seem correct?

Full test-method code below...

    // Setup test variables
    int numDigits = 9;
    int testNumber = 724;
    int numIterations = 10000;
    String folderHolder = null;
    DecimalFormat outputFormat = new DecimalFormat( "#,##0" );

    // StringBuilder test
    long before = System.nanoTime();
    for ( int i = 0; i < numIterations; i++ )
    {
        StringBuilder sb = new StringBuilder( numDigits );
        sb.append( testNumber );
        while ( sb.length() < numDigits )
        {
            sb.insert( 0, 0 );
        }

        folderHolder = sb.toString();
    }
    long after = System.nanoTime();
    System.out.println( "01: " + outputFormat.format( after - before ) + " nanoseconds" );
    System.out.println( "Sanity check: Folder = \"" + folderHolder + "\"" );

    // DecimalFormat test
    before = System.nanoTime();
    StringBuilder sb = new StringBuilder( numDigits );
    while ( sb.length() < numDigits )
    {
        sb.append( 0 );
    }
    DecimalFormat formatter = new DecimalFormat( sb.toString() );
    for ( int i = 0; i < numIterations; i++ )
    {
        folderHolder = formatter.format( testNumber );
    }
    after = System.nanoTime();
    System.out.println( "02: " + outputFormat.format( after - before ) + " nanoseconds" );
    System.out.println( "Sanity check: Folder = \"" + folderHolder + "\"" );

    // Substring test
    before = System.nanoTime();
    int baseNum = (int)Math.pow( 10, numDigits );
    for ( int i = 0; i < numIterations; i++ )
    {
        int newNum = baseNum + testNumber;
        folderHolder = String.valueOf( newNum ).substring( 1 );
    }
    after = System.nanoTime();
    System.out.println( "03: " + outputFormat.format( after - before ) + " nanoseconds" );
    System.out.println( "Sanity check: Folder = \"" + folderHolder + "\"" );
+1  A: 

Inserting padding characters one by one is obviously slow. If performance is really that big a concern, you could use predefined string constants of lengts 1..n-1 instead (where n is the biggest expected length), stored in an ArrayList at the corresponding indexes.

If n is very big, at least you could still insert in bigger chunks instead of single chars.

But overall, as others pointed out too, optimization is only feasible if you have profiled your application under real circumstances and found which specific piece of code is the bottleneck. Then you can focus on that (and of course profile again to verify that your changes actually improve performance).

Péter Török
+1  A: 

Use String.format("%0[length]d", i)

For length of 8 it would be

String out = String.format("%08d", i);

It's slower, but the time spent typing and debugging the more complex code will probably exceed the total extra time ever used during execution.

In fact, if you add up all the man-hours already spent discussing this, It most likely exceeds the execution time savings by a large factor.

Skip Head
So if I have length 5, I would use %50d ? ;-) (0 should come first)
aioobe
aioobe: You are correct, 0 should be first. I changed my answer.
Skip Head
You still have a leading 0 in your first line though.
aioobe
Fixed it. Thanks, aioobe.
Skip Head
+4  A: 

I would stop doing optimizations based on micro-benchmarks and go for something that looks elegant codewise, such as String.format("%0"+numDigits+"d", testNumber)

aioobe
That all makes sense, but it's WAY slower according to the micro-benchmarking. Is that stuff really useless, because I'm getting results that suggest the String.format is 100 times slower than substringing an int. Does it have no usefulness at all?
Martin
You should be careful when interpreting figures from micro-benchmarks. In some situations the compiler/JIT compiler optimize away large chunks of code if it realizes that it's unnecessary. It's quite obvious for instance, that (in the third case) `newNum` will stay the same in all iterations. This could in turn potentially cause `String.valueOf(newNum).substring(1);` to be put outside of the loop, which havocs the entire bench-mark.
aioobe
I created an int[] of length 10,000 and then inserted a random number between 0 and 10,000 into each position and re-ran the tests, and the timings were almost identical.I'm fascinated by, and far prefer the readability of the String.format means of creating the number, but it's difficult to ignore the timings:StringBuilder: 31,317,244 nanosecondsDecimalFormat: 36,368,634 nanosecondsSubstring: 2,623,557 nanosecondsString.format: 184,692,956 nanoseconds
Martin
Try adding the actual creation of the folder (of name `folderHolder`) to each iteration. Do you get the same percentual difference between the tests? Or is the computation of the folder-name negligible?
aioobe
No difference at all, from including the declaration of the String object inside the loop.
Martin
I meant the "mkdir" part. (I assume thats needed too, perhaps I'm mistaking?)
aioobe
@Martin: You say you are using this routine to generate a file or directory name. If you consider the time required for the disk access, how important do you really think it is to "optimize" this part of your code?
jarnbjo
Actually, it's going to be using GPFS via WebDAV and/or Amazon S3. I'm simply trying to optimise everything I can, while I'm building the thing. Plus, this has become an academic exercise for me as well now!
Martin
I believe the time it takes to execute a `String.format` call, is negligible in comparison. My suggestion: Go for the cleanest and most readable code.
aioobe
A: 

This probably related link discusses many of the ways to do it. I would recommend the Apache option, StringUtils, it may or may not be the absolute fastest, but its usually one of the easiest to understand, and has had the )&##@ pounded out of it, so it probably won't break in some unforeseen edge case. ;)

mezmo
+1  A: 

Here is a solution that is basically the same thing as your StringBuilder with two optimizations:

  1. It directly writes to an array bypassing the StringBuilder overhead
  2. It does the operations in reverse instead of insert(0), which requries an arraycopy each time

It also makes the assumptions that numDigits will be >= to the actual characters required, but will properly handle negative numbers:

before = System.nanoTime();
String arrString=null;
for ( int j = 0; j < numIterations; j++ ){
  char[] arrNum = new char[numDigits];
  int i = numDigits-1;
  boolean neg = testNumber<0;
  for(int tmp = neg?-testNumber:testNumber;tmp>0;tmp/=10){
    arrNum[i--] = (char)((tmp%10)+48);
  }
  while(i>=0){
    arrNum[i--]='0';
  }
  if(neg)arrNum[0]='-';
  arrString = new String(arrNum);
}
after = System.nanoTime();
System.out.println( "04: " + outputFormat.format( after - before ) + " nanoseconds" );
System.out.println( "Sanity check: Folder = \"" + arrString + "\"" );

This method well outperformed your samples on my machine for negatives and was comparable for positives:

01: 18,090,933 nanoseconds
Sanity check: Folder = "000000742"
02: 22,659,205 nanoseconds
Sanity check: Folder = "000000742"
03: 2,309,949 nanoseconds
Sanity check: Folder = "000000742"
04: 6,380,892 nanoseconds
Sanity check: Folder = "000000742"

01: 14,933,369 nanoseconds
Sanity check: Folder = "0000-2745"
02: 21,685,158 nanoseconds
Sanity check: Folder = "-000002745"
03: 3,213,270 nanoseconds
Sanity check: Folder = "99997255"
04: 1,255,660 nanoseconds
Sanity check: Folder = "-00002745"

Edit: I noticed your tests resued some of the objects within the iteration loop, which I had not done in mine (such as not recalculating baseNum in the substring version). When I altered the tests to be consistent (not resuing any objects / calculations my version performed better than yours:

01: 18,377,935 nanoseconds
Sanity check: Folder = "000000742"
02: 69,443,911 nanoseconds
Sanity check: Folder = "000000742"
03: 6,410,263 nanoseconds
Sanity check: Folder = "000000742"
04: 996,622 nanoseconds
Sanity check: Folder = "000000742"

Of course as others have mentioned micro benchmarking is incredibly difficult / "fudgy" with all of the optimization performed by the VM and the inability to control them.

M. Jessup