I have a very simple unit test that just allocates a lot of Strings:

import junit.framework.TestCase;

public class AllocationSpeedTest extends TestCase {

    public void testAllocation() throws Exception {
        for (int i = 0; i < 1000; i++) {
            long startTime = System.currentTimeMillis();
            String a = "dummy";
            // Each += creates a new String, so this loop allocates heavily.
            for (int j = 0; j < 1000; j++) {
                a += "allocation driven";
            }
            System.out.println(i + ": " + (System.currentTimeMillis() - startTime) + "ms " + a.length());
        }
    }

}

On my Windows PC (Intel Core Duo, 2.2GHz, 2GB) this prints on average:

...
71: 47ms 17005
72: 47ms 17005
73: 46ms 17005
74: 47ms 17005
75: 47ms 17005
76: 47ms 17005
77: 47ms 17005
78: 47ms 17005
79: 47ms 17005
80: 62ms 17005
81: 47ms 17005
...

On SunOS (5.10 Generic_138888-03 sun4v sparc SUNW, SPARC-Enterprise-T5120):

...
786: 227ms 17005
787: 294ms 17005
788: 300ms 17005
789: 224ms 17005
790: 260ms 17005
791: 242ms 17005
792: 263ms 17005
793: 287ms 17005
794: 219ms 17005
795: 279ms 17005
796: 278ms 17005
797: 231ms 17005
798: 291ms 17005
799: 246ms 17005
800: 327ms 17005
...

JDK version is 1.4.2_18 on both machines. JVM parameters are the same and are:

-server -Xmx256m -Xms256m

Can anyone explain why the Sun "super server" is slower?

(http://www.sun.com/servers/coolthreads/t5120/performance.xml)

A: 

The hardware in the SunOS machine is slower, and the VM may be somewhat slower as well.

jsight
Is there any way to make it faster?
Superfilin
Sun says: "The Sun SPARC Enterprise T5120 server with Chip Multithreading (CMT) technology delivers breakthrough levels of performance with dramatic power and space savings, as demonstrated by a range of World Record benchmark results." It's not that I really believe that marketing bullshit, but why is it slower?
Superfilin
@Superfilin: pedal harder? :-)
Stephen C
That's probably what I need :)
Superfilin
This chip has at least 4 cores, each capable of 8 threads. Perhaps adding concurrency is the key to greater perf here? I don't think the single threaded performance is going to impress.
jsight
+1  A: 

It's my understanding that UltraSPARC T2-based machines are aimed at performance-per-watt rather than raw performance. You might try dividing the allocation time by the power consumption and see what kind of numbers you get. :)

Is there a reason you're running 1.4.2 instead of 1.6?

David Moles
Customer requirement :)
Superfilin
Do you have any official links that may prove this ;)?
Superfilin
...aimed at "dollar-per-powerpoint presentation". They consume nearly as much power as Intel processors. Maybe 10-20 watts less, but not nearly enough to compensate for the performance loss.
ima
"Confirm" I don't know about; most of what I've seen is marketingware. The T2 is an eight-core chip, isn't it? You might try seeing what happens with multiple threads. I just did a quick test on my desktop (Core 2 Quad, 2.4GHz), with just 250 strings, and it looks like two threads can each allocate 250 in about 1.8x the time one thread can allocate 250, but by 4 threads it's taking more than 4x the time. I'd be curious how it scales on the T2.
David Moles
I have submitted a separate answer describing a multi-threaded test on SPARC.
Superfilin
The T2 we test on is a 4-core chip, but in production it will be an 8-core one.
Superfilin
A: 

I don't think that this is measuring memory allocation. For a start, there is an awful lot of character copying going on in a += "allocation driven";. But I suspect that the real bottleneck is in getting the output from System.out.println(...) through the network layers from the app on the Sun server to your remote workstation.

As an experiment, try multiplying the inner loop count by 10 and 100, and see if that "speeds up" the Sun server relative to your workstation.

Another thing you could try is to move the inner loop into a separate method. It is possible that since you are doing all the work in one invocation of the test method, the JIT compiler never gets a chance to compile it.

(Artificial "micro-benchmarks" like this are always susceptible to effects like these. I tend to distrust them.)
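One way to see how much of the time goes into character copying rather than allocation is to build the same string with a single growable buffer and compare. The following is only a minimal sketch: the class and method names are made up for illustration, and it uses StringBuffer because JDK 1.4.2 has no StringBuilder.

```java
public class ConcatVsBuffer {

    // Builds the string the way the test in the question does: every +=
    // creates a new String and copies the characters accumulated so far.
    static String viaConcat(int n) {
        String a = "dummy";
        for (int j = 0; j < n; j++) {
            a += "allocation driven";
        }
        return a;
    }

    // Builds the same string by appending to one growable buffer.
    static String viaBuffer(int n) {
        StringBuffer sb = new StringBuffer("dummy");
        for (int j = 0; j < n; j++) {
            sb.append("allocation driven");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        String a = viaConcat(1000);
        long t1 = System.currentTimeMillis();
        String b = viaBuffer(1000);
        long t2 = System.currentTimeMillis();
        System.out.println("concat: " + (t1 - t0) + "ms " + a.length());
        System.out.println("buffer: " + (t2 - t1) + "ms " + b.length());
    }
}
```

Both methods return the same 17005-character string, so any large gap between the two timings would point at copying and garbage rather than at the final result size.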

Stephen C
The expression that computes the elapsed time is evaluated before System.out.println is called. So it's not a bottleneck.
Superfilin
Ummm ... that doesn't necessarily follow. Anyway, try the experiment. If you are right, multiplying the inner loop count by N will multiply the times by N.
Stephen C
Multiplying the inner loop count by 10 does not multiply the time by 10, because the amount of wasted memory grows with every concatenation. Each concatenation basically allocates a new char[a.length + "allocation driven".length] and throws the old array away. So iteration i wastes about 17*i chars, which sums to at least 17/2*(n^2 - n) wasted chars over n iterations. So increasing the inner loop count (n) 10 times will increase the time about 100 times. And trust me, System.out is not in the game here :).
Superfilin
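The quadratic growth described in the comment above can be checked numerically. This is a rough sketch with a hypothetical class name; like the comment's formula, it ignores the 5 chars of "dummy" and counts only the 17*i chars discarded on iteration i.

```java
public class WastedChars {

    // Sum of 17*i for i = 0..n-1, counted iteration by iteration.
    static long byLoop(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += 17L * i;
        }
        return total;
    }

    // Closed form from the comment: 17/2 * (n^2 - n).
    static long byFormula(int n) {
        return 17L * ((long) n * n - n) / 2;
    }

    public static void main(String[] args) {
        long small = byFormula(1000);
        long large = byFormula(10000);
        System.out.println("n=1000:  " + small + " chars");
        System.out.println("n=10000: " + large + " chars");
        // A 10x larger n gives roughly 100x more discarded chars.
        System.out.println("ratio:   " + ((double) large / small));
    }
}
```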
The test just tries to mimic heavy memory allocation and see how it performs. I just wanted an explanation of why it is so much slower on Sun.
Superfilin
OK, I misread the benchmark (it was late). A better test of my theory would be to add a middle loop so that you do the 1000 concatenations N times inside the instrumentation loop.
Stephen C
This micro-benchmark was created in response to slower performance on the SPARC server compared with developer workstations. So it does show that SPARC is slower in environments with one or only a few threads. See my answer to this question.
Superfilin
In general, micro-benchmarks do not answer all questions; they tend to answer one or two. You just need to interpret the results in the right way.
Superfilin
+2  A: 

The CPU is indeed slower on SPARC (1.2GHz), and according to one of Sun's engineers the T2 is usually 3 times slower than modern Intel processors for a single-threaded application. He also stated, though, that in a multi-threaded environment SPARC should be faster.

I wrote a multi-threaded test using the GroboUtils library and tested both allocations (through concatenations) and simple calculations ( a += j*j ) to exercise the processor. I got the following results:

1 thread : Intel : Calculations test : 43ms
100 threads : Intel : Calculations test : 225ms

1 thread : Intel : Allocations test : 35ms
100 threads : Intel : Allocations test : 1754ms

1 thread : SPARC : Calculations test : 197ms
100 threads : SPARC : Calculations test : 261ms

1 thread : SPARC : Allocations test : 236ms
100 threads : SPARC : Allocations test : 1517ms

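SPARC shows its strength at 100 threads: it outperforms Intel on the allocations test (1517ms vs. 1754ms) and nearly matches it on the calculations test (261ms vs. 225ms), despite being several times slower with 1 thread.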
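To make the scaling easier to compare, the figures above can be turned into slowdown factors going from 1 thread to 100 threads. This is just a small sketch with a hypothetical class name; the numbers are copied from the results listed above.

```java
public class ScalingFactors {

    // How many times longer the measured time is with 100 threads than with 1 thread.
    static double slowdown(long oneThreadMs, long hundredThreadsMs) {
        return (double) hundredThreadsMs / oneThreadMs;
    }

    public static void main(String[] args) {
        System.out.println("Intel calculations: " + slowdown(43, 225));
        System.out.println("SPARC calculations: " + slowdown(197, 261));
        System.out.println("Intel allocations:  " + slowdown(35, 1754));
        System.out.println("SPARC allocations:  " + slowdown(236, 1517));
    }
}
```

A smaller factor means the machine degrades less as threads are added; on both tests the SPARC factor is far smaller than the Intel one.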

Here goes the multi-threaded calculation test:

import java.util.ArrayList;
import java.util.List;

import net.sourceforge.groboutils.junit.v1.MultiThreadedTestRunner;
import net.sourceforge.groboutils.junit.v1.TestRunnable;
import junit.framework.TestCase;

public class TM1_CalculationSpeedTest extends TestCase {

    public void testCalculation() throws Throwable {
        // Run 100 identical workers concurrently, with a 2-minute time limit.
        List threads = new ArrayList();
        for (int i = 0; i < 100; i++) {
            threads.add(new Requester());
        }
        MultiThreadedTestRunner mttr = new MultiThreadedTestRunner(
                (TestRunnable[]) threads.toArray(new TestRunnable[threads.size()]));
        mttr.runTestRunnables(2 * 60 * 1000);
    }

    public class Requester extends TestRunnable {

        public void runTest() throws Exception {
            long startTime = System.currentTimeMillis();
            long a = 0;
            for (int j = 0; j < 10000000; j++) {
                // j * j is int arithmetic and overflows for j > 46340; only the timing matters here.
                a += j * j;
            }
            long endTime = System.currentTimeMillis();
            System.out.println(this + ": " + (endTime - startTime) + "ms " + a);
        }

    }

}

Here goes the multi-threaded allocation test:

import java.util.ArrayList;
import java.util.List;

import junit.framework.TestCase;
import net.sourceforge.groboutils.junit.v1.MultiThreadedTestRunner;
import net.sourceforge.groboutils.junit.v1.TestRunnable;

public class TM2_AllocationSpeedTest extends TestCase {

    public void testAllocation() throws Throwable {
        List threads = new ArrayList();
        for (int i = 0; i < 100; i++) {
            threads.add(new Requester());
        }
        MultiThreadedTestRunner mttr = new MultiThreadedTestRunner(
                (TestRunnable[]) threads.toArray(new TestRunnable[threads.size()]));
        mttr.runTestRunnables(2 * 60 * 1000);
    }

    public class Requester extends TestRunnable {

        public void runTest() throws Exception {
            long startTime = System.currentTimeMillis();
            String a = "dummy";
            for (int j = 0; j < 1000; j++) {
                a += "allocation driven";
            }
            long endTime = System.currentTimeMillis();
            System.out.println(this + ": " + (endTime - startTime) + "ms " + a.length());
        }

    }

}
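For anyone who wants to try this without GroboUtils, roughly the same allocation test can be sketched with plain java.lang.Thread. This is not the code that produced the numbers above: the class and method names are made up, and unlike the tests above it measures the total wall-clock time for all threads rather than the time inside each thread.

```java
public class PlainThreadAllocationTest {

    // Same work as Requester.runTest() above: 1000 string concatenations.
    static int allocate() {
        String a = "dummy";
        for (int j = 0; j < 1000; j++) {
            a += "allocation driven";
        }
        return a.length();
    }

    // Runs allocate() on threadCount threads and returns the total wall-clock time in ms.
    static long runAll(int threadCount) throws InterruptedException {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    allocate();
                }
            });
        }
        long start = System.currentTimeMillis();
        for (int i = 0; i < threadCount; i++) {
            threads[i].start();
        }
        // Wait for every worker to finish before stopping the clock.
        for (int i = 0; i < threadCount; i++) {
            threads[i].join();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("1 thread:    " + runAll(1) + "ms");
        System.out.println("100 threads: " + runAll(100) + "ms");
    }
}
```

Dividing the 100-thread total by 100 would give a rough per-task cost under load, which is arguably a fairer figure for comparing a many-threaded chip like the T2 against a desktop CPU.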
Superfilin
Is the last output a typo? Should that be SPARC again?
David Moles
Yes, you are right :). Fixed that.
Superfilin