ansaurus

Question

Java memory allocation performance (SunOS vs Windows)

Answer 1

A:

The Sun hardware is slower, and the VM may be somewhat slower as well.

jsight 2009-08-28 12:04:24

Is there any way to make it faster?

Superfilin 2009-08-28 12:09:24

Sun says: "The Sun SPARC Enterprise T5120 server with Chip Multithreading (CMT) technology delivers breakthrough levels of performance with dramatic power and space savings, as demonstrated by a range of World Record benchmark results." It's not that I really believe that marketing bullshit, but why is it slower, then?

Superfilin 2009-08-28 12:17:17

@Superfilin: pedal harder? :-)

Stephen C 2009-08-28 13:54:03

That's probably what I need :)

Superfilin 2009-08-28 14:37:35

This chip has at least 4 cores, each capable of 8 threads. Perhaps adding concurrency is the key to greater perf here? I don't think the single threaded performance is going to impress.

jsight 2009-09-01 14:14:42

Answer 2

+1 A:

It's my understanding that UltraSPARC T2-based machines are aimed at performance-per-watt rather than raw performance. You might try dividing the allocation time by the power consumption and see what kind of numbers you get. :)

Is there a reason you're running 1.4.2 instead of 1.6?

David Moles 2009-08-28 12:18:39

Customer requirement :)

Superfilin 2009-08-28 12:20:26

Do you have any official links that may prove this ;)?

Superfilin 2009-08-28 12:23:29

...aimed at "dollar-per-PowerPoint presentation". They consume nearly as much power as Intel processors. Maybe 10-20 watts less, but not nearly enough to compensate for the performance loss.

ima 2009-08-28 14:18:05

"Confirm" I don't know about; most of what I've seen is marketingware. The T2 is an eight-core chip, isn't it? You might try seeing what happens with multiple threads. I just did a quick test on my desktop (Core 2 Quad, 2.4GHz), with just 250 strings, and it looks like two threads can each allocate 250 in about 1.8x the time one thread can allocate 250, but by 4 threads it's taking more than 4x the time. I'd be curious how it scales on the T2.

David Moles 2009-08-28 14:58:03
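
A rough sketch of the kind of thread-scaling check described in the comment above. The comment does not show its code, so everything here is an assumption: the class and method names are made up, plain java.lang.Thread is used, and "250 strings" is read as 250 concatenations per thread. System.nanoTime() needs Java 5 or later, and the measured time includes thread start-up, so treat the numbers as indicative only.

```java
public class ThreadScalingSketch {

    // One unit of work: build a string with `count` concatenations.
    static int allocate(int count) {
        String a = "dummy";
        for (int j = 0; j < count; j++) {
            a += "allocation driven";
        }
        return a.length();
    }

    // Starts `threadCount` threads that each run allocate(count) and returns
    // the wall-clock time in nanoseconds until all of them have finished.
    static long timeWithThreads(int threadCount, final int count) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    allocate(count);
                }
            });
        }
        long start = System.nanoTime();
        for (int i = 0; i < threadCount; i++) {
            threads[i].start();
        }
        try {
            for (int i = 0; i < threadCount; i++) {
                threads[i].join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers");
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Thread counts are arbitrary; extend the list to match the core count.
        int[] threadCounts = {1, 2, 4, 8};
        for (int i = 0; i < threadCounts.length; i++) {
            long nanos = timeWithThreads(threadCounts[i], 250);
            System.out.println(threadCounts[i] + " thread(s): " + (nanos / 1000) + "us");
        }
    }
}
```

If the time for k threads stays close to the time for one thread, the machine is scaling well; if it grows by a factor of k or more, the threads are effectively being serialized.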

I have submitted a separate answer describing a multi-threaded test on SPARC.

Superfilin 2009-09-03 07:49:18

The T2 we test on is a 4-core chip, but in production it will be an 8-core one.

Superfilin 2009-09-03 07:50:02

Answer 3

A:

I don't think that this is measuring memory allocation. For a start, there is an awful lot of character copying going on in a += "allocation driven";. But I suspect that the real bottleneck is in getting the output from System.out.println(...) through the network layers from the app on the Sun server to your remote workstation.

As an experiment, try multiplying the inner loop count by 10 and 100, and see if that "speeds up" the Sun server relative to your workstation.

Another thing you could try is to move the inner loop into a separate procedure. It is possible that since you are doing all the work in one invocation of main, the JIT compiler never gets a chance to compile it.

(Artificial "micro-benchmarks" like this are always susceptible to effects like these. I tend to distrust them.)

Stephen C 2009-08-28 13:51:47
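
The suggestion in this answer to move the inner loop into a separate method could look roughly like the sketch below. The original benchmark is not reproduced in this thread, so the class name, the method name and the warm-up count of 20 are assumptions made for illustration only.

```java
public class ConcatBenchmarkSketch {

    // The work under test lives in its own method, so the JIT compiler can
    // compile it after a few calls instead of it all running once inside main.
    static int concatenate(int iterations) {
        String a = "dummy";
        for (int j = 0; j < iterations; j++) {
            a += "allocation driven";
        }
        return a.length();
    }

    public static void main(String[] args) {
        // Warm-up calls; the count is an arbitrary assumption.
        for (int i = 0; i < 20; i++) {
            concatenate(1000);
        }

        // Timed call. The elapsed time is computed before anything is printed.
        long start = System.currentTimeMillis();
        int length = concatenate(1000);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed + "ms " + length);
    }
}
```

Comparing the timed call with and without the warm-up loop gives a hint of how much of the measured time is interpreter overhead rather than allocation.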

The expression that prints the time is calculated before System.out is called. So, it's not a bottleneck.

Superfilin 2009-08-28 13:54:24

Ummm ... that doesn't necessarily follow. Anyway, try the experiment. If you are right, multiplying the inner loop count by N will multiply the times by N.

Stephen C 2009-08-28 14:04:17

Multiplying the inner loop count by 10 does not increase the time by a factor of 10, because the amount of wasted memory grows with every concatenation. Each concatenation basically allocates a new char[a.length + "allocation driven".length] and throws the old array away. So, iteration i wastes about 17*i chars. That sums to at least 17/2*(n^2 - n) chars of wasted memory. So, increasing the inner loop count (n) 10 times will increase the time roughly 100 times. And trust me, System.out is not in the game here :).

Superfilin 2009-08-28 14:31:02
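
The arithmetic in the comment above can be checked with a small sketch. It only counts chars according to the simplified model in that comment (17*i chars discarded on iteration i, ignoring the initial "dummy" and any intermediate buffers); the class and method names are made up for illustration.

```java
public class WastedChars {

    static final int APPEND_LENGTH = "allocation driven".length(); // 17 chars

    // Adds up the discarded chars iteration by iteration: 17 * i for i = 0..n-1.
    static long wastedByLoop(int n) {
        long wasted = 0;
        for (int i = 0; i < n; i++) {
            wasted += (long) APPEND_LENGTH * i;
        }
        return wasted;
    }

    // Closed form from the comment: 17/2 * (n^2 - n).
    static long wastedByFormula(int n) {
        return APPEND_LENGTH * ((long) n * n - n) / 2;
    }

    public static void main(String[] args) {
        int[] sizes = {1000, 10000};
        for (int i = 0; i < sizes.length; i++) {
            int n = sizes[i];
            System.out.println("n=" + n
                    + " loop=" + wastedByLoop(n)
                    + " formula=" + wastedByFormula(n));
        }
    }
}
```

Under this model, going from n = 1000 to n = 10000 multiplies the discarded chars by about 100, which is the quadratic growth the comment describes.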

The test just tries to mimic heavy memory allocation and see how it performs. I just wanted an explanation of why it is so much slower on Sun.

Superfilin 2009-08-28 14:32:11

OK, I misread the benchmark (it was late). A better test of my theory would be to add a middle loop so that you do the 1000 concatenations N times in the instrumentation loop.

Stephen C 2009-08-29 00:27:55
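
One possible reading of the middle-loop idea from the comment above, as a sketch. The original benchmark is not shown in this thread, so the names and the values of N (1, 10, 100) are assumptions. Because every unit starts from a fresh string, the work per unit stays constant, so the time should grow roughly linearly with N if allocation is what is being measured.

```java
public class MiddleLoopSketch {

    // One unit of work: the 1000 concatenations from the benchmark.
    static int concatenate(int iterations) {
        String a = "dummy";
        for (int j = 0; j < iterations; j++) {
            a += "allocation driven";
        }
        return a.length();
    }

    // Middle loop: repeat the unit `repeats` times and return the total
    // length, so the result of each unit is actually used.
    static long runUnits(int repeats) {
        long totalLength = 0;
        for (int r = 0; r < repeats; r++) {
            totalLength += concatenate(1000);
        }
        return totalLength;
    }

    public static void main(String[] args) {
        int[] repeatCounts = {1, 10, 100};
        for (int i = 0; i < repeatCounts.length; i++) {
            long start = System.currentTimeMillis();
            long totalLength = runUnits(repeatCounts[i]);
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("N=" + repeatCounts[i] + ": " + elapsed + "ms " + totalLength);
        }
    }
}
```

If the times do not scale with N, some fixed cost outside the loop (start-up, output, compilation) is dominating the measurement.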

This micro-benchmark was created in response to slower performance observed on the SPARC server compared with developer workstations. So, it does show that SPARC is slower in environments with one or only a few threads. See my answer to this question.

Superfilin 2009-09-03 07:55:46

In general, micro-benchmarks do not answer all questions; they tend to answer one or two. You just need to interpret the results in the right way.

Superfilin 2009-09-03 07:56:44

Answer 4

+2 A:

The CPU is indeed slower on SPARC (1.2 GHz), and according to one of Sun's engineers, the T2 is usually 3 times slower than modern Intel processors for single-threaded applications. He also stated, though, that in a multi-threaded environment SPARC should be faster.

I have made a multi-threaded test using the GroboUtils library and tested both allocations (through concatenations) and simple calculations (a += j*j) to test the processor. I got the following results:

1 thread : Intel : Calculations test : 43ms
100 threads : Intel : Calculations test : 225ms

1 thread : Intel : Allocations test : 35ms
100 threads : Intel : Allocations test : 1754ms

1 thread : SPARC : Calculations test : 197ms
100 threads : SPARC : Calculations test : 261ms

1 thread : SPARC : Allocations test : 236ms
100 threads : SPARC : Allocations test : 1517ms

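SPARC shows its power here: with 100 threads it outperforms Intel on the allocations test (1517ms vs 1754ms) and nearly catches up on the calculations test (261ms vs 225ms).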

Here goes the multi-threaded calculation test:

import java.util.ArrayList;
import java.util.List;

import net.sourceforge.groboutils.junit.v1.MultiThreadedTestRunner;
import net.sourceforge.groboutils.junit.v1.TestRunnable;
import junit.framework.TestCase;

public class TM1_CalculationSpeedTest extends TestCase {

    public void testCalculation() throws Throwable {
        // Create 100 runnables; each one performs the same calculation loop.
        List threads = new ArrayList();
        for (int i = 0; i < 100; i++) {
            threads.add(new Requester());
        }
        MultiThreadedTestRunner mttr = new MultiThreadedTestRunner(
                (TestRunnable[]) threads.toArray(new TestRunnable[threads.size()]));
        // Run all runnables concurrently, with a time limit of 2 minutes.
        mttr.runTestRunnables(2 * 60 * 1000);
    }

    public class Requester extends TestRunnable {

        public void runTest() throws Exception {
            long startTime = System.currentTimeMillis();
            long a = 0;
            for (int j = 0; j < 10000000; j++) {
                // j * j is int arithmetic and overflows for j > 46340; that is
                // harmless for timing, but "a" is not the true sum of squares.
                a += j * j;
            }
            long endTime = System.currentTimeMillis();
            // Print "a" as well, so the result of the loop is actually used.
            System.out.println(this + ": " + (endTime - startTime) + "ms " + a);
        }

    }

}

Here goes the multi-threaded allocation test:

import java.util.ArrayList;
import java.util.List;

import junit.framework.TestCase;
import net.sourceforge.groboutils.junit.v1.MultiThreadedTestRunner;
import net.sourceforge.groboutils.junit.v1.TestRunnable;

public class TM2_AllocationSpeedTest extends TestCase {

    public void testAllocation() throws Throwable {
        // Create 100 runnables; each one performs the same concatenation loop.
        List threads = new ArrayList();
        for (int i = 0; i < 100; i++) {
            threads.add(new Requester());
        }
        MultiThreadedTestRunner mttr = new MultiThreadedTestRunner(
                (TestRunnable[]) threads.toArray(new TestRunnable[threads.size()]));
        // Run all runnables concurrently, with a time limit of 2 minutes.
        mttr.runTestRunnables(2 * 60 * 1000);
    }

    public class Requester extends TestRunnable {

        public void runTest() throws Exception {
            long startTime = System.currentTimeMillis();
            String a = "dummy";
            for (int j = 0; j < 1000; j++) {
                // Each concatenation creates a new, longer string and
                // discards the previous one.
                a += "allocation driven";
            }
            long endTime = System.currentTimeMillis();
            // Print the length as well, so the result of the loop is actually used.
            System.out.println(this + ": " + (endTime - startTime) + "ms " + a.length());
        }

    }

}

Superfilin 2009-09-03 07:46:04

Is the last output a typo? Should that be SPARC again?

David Moles 2009-09-03 13:46:38

Yes, you are right :). Fixed that.

Superfilin 2009-09-03 15:54:28
