views: 541
answers: 8

hello again

I am curious about the performance of Java numerical algorithms, for example double-precision matrix-matrix multiplication, using the latest JIT compilers, as compared to hand-tuned SSE C++/assembly or Fortran counterparts.

I have looked on the web, but most of the results are from almost 10 years ago, and I understand Java has progressed quite a lot since then.

If you have experience using Java for numerically intensive applications, can you share it? Also, how well does Java perform in kernels where the loops are relatively short and the memory access is not very uniform, but still within the limits of the L1 cache? If such a kernel is executed multiple times in succession, can the JVM optimize it at runtime?
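For concreteness, the kind of kernel I have in mind is a plain triple-loop multiply like the sketch below (the class name, loop order, and sizes are arbitrary):

```java
// Minimal sketch of the kind of kernel in question: a naive
// double-precision matrix-matrix multiply, C += A * B, with matrices
// small enough to stay in cache. Names and sizes are illustrative only.
public final class MatMul {
    static void multiply(double[][] a, double[][] b, double[][] c, int n) {
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];           // hoist the A element
                for (int j = 0; j < n; j++) {
                    c[i][j] += aik * b[k][j];   // row-major friendly access
                }
            }
        }
    }
}
```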

Thanks

A: 

Java uses a just-in-time (JIT) compiler to convert the bytecode to native machine code, so the first time it runs through a code block it will be slower, but once the segment is 'warmed up' the performance will be equivalent. In short, the numerical performance is pretty good.
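A rough way to see the warm-up effect for yourself is a sketch like the one below (the kernel, array sizes, and iteration counts are arbitrary; a serious measurement would use a proper benchmark harness):

```java
// Minimal sketch of observing JIT warm-up: time the same kernel
// repeatedly and watch later repetitions get faster once the method
// has been compiled to native code. All counts are arbitrary.
public final class WarmupDemo {
    static double kernel(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1 << 16;
        double[] x = new double[n];
        double[] y = new double[n];
        java.util.Arrays.fill(x, 1.5);
        java.util.Arrays.fill(y, 2.5);

        double sink = 0.0;                      // keep results live
        for (int rep = 0; rep < 20; rep++) {
            long t0 = System.nanoTime();
            for (int iter = 0; iter < 1000; iter++) {
                sink += kernel(x, y);
            }
            long t1 = System.nanoTime();
            System.out.printf("rep %2d: %.2f ms%n", rep, (t1 - t0) / 1e6);
        }
        System.out.println(sink);               // prevent dead-code elimination
    }
}
```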

Shane C. Mason
JITs are nice, but they are not enough to guarantee good numerical performance.
Thorbjørn Ravn Andersen
A: 

Seconding that your best bet is to test it for yourself, as performance will vary somewhat depending on what you're doing exactly. I find it difficult to believe Shane C. Mason's answer that Java performance will be the same as C++ or Fortran performance, as even C++ and Fortran are not really comparable for some scientific computing algorithms.

I have a computational fluid dynamics code that I wrote using C++ and the same code essentially translated into Fortran. I'm not really sure why yet, but the Fortran version is about twice as fast as the C++ version. I would guess that with features like bounds-checking and garbage collection, Java would be slower than both, but I would not know until I tested.

notJim
Have you used the restrict keyword in your C++ code? Fortran compilers are allowed to assume that memory pointers are not aliased, while C++ compilers have to assume that memory is aliased unless told otherwise. Also, which compilers did you use? I wrote my program in C++ with intrinsics, and the Intel compiler is significantly faster than GCC; I guess Intel C++ schedules instructions better, since the generated assembly was otherwise very similar except for ordering.
aaa
I'm vaguely aware of aliasing issues, but I don't understand them well enough yet. I have not tried restrict; I haven't had the time to spend on this, unfortunately. I was using icpc and ifort (both Intel compilers) on Linux with -O3. Note that my point isn't that C++ performance can't match Fortran, but rather that you need to compare implementations in addition to languages.
notJim
Fortran also has a much more relaxed numerics model than C++ -- it is allowed to do a lot of skanky math optimizations by default that you only get in C/C++ with -ffast-math and similar. Sometimes this doesn't matter, and sometimes it'll make your results less accurate.
Stephen Canon
+1  A: 

See the programming language shootout page for Java vs C++, which will give you a comparison of Java's speed on several compute-intensive algorithms. It will also show you what the highest-performance Java code looks like. For the most part, on these few specific benchmarks, Java took more time to run (but not more than 2 or 3 times as long).

Peter Recore
I could not immediately tell: does this comparison disregard warm-up times? Java still needs a _lot_ of initial work before reaching cruise speed.
Thorbjørn Ravn Andersen
True. If you want to write a program that starts up, does a few calculations and then shuts down, you probably don't want Java. But if your program will be running for a few minutes, then the startup time is just noise. One alternative, of course, is to start up a Java process and have it act as a calculation server: every time you need a calculation, you just call into an already-running instance.
Peter Recore
igouy
@igouy if you compare the steady state to the java -server results, they aren't that much different.
Peter Recore
@Peter - I know, tell Thorbjørn :-) [But also check the measurements shown in the FAQ]
igouy
+1  A: 

This is coming from a .NET side of things, but I'm 90% sure that it's the case for Java too. While the JIT will make some use of SSE instructions where it can, it currently does not auto-vectorize your code when dealing with, for example, matrix multiplications. Hand vectorized C++ using compiler intrinsics/inline assembly will definitely be faster here.

JulianR
A: 

This is very dependent on what you are doing in the C++ code.

For example, are you using the GPU? Edit: I forgot about JOGL, so Java can compete here.

Are you parallelizing using STM or shared memory? Then Java can't compete. For an analysis of parallel matrix multiplication, see: http://www.cs.utexas.edu/users/plapack/papers/ipps98/ipps98.html

Do you have enough memory to do the calculations in memory, so the garbage collector won't be needed, and have you fine-tuned the garbage collector for optimal performance? Then Java can perhaps be competitive.

Are you using multiple cores, and is the C++ optimized to take advantage of that architecture? Then Java won't be able to compete.

Are you using several computers tied together? Then Java won't be able to compete.

Are you using any combination of these? Then it will depend on the particular implementation.

Java is not designed to compete with a hand-tuned C++ program, but given the time it takes to do that tuning, are you doing enough calculations for it to matter? Java can give reasonable speed with less work than hand-tuning, but it is not much of an improvement over plain C++ code.

You may also want to see whether languages such as Haskell or Erlang, which are better designed for this type of work, give an improvement over your C++.

James Black
Using the GPU? As in, using OpenGL? If using JOGL, Java can compete nicely.
Thorbjørn Ravn Andersen
You are correct; I have corrected my answer. I forgot that you can use JOGL for the GPU work.
James Black
+1  A: 

One of the weakest points of Java is (native) matrix operations. This is due to the nature of Java matrices:

  • You cannot declare a matrix to be rectangular, i.e. each row can have a different number of columns.

  • A matrix is technically not a "matrix of doubles (or ints, ...)" but an array of arrays of ... . The big difference is that, since arrays are Java objects, you can assign the same array object to more than one row.

These two properties make a lot of standard matrix optimizations impossible for the compiler.

You might get better performance by using a Java library that emulates matrices on top of a single one-dimensional array, as sketched below. However, you then have the overhead of a method call for every access.
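A minimal sketch of what such a flat, row-major layout looks like (the class and method names are illustrative, not taken from any particular library):

```java
// Minimal sketch of a matrix emulated on a single one-dimensional array
// in row-major order. Every element access goes through a method call
// and explicit index arithmetic instead of a[i][j].
public final class FlatMatrix {
    private final int rows, cols;
    private final double[] data;        // length rows * cols

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    double get(int r, int c) { return data[r * cols + c]; }

    void set(int r, int c, double v) { data[r * cols + c] = v; }

    // C = this * b, returned as a new matrix.
    FlatMatrix multiply(FlatMatrix b) {
        FlatMatrix c = new FlatMatrix(this.rows, b.cols);
        for (int i = 0; i < this.rows; i++) {
            for (int k = 0; k < this.cols; k++) {
                double aik = this.get(i, k);
                for (int j = 0; j < b.cols; j++) {
                    c.set(i, j, c.get(i, j) + aik * b.get(k, j));
                }
            }
        }
        return c;
    }
}
```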

Carsten
I think you mean you can't declare a 2D array to be rectangular. But you seem to be arguing that the most literal and straightforward implementation of a matrix in Java has some issues. Why would this be the only possible implementation? If it isn't then there's not much of a statement possible here about the 'nature of Java matrices'. What about Java matrix libraries like Colt?
Sean Owen
Matrices are not always represented that way. See `java.awt.image.Kernel` for an example of a matrix represented by a 1D array
finnw
The problem with such libraries is that all matrix access is done through methods. Method calls are slower than array access in general, and they prevent certain compiler optimizations. E.g., in `for(int i=0; i<m; i++) { x = a[i]; ...}` a clever compiler can add an if statement at the beginning to check whether m is smaller than or equal to the length of a[] and, if that's true, completely eliminate all bounds checks in the for loop (provided it can also ascertain that m does not change). This also works for nested for loops, which are very common in matrix operations, and thus can save a lot of checks.
Carsten
Another problem with using one long array instead of a multidimensional one concerns parallelism: with a multidimensional array it is easier to run code in parallel, because if I know that two parts of the execution access different rows or columns of the array (again, e.g., in nested loops), I know they can't get in each other's way. It is much harder for a compiler to know this about operations on one long array.
Carsten
A: 

C++ will definitely be faster. You can even use hand-optimized libraries for your purpose that contain assembly code for each of the major CPUs out there. You can't get better than that.

Afterwards, you can use JNI to call it from Java, if needed.

Java is not meant for high-performance arithmetic calculations like this. If you depend on them, I'd recommend picking a proper low-level language to implement that part. Alternatively, you can write the performance-critical part in a low-level language and connect it to a Java front end using JNI or some other IPC mechanism.
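A minimal sketch of what the Java side of such a JNI binding might look like (the library name and method signature here are made up for illustration; the native C/C++/Fortran side has to be written and compiled separately):

```java
// Minimal sketch of the Java side of a JNI binding to a native
// matrix-multiply routine. "nativeblas" and the multiply() signature
// are hypothetical.
public final class NativeMatMul {
    static {
        // Loads libnativeblas.so (Linux) / nativeblas.dll (Windows).
        System.loadLibrary("nativeblas");
    }

    // C = A * B, with all matrices stored as flat row-major arrays:
    // A is m x k, B is k x n, C is m x n.
    static native void multiply(double[] a, double[] b, double[] c,
                                int m, int n, int k);
}
```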

Agoston Horvath
A: 

Are these the kinds of computations you're interested in: Fast Fourier Transform, Jacobi Successive Over-Relaxation, Monte Carlo integration, sparse matrix multiplication, dense LU matrix factorisation?

They make up the SciMark 2.0 composite benchmark, which you can launch as an applet on your machine.

There are also ANSI C versions of the programs, and an Intel document (pdf) on optimizing and recompiling SciMark for C++.


Similarly you could use The Java Grande Forum Benchmark Suite and the comparison C programs.

igouy