Just stumbled into this posting and thought I would throw my two cents in. I am author of EJML and I am also working on a performance and stability benchmark for java libraries. While several issues go into determining how fast an algorithm is, Mikhail is correct that caching is a very important issue in performance of large matrices. For smaller matrices the libraries overhead becomes more important.
Due to overhead in array access, pure Java libraries are slower than highly optimized c libraries, even if the algorithms are exactly the same. Some libraries get around this issue by making calls to native code. You might want to check out
http://code.google.com/p/matrix-toolkits-java/
which does exactly that. There will be some overhead in copying memory from java to the native library, but for large matrices this is insignificant.
For a benchmark on pure java performance (the one that I'm working on) check out:
http://code.google.com/p/java-matrix-benchmark/
Another benchmark is here:
http://www.ujmp.org/java-matrix/benchmark/
Either of these benchmarks should give you a good idea of performance for large matrices.