Clojure number crunching performance

I'm not sure whether this belongs on StackOverflow or in the Clojure Google group. But the group seems to be busy discussing numeric improvements for Clojure 1.2, so I'll try here:

http://shootout.alioth.debian.org/ has a number of performance benchmarks for various languages.

I noticed that Clojure was missing, so I made a Clojure version of the n-body problem.

The fastest code I was able to produce can be found here, and benchmarking it seems to be saying that for number crunching Clojure is

factor ~10 quicker than Python/Ruby/Perl
factor ~4 slower than C/Java/Scala/Ada
approximately on par with OCaml, Erlang and Go

I'm quite happy with that level of performance.

My question to the Clojure gurus is

Are there obvious improvements I have missed, either in terms of speed or in terms of code brevity or readability (without sacrificing speed)?
Do you consider this to be representative of Clojure performance vs Python/Ruby/Perl on one hand and Java/C on the other?

Update

More Clojure 1.1 benchmark programs for the shootout here, including the n-body problem.

Not a flood of responses here :) but apparently some interest, so I'll try to answer my own question with what I've learned over the past few days:

With the 1.1 optimization approach (Java primitives and mutable arrays) ~4x slower than optimized Java is about as fast as it goes.
The 1.2 constructs definterface and deftype are more than twice as fast, coming within ~1.7x (+70%) of Java with shorter, simpler and cleaner code than for 1.1.

Here are the implementations:

More details including "lessons learned", JVM version and profiling screenshots.

Subjectively speaking, optimizing the 1.2 code was a breeze compared to optimizing 1.1, so this is very good news for Clojure number crunching. (Actually close to amazing :)

The 1.2 testing used the current master branch, I did not try any of the new numeric branches. From what I can gather the new ideas currently being discussed

may make non-optimized numerics faster
may speed up the 1.1 version of this benchmark
will probably not speed up the 1.2 version, it is already as "close to the metal" as it is likely to get.

Disclaimers:

Clojure 1.2 is not released yet, so 1.2 benchmark results are preliminary.
This is one particular benchmark on physics calculations. It is relevant to floating point number crunching, but irrelevant to performance in areas like string parsing, concurrency or web request handling.

You should try the numeric branches, looking over your code I see places where you will get unnecessarily get boxing under the 1.2 master branch.

dnolen 2010-06-29 21:33:18

Thanks for the tip. The equiv branch had some impact on the 1.1 version, no noticable difference for the 1.2 version. There may be alternative implementations where the difference is more significant. The equiv branch still looks good, especially primitive type hints and :static.

j-g-faustus 2010-07-02 13:39:24

Thank your very much for your highly interesting contribution. I especially enjoyed reading the detailed analysis. I compiled the 1.2 version and compared it with the Java example in several runs and it was on average only 1.5x slower.Two questions have been evoked on my part. As you stated in your analysis the code is not idiomatic because it makes use of mutable variables. How much slower would the code perform if immutable variables were used? How much effort would be involved in parallelizing the mutable or the immutable version?

Stefan Schmidt 2010-07-05 10:18:11

Thanks, glad you liked it. 1.5x: Sounds good, we're in the ballpark of each other. Mutable vs. immutable: I haven't tried immutable, feel free to give it a go. I would expect somewhere between "mutable 1.2" and "mutable 1.1", but it's hard to guess. Parallelizing: This particular problem is hard to parallelize in any case. In general, multithreading with mutable values is a lot harder than with immutable values, which is part of the reason Clojure defaults to immutable.

j-g-faustus 2010-07-05 11:00:09

BTW: Here is a Clojure implementation of a much larger benchmark involving concurrency and log file parsing: http://meshy.org/2009/12/13/widefinder-2-with-clojure.html Well worth a read if you are into large-scale performance.

j-g-faustus 2010-07-05 17:46:28

Thank you for the notice, I have that on my reading list :)

Stefan Schmidt 2010-07-06 12:22:01

Do you know if there is any special reason why the Clojure result is not listed in the Widefinder 2 benchmark table?http://wikis.sun.com/display/WideFinder/Results

Stefan Schmidt 2010-07-08 17:31:26

I was wondering the same. Tim Brady's comments seem to suggest that he would like it refactored to be more general http://www.tbray.org/ongoing/When/200x/2009/12/15/Osborne-WF2-Clojure I can only guess that the author felt he had spent enough time on it and didn't bother with the refactoring. But no, I don't know the reason.

j-g-faustus 2010-07-08 21:20:18

ansaurus

tags:

views:

answers:

Clojure number crunching performance

related questions