views:

545

answers:

2

I'm not sure whether this belongs on StackOverflow or in the Clojure Google group. But the group seems to be busy discussing numeric improvements for Clojure 1.2, so I'll try here:

http://shootout.alioth.debian.org/ has a number of performance benchmarks for various languages.

I noticed that Clojure was missing, so I made a Clojure version of the n-body problem.

The fastest code I was able to produce can be found here, and benchmarking it seems to be saying that for number crunching Clojure is

  • factor ~10 quicker than Python/Ruby/Perl
  • factor ~4 slower than C/Java/Scala/Ada
  • approximately on par with OCaml, Erlang and Go

I'm quite happy with that level of performance.

My question to the Clojure gurus is

  • Are there obvious improvements I have missed, either in terms of speed or in terms of code brevity or readability (without sacrificing speed)?
  • Do you consider this to be representative of Clojure performance vs Python/Ruby/Perl on one hand and Java/C on the other?

Update

More Clojure 1.1 benchmark programs for the shootout here, including the n-body problem.

+3  A: 

I wonder if Cantor might be of use to you -- it's a high performance math library for Clojure. Also see this thread on the Google group, which is about a similar project in the context of the new primitive arithmetic stuff.

Michał Marczyk
Cantor looks useful, I'll have a look. Thanks!
j-g-faustus
+4  A: 

Not a flood of responses here :) but apparently some interest, so I'll try to answer my own question with what I've learned over the past few days:

  • With the 1.1 optimization approach (Java primitives and mutable arrays) ~4x slower than optimized Java is about as fast as it goes.
  • The 1.2 constructs definterface and deftype are more than twice as fast, coming within ~1.7x (+70%) of Java with shorter, simpler and cleaner code than for 1.1.

Here are the implementations:

More details including "lessons learned", JVM version and profiling screenshots.

Subjectively speaking, optimizing the 1.2 code was a breeze compared to optimizing 1.1, so this is very good news for Clojure number crunching. (Actually close to amazing :)

The 1.2 testing used the current master branch, I did not try any of the new numeric branches. From what I can gather the new ideas currently being discussed

  • may make non-optimized numerics faster
  • may speed up the 1.1 version of this benchmark
  • will probably not speed up the 1.2 version, it is already as "close to the metal" as it is likely to get.

Disclaimers:

  • Clojure 1.2 is not released yet, so 1.2 benchmark results are preliminary.
  • This is one particular benchmark on physics calculations. It is relevant to floating point number crunching, but irrelevant to performance in areas like string parsing, concurrency or web request handling.
j-g-faustus
You should try the numeric branches, looking over your code I see places where you will get unnecessarily get boxing under the 1.2 master branch.
dnolen
Thanks for the tip. The equiv branch had some impact on the 1.1 version, no noticable difference for the 1.2 version. There may be alternative implementations where the difference is more significant. The equiv branch still looks good, especially primitive type hints and :static.
j-g-faustus
Thank your very much for your highly interesting contribution. I especially enjoyed reading the detailed analysis. I compiled the 1.2 version and compared it with the Java example in several runs and it was on average only 1.5x slower.Two questions have been evoked on my part. As you stated in your analysis the code is not idiomatic because it makes use of mutable variables. How much slower would the code perform if immutable variables were used? How much effort would be involved in parallelizing the mutable or the immutable version?
Stefan Schmidt
Thanks, glad you liked it. 1.5x: Sounds good, we're in the ballpark of each other. Mutable vs. immutable: I haven't tried immutable, feel free to give it a go. I would expect somewhere between "mutable 1.2" and "mutable 1.1", but it's hard to guess. Parallelizing: This particular problem is hard to parallelize in any case. In general, multithreading with mutable values is a lot harder than with immutable values, which is part of the reason Clojure defaults to immutable.
j-g-faustus
BTW: Here is a Clojure implementation of a much larger benchmark involving concurrency and log file parsing: http://meshy.org/2009/12/13/widefinder-2-with-clojure.html Well worth a read if you are into large-scale performance.
j-g-faustus
Thank you for the notice, I have that on my reading list :)
Stefan Schmidt
Do you know if there is any special reason why the Clojure result is not listed in the Widefinder 2 benchmark table?http://wikis.sun.com/display/WideFinder/Results
Stefan Schmidt
I was wondering the same. Tim Brady's comments seem to suggest that he would like it refactored to be more general http://www.tbray.org/ongoing/When/200x/2009/12/15/Osborne-WF2-Clojure I can only guess that the author felt he had spent enough time on it and didn't bother with the refactoring. But no, I don't know the reason.
j-g-faustus