It looks like Clojure will have a fork-join implementation that is a functional wrapper over Java's fork-join framework.
I am wondering what the difference between these and pmap/preduce could be?
From looking at that code, their functionality will be mostly the same. The only difference is that pmap uses Futures running on the Agent thread pool as its underlying primitive, while pvmap uses fork-join.
I'm not in a position to say for sure, but I'd expect that whichever one performs better in the general case would become the standard implementation for pmap, unless there are significant enough tradeoffs to make having both worthwhile.
It also looks like (for now at least) the fork-join framework only supports vectors, so it's not semi-lazy like pmap.
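As a rough illustration of the Futures-based approach described above, the sketch below wraps each call in a future. The name naive-pmap is hypothetical, and this is not the actual source of pmap: it starts every future eagerly, whereas the real pmap is semi-lazy and only stays somewhat ahead of the consumer.

```clojure
;; Hypothetical helper for illustration only; not part of clojure.core.
(defn naive-pmap
  "Applies f to each element of coll in its own future and returns
  the results in order. Eager: every future is started up front."
  [f coll]
  (let [futures (doall (map (fn [x] (future (f x))) coll))] ; start all tasks
    (map deref futures)))                                    ; block for each result in order

(naive-pmap inc [1 2 3])
```

Note that one future is created per element, which is why very small per-element work tends to be dominated by scheduling overhead.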
Fork-join is more general than the sequence-based pmap/preduce, and should allow for more fine-grained control over parallelism. The exact APIs for doing this are still up in the air.
One difference, as far as I understand it, is that pmap will run only at whatever degree of "chunkiness" it is given. The function is mapped over each member of the sequence given to pmap. If the granularity is too small, the potential benefits of parallelism get swallowed by the overhead of creating and managing too many Futures.
Fork-join enables work stealing, so the amount of work that runs on each thread can adapt at run time.
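To make the divide-and-conquer idea concrete, here is a minimal sketch of a fork-join map over a vector. It is not how pvmap is implemented; fj-vmap and its threshold parameter are hypothetical, and it assumes a JDK that ships java.util.concurrent.ForkJoinPool. The range is split in half recursively, one half is forked so an idle worker can steal it, and the halves are joined in order.

```clojure
(import '(java.util.concurrent ForkJoinPool ForkJoinTask Callable))

;; Hypothetical helper for illustration only.
(defn fj-vmap
  "Maps f over vector v by recursively splitting the index range until a
  piece holds at most threshold elements. Returns a vector in order."
  [^ForkJoinPool pool f v threshold]
  (let [threshold (max 1 threshold)]               ; guard against non-terminating splits
    (letfn [(task [lo hi]
              (let [^Callable c (fn [] (run lo hi))]
                (ForkJoinTask/adapt c)))           ; wrap the work as a ForkJoinTask
            (run [lo hi]
              (if (<= (- hi lo) threshold)
                (mapv f (subvec v lo hi))          ; small enough: compute sequentially
                (let [mid (quot (+ lo hi) 2)
                      ^ForkJoinTask left (task lo mid)]
                  (.fork left)                     ; left half may be stolen by another worker
                  (let [right (run mid hi)]        ; right half runs on the current thread
                    (into (.join left) right)))))]
      (.invoke pool ^ForkJoinTask (task 0 (count v))))))

(fj-vmap (ForkJoinPool.) inc (vec (range 10)) 2)
```

The threshold still sets the smallest unit of work, but which thread ends up running each piece is decided at run time by the pool.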
Neither pmap nor pvmap will save us from having to choose the correct chunk size. For my projects that usually means breaking the data into chunks, using map on each chunk, using pmap to process the chunks in parallel, and then reducing or flattening the results.
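The chunking pattern described above might look roughly like this. The name chunked-pmap and the chunk-size argument are hypothetical; the right chunk size depends on the workload and has to be found by measurement.

```clojure
;; Hypothetical helper for illustration only; not part of clojure.core.
(defn chunked-pmap
  "Applies f to every element of coll, handling chunk-size elements per
  parallel task. Returns a lazy seq of results in the original order."
  [f chunk-size coll]
  (->> coll
       (partition-all chunk-size)                  ; break the data into chunks
       (pmap (fn [chunk] (doall (map f chunk))))   ; plain map inside each parallel task
       (apply concat)))                            ; flatten the per-chunk results

(chunked-pmap inc 100 (range 1000))
```

The doall matters here: without it each task would only return an unrealized lazy seq, and the real work would run later on the consuming thread.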
These slides contain some charts showing comparisons between the two approaches: http://data-sorcery.org/2010/10/23/clojureconj/