Hello! What are the analogues of QtConcurrent for Scala (or Java)? That is, a simplified implementation of MapReduce: a parallel map and foldl. Thank you.

+4  A: 

See the Scala Parallel Collections video and the paper "A Generic Parallel Collection Framework".

This states: Parallel collections are in the current development builds and will be released as part of Scala 2.9. See the release plan here.

oluies
+1  A: 

You can go a long way just using scala.actors.Futures and normal map/flatMap over collections. No easily parallelizable fold, however.
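
As a rough sketch of the "futures plus normal map" idea: scala.actors has since been deprecated, so the example below swaps in scala.concurrent.Future (available from Scala 2.10) rather than scala.actors.Futures. The names FutureMapSketch, slowSquare and parallelMap are made up for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureMapSketch {
  // Hypothetical stand-in for an expensive per-element computation.
  def slowSquare(x: Int): Int = x * x

  // Start one Future per element with an ordinary map,
  // then gather the results back in input order.
  def parallelMap(xs: List[Int]): List[Int] = {
    val futures: List[Future[Int]] = xs.map(x => Future(slowSquare(x)))
    Await.result(Future.sequence(futures), 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    println(parallelMap(List(1, 2, 3, 4))) // List(1, 4, 9, 16)
  }
}
```

One future per element is only sensible when each element is expensive; for cheap work you would probably batch elements into chunks first.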

If you go multi-host, I'd use Akka's send-and-receive-future.

Daniel
+3  A: 

You can use Scala Parallel Collections. They are currently a part of Scala nightly releases, and will be released in Scala 2.9. The idea is that most operations available in regular collections are parallelized, so that parallel collections can be used in the same way.

Currently, there are a few collection types available: parallel ranges, parallel arrays and parallel hash tries. For instance, you can invoke parallel map and fold operations on a parallel array like this:

scala> val pa = (0 until 10000).toArray.par
pa: scala.collection.parallel.mutable.ParArray[Int] = ParArray(0, 1, 2, 3, 4, 5, 6,...

scala> pa.map(_ + 1)
res0: scala.collection.parallel.mutable.ParArray[Int] = ParArray(1, 2, 3, 4, 5, 6, 7,...

scala> pa map { v => if (v % 2 == 0) v else -v }
res1: scala.collection.parallel.mutable.ParArray[Int] = ParArray(0, -1, 2, -3, 4, -5,...

scala> pa.fold(0) { _ + _ }
res2: Int = 49995000

There are other parallel collection operations available as well. Note that fold must take an associative operator, and the zero element passed to it should be neutral for that operator. In the example above, addition is associative ((A + B) + C == A + (B + C)), i.e. the additions can be grouped over subsequences of the numbers in any way and you will always obtain the same sum, and 0 is neutral for addition. reduce has a similar contract.
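
To see why associativity matters, here is a small sequential simulation of a fold that works chunk by chunk and then combines the partial results. It is not how the parallel collections are actually implemented; chunkedFold and ChunkedFoldSketch are hypothetical names, and the code only mimics the shape of the computation using ordinary collections.

```scala
object ChunkedFoldSketch {
  // Fold each chunk separately, then combine the partial results.
  // This mimics the shape of a parallel fold, but runs sequentially.
  def chunkedFold(xs: Seq[Int], chunkSize: Int, zero: Int)(op: (Int, Int) => Int): Int =
    xs.grouped(chunkSize).map(_.fold(zero)(op)).fold(zero)(op)

  def main(args: Array[String]): Unit = {
    val xs = 0 until 10000

    // Addition is associative and 0 is neutral, so chunking does not matter.
    println(chunkedFold(xs, 100, 0)(_ + _)) // 49995000
    println(chunkedFold(xs, 333, 0)(_ + _)) // 49995000

    // Subtraction is not associative: the chunked result differs
    // from the plain left-to-right fold.
    println((1 to 4).fold(0)(_ - _))           // -10
    println(chunkedFold(1 to 4, 2, 0)(_ - _))  // 10
  }
}
```

With a real parallel collection the chunk boundaries are chosen by the library, so a non-associative operator gives results you cannot rely on.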

One other thing to be aware of is that the closures passed to parallel collections may be invoked concurrently. If they have side-effects, such as modifying a local variable in the environment, these accesses have to be synchronized. For instance, without synchronization you could do something like this:

scala> var a = 0
a: Int = 0

scala> pa foreach { a += _ }

scala> a
res1: Int = 49995000

scala> a = 0
a: Int = 0

scala> pa foreach { a += _ }

scala> a
res7: Int = 49990086

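and get different results from one run to the next, because foreach invokes { a += _ } in parallel and a += _ is a non-atomic read-modify-write, so concurrent updates can be lost. In the example above, access to a should be synchronized or protected with a lock, or a should be made atomic.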
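
A minimal sketch of the atomic option, using java.util.concurrent.atomic.AtomicInteger. To keep it runnable with only the standard library on any recent Scala version, the concurrent updates come from scala.concurrent.Future tasks over chunks instead of a parallel collection; AtomicSumSketch and atomicSum are hypothetical names.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AtomicSumSketch {
  def atomicSum(xs: Seq[Int]): Int = {
    val total = new AtomicInteger(0)
    // One task per chunk; tasks may run at the same time on different threads.
    val tasks = xs.grouped(1000).toList.map { chunk =>
      // addAndGet is an atomic read-modify-write, so no update is lost.
      Future { chunk.foreach(v => total.addAndGet(v)) }
    }
    // Wait for every task to finish before reading the total.
    Await.result(Future.sequence(tasks), 10.seconds)
    total.get
  }

  def main(args: Array[String]): Unit = {
    println(atomicSum(0 until 10000)) // 49995000
  }
}
```

With a parallel collection the same idea would be to call total.addAndGet(v) inside the closure passed to foreach, in place of a += _.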

But the idea is to use built-in combinators to accomplish a task and lean towards functional programming, avoiding local side-effects as in the example above.

You might want to read a little bit more about their internal mechanisms in the links provided in the other answer.

axel22