views: 152 · answers: 5

Passing messages around with actors is great, but I would like even simpler code.

Examples (Pseudo-code)

val splicedList: List[List[Int]] = biglist.spliceIntoParts(100)
val sum: Int = ActorPool.numberOfActors(5).getAllResults(splicedList, _.foldLeft(0)(_ + _))

Here `spliceIntoParts` turns one big list into 100 small lists. The `numberOfActors` part creates a pool of 5 actors, each of which receives a new job as soon as it finishes the previous one, and `getAllResults` applies a method to each sub-list, with all the message passing happening in the background. There might also be a `getFirstResult`, which returns the first result computed and stops all the other threads (like cracking a password).
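A minimal sketch of that pool semantics on the plain JVM, using a fixed thread pool of 5 workers (the chunk size and all names here are illustrative only):

import java.util.concurrent.{Callable, Executors}

// 5 worker threads; each one picks up the next pending chunk as soon as it is free.
val pool = Executors.newFixedThreadPool(5)
val chunks = biglist.grouped(100).toList
val partials = chunks.map { chunk =>
  pool.submit(new Callable[Int] { def call() = chunk.sum })
}
val sum = partials.map(_.get).sum // blocks until every partial sum has arrived
pool.shutdown()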

+3  A: 

You can use Scalaz's concurrency features to achieve what you want.

import scalaz._
import Scalaz._
import concurrent.strategy.Executor
import java.util.concurrent.Executors

// Run the parallel operations on a fixed pool of 5 threads.
implicit val s = Executor.strategy[Unit](Executors.newFixedThreadPool(5))

val splicedList = biglist.grouped(100).toList      // split into chunks of 100 elements
val sum = splicedList.parMap(_.sum).map(_.sum).get // sum each chunk in parallel, then sum the partial sums

It would be pretty easy to make this prettier (e.g. write a mapReduce function that does the splitting and folding all in one; a sketch follows below). Also, parMap over a List is unnecessarily strict: you will want to start folding before the whole list of partial results is ready. More like:

val splicedList = biglist.grouped(100).toList
val sum = splicedList.toStream.traverse(n => promise(n.sum)).map(_.sum).get
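And a sketch of that mapReduce convenience (hypothetical, not a Scalaz API; it assumes the implicit strategy above is still in scope and reuses the same parMap):

// Split the input into chunks, map each chunk in parallel, then fold the partial results.
def mapReduce[A, B](xs: List[A], chunkSize: Int)(m: List[A] => B)(r: (B, B) => B): B =
  xs.grouped(chunkSize).toList.parMap(m).get.reduceLeft(r)

// e.g. val sum = mapReduce(biglist, 100)(_.sum)(_ + _)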
Apocalisp
I find Scalaz too poorly documented for production use; I hope it gets better in a few months.
TiansHUo
We're working on the documentation. The code is really straightforward though, so grokking the source is the next best thing to rolling your own.
Apocalisp
+1  A: 

At Scala Days 2010, there was a very interesting talk by Aleksandar Prokopec (who is working on Scala at EPFL) about Parallel Collections. This will probably be in 2.8.1, but you may have to wait a little longer. I'll see if I can get a link to the presentation itself to post here.

The idea is to have a collections framework that parallelizes the processing of the collections by doing exactly what you suggest, but transparently to the user. All you theoretically have to do is change the import from scala.collection to scala.collection.parallel. You obviously still have to check whether what you're doing can actually be parallelized.

MatthieuF
To turn any collection into a parallel version of it, all you need is a `PromiseT[M]` monad transformer, which is isomorphic to `M[Promise[A]]`. An implicit conversion from `PromiseT[M]#Apply[A]` to `M[A]` would make the transformer totally transparent.
Apocalisp
2.8.1? and when will that be released?
TiansHUo
+2  A: 

You can do this with less overhead than creating actors by using futures:

import scala.actors.Futures._

val nums = (1 to 1000).grouped(100).toList                // split the range into chunks of 100
val parts = nums.map(n => future { n.reduceLeft(_ + _) }) // sum each chunk in its own future
val whole = (0 /: parts)(_ + _())                         // fold the partial sums together

You have to handle decomposing the problem, writing the "future" block, and recomposing the results into a final answer, but it does make executing a bunch of small code blocks in parallel easy to do.

(Note that the _() in the fold left is the apply method of the future, which means, "Give me the answer you were computing in parallel!", and blocks until the answer is available.)

A parallel collections library would automatically decompose the problem and recompose the answer for you (as with pmap in Clojure); that's not part of the main API yet.

Rex Kerr
Yeah, I see the `getAllResults`, but how about a `getFirstResultAndThrowAwayEverythingElse` function? Instead of `all`, we would need `any`.
TiansHUo
That's more work right now (and not very efficient in any language, mind you). It would involve replacing `future` with `actor { loop { react { /* case code */ } } } ! message` and then receiving the first reply and ignoring the rest; a rough sketch follows below. Anyway, the point is that you can certainly build it out of actors (even with auto-exit, if you use link), but it is not there out of the box. Improved concurrency is one of the major foci of 2.9.
Rex Kerr
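A rough sketch of that "first answer wins" pattern with plain scala.actors (the firstOf helper is made up for illustration; the losing workers simply run to completion here, so really cancelling them would need the link/exit plumbing mentioned above):

import scala.actors.Actor._

case class Result(value: Any)

// Start one actor per task and answer with whichever result arrives first.
def firstOf[A](tasks: List[() => A]): A = {
  val caller = self
  for (task <- tasks) actor { caller ! Result(task()) }
  receive { case Result(v) => v.asInstanceOf[A] }
}

// e.g. cracking a password by searching two ranges in parallel:
// val hit = firstOf(List(() => crack(0 to 4999), () => crack(5000 to 9999)))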
2.9? And when is that? Or should I ask: when will Scala be ready for major production use? (Yeah, there's Twitter, but there's only Twitter.)
TiansHUo
2.9 will not be ready soon enough for you to use it on the timescale over which you wanted an answer to this question. The point is only that you are not the only one who has recognized that such things would be nice, and that improvements are being planned. Until then, you can build the functionality you want, with somewhat more work, using lower-level constructs (Java threads and/or Scala actors).
Rex Kerr
+1  A: 

I'm not waiting for Scala 2.8.1 or 2.9; it seems better to write my own library or use another one. So I did more googling and found Akka: http://doc.akkasource.org/actors

which has a Futures object with the methods

awaitAll(futures: List[Future]): Unit
awaitOne(futures: List[Future]): Future

but http://scalablesolutions.se/akka/api/akka-core-0.8.1/ has no documentation at all. That's bad.

The good part, though, is that Akka's actors are leaner than Scala's native ones.
With all of these libraries (including Scalaz) around, it would be really great if Scala itself could eventually merge them officially.
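A usage sketch based only on those two signatures (how the individual futures are created varies by Akka version, so startJob below is a hypothetical helper that kicks off one chunk's computation and returns an Akka Future):

// One Akka future per chunk; startJob is hypothetical.
val jobs: List[Future] = splicedList.map(chunk => startJob(chunk))

Futures.awaitAll(jobs)             // the getAllResults case: block until every job has finished
val first = Futures.awaitOne(jobs) // the getFirstResult case: block until whichever job finishes first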

TiansHUo
+4  A: 

With the Scala parallel collections that will be included in 2.8.1, you will be able to do things like this:

val spliced = myList.par // obtain a parallel version of your collection (all operations are parallel)
spliced.map(process _)   // maps each entry into a corresponding entry using `process`
spliced.find(check _)    // searches the collection until it finds an element for which
                         // `check` returns true, at which point the search stops, and the element is returned

and the code will automatically run in parallel. Other methods in the regular collections library are being parallelized as well.

Currently, 2.8.RC2 is very close (this week or next), and 2.8 final should come a few weeks after that, I guess. You will be able to try the parallel collections if you use the 2.8.1 nightlies.

Alex
2.8.1 so soon! Nice!
Rex Kerr
Wow, that looks really great. I guess foldLeft can work here too?
TiansHUo
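On that last point: foldLeft is inherently sequential (it threads a single accumulator left to right), so the parallel API offers fold instead, which requires an associative operator and a neutral element. A minimal sketch against the .par API as it eventually shipped:

val nums = (1 to 1000000).toList.par // parallel version of the list

val seqSum = nums.foldLeft(0)(_ + _) // runs sequentially even on a parallel collection
val parSum = nums.fold(0)(_ + _)     // runs in parallel: + is associative and 0 is neutral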