views:

224

answers:

3

I've tried different collections in Scala to sum it's elements and they are much slower than Java sums it's arrays (with for cycle). Is there a way for Scala to be as fast as Java arrays?

I've heard that in scala 2.8 arrays will be same as in java, but they are much slower in practice

+4  A: 

Scala 2.8 Array are JVM / Java arrays and as such have identical performance characteristics. But that means they cannot directly have extra methods that unify them with the rest of the Scala collections. To provide the illusion that arrays have these methods, there are implicit conversions to wrapper classes that add those capabilities. If you are not careful you'll incur inordinate overhead using those features.

In those cases where iteration overhead is critical, you can explicitly get an iterator (or maintain an integer index, for indexed sequential structures like Array or other IndexedSeq) and use a while loop, which is a language-level construct that need not operate on functions (literals or otherwise) but can compile in-line code blocks.

val l1 = List(...) // or any Iteralbe
val i1 = l1.iterator
while (i1.hasNext) {
  val e = i1.next
  // Do stuff with e
}

Such code will execute essentially as fast as a Java counterpart.

Randall Schulz
Hi, Randall. Thanks for your answer. I made a test adding 10 mln doubles in Java and in Scala using your answer and the results are 23.23ms vs 141ms. So, is there anything else that can help?
Tala
@Tala: The usual benchmark caveats apply. Are you aware of the issues of micro-benchmarking JVM code?
Randall Schulz
Scala 2.8, IterableLike: "def foreach[U](f: A => U): Unit = iterator.foreach(f)" Iterator: "def foreach[U](f: A => U) { while (hasNext) f(next()) }" Assuming f does not need boxing (because of "@specialized"), l1.foreach should have practically the same performance as Randall's while loop, shouldn't it?
Sandor Murakozi
@Sandor Murakozi: Perhaps. The while loop does not use a `Function` object while `foreach` does. Depending on the details of the code inside the loop and how good the JIT compiler is, the code could get inlined. And, of course, if the body of the block is large, then the subroutine call overhead would presumably be trivial. The only thing that can be said for sure is that if it's really performance-critical code, a *proper* benchmark should be made.
Randall Schulz
+13  A: 

Indexing into arrays in a while loop is as fast in Scala as in Java. (Scala's "for" loop is not the low-level construct that Java's is, so that won't work the way you want.)

Thus if in Java you see

for (int i=0 ; i < array.length ; i++) sum += array(i)

in Scala you should write

var i=0
while (i < array.length) {
  sum += array(i)
  i += 1
}

and if you do your benchmarks appropriately, you'll find no difference in speed.

If you have iterators anyway, then Scala is as fast as Java in most things. For example, if you have an ArrayList of doubles and in Java you add them using

for (double d : arraylist) { sum += d }

then in Scala you'll be approximately as fast--if using an equivalent data structure like ArrayBuffer--with

arraybuffer.foreach( sum += _ )

and not too far off the mark with either of

sum = (0 /: arraybuffer)(_ + _)
sum = arraybuffer.sum  // 2.8 only

Keep in mind, though, that there's a penalty to mixing high-level and low-level constructs. For example, if you decide to start with an array but then use "foreach" on it instead of indexing into it, Scala has to wrap it in a collection (ArrayOps in 2.8) to get it to work, and often will have to box the primitives as well.

Anyway, for benchmark testing, these two functions are your friends:

def time[F](f: => F) = {
  val t0 = System.nanoTime
  val ans = f
  printf("Elapsed: %.3f\n",1e-9*(System.nanoTime-t0))
  ans
}

def lots[F](n: Int, f: => F): F = if (n <= 1) f else { f; lots(n-1,f) }

For example:

val a = Array.tabulate(1000000)(_.toDouble)
val ab = new collection.mutable.ArrayBuffer[Double] ++ a
def adSum(ad: Array[Double]) = {
  var sum = 0.0
  var i = 0
  while (i<ad.length) { sum += ad(i); i += 1 }
  sum
}

// Mixed array + high-level; convenient, not so fast
scala> lots(3, time( lots(100,(0.0 /: a)(_ + _)) ) )
Elapsed: 2.434
Elapsed: 2.085
Elapsed: 2.081
res4: Double = 4.999995E11

// High-level container and operations, somewhat better
scala> lots(3, time( lots(100,(0.0 /: ab)(_ + _)) ) )    
Elapsed: 1.694
Elapsed: 1.679
Elapsed: 1.635
res5: Double = 4.999995E11

// High-level collection with simpler operation
scala> lots(3, time( lots(100,{var s=0.0;ab.foreach(s += _);s}) ) )
Elapsed: 1.171
Elapsed: 1.166
Elapsed: 1.162
res7: Double = 4.999995E11

// All low level operations with primitives, no boxing, fast!
scala> lots(3, time( lots(100,adSum(a)) ) )              
Elapsed: 0.185
Elapsed: 0.183
Elapsed: 0.186
res6: Double = 4.999995E11
Rex Kerr
How long does `a.sum` takes?
Daniel
@Rex Kerr, Very nice answer, I'd been scaring myself with a for loop attempt ...
Don Mackenzie
@Daniel - `a.sum` takes about as long as `(0 /: a)(_ + _)`, at least as of 2.8.0.RC6.
Rex Kerr
@Don - Just keep in mind that "lots" is, itself, not quite as fast as a while loop. So don't put really, really inexpensive stuff in there and expect that you'll get an accurate timing.
Rex Kerr
@Rex Kerr - thanks for your exhaustive answer.
Tala
+5  A: 

It is very difficult to explain why some code you haven't shown performs worse than some other code you haven't shown in some benchmark you haven't shown.

You may be interested in this question and its accepted answer, for one thing. But benchmarking JVM code is hard, because the JIT will optimize code in ways that are difficult to predict (which is why JIT beats traditional optimization at compile time).

Daniel
Hi, Daniel. Thanks for the link. It was very helpful.
Tala