views:

219

answers:

2

I want to get an array of bytes (Array[Byte]) from somewhere (read from file, from socket, etc) and then provide a efficient way to pull bits out of it (e.g. provide a function to extract a 32-bit integer from offset N in array). I would then like to wrap the byte array (hiding it) providing functions to pull bits out from the array (probably using lazy val for each bit to pull out).

I would imagine having a wrapping class that takes an immutable byte array type in the constructor to prove the array contents is never modified. IndexedSeq[Byte] seemed relevant, but I could not work out how to go from Array[Byte] to IndexedSeq[Byte].

Part 2 of the question is if I used IndexedSeq[Byte] will the resultant code be any slower? I need the code to execute as fast as possible, so would stick with Array[Byte] if the compiler could do a better job with it.

I could write a wrapper class around the array, but that would slow things down - one extra level of indirection for each access to bytes in the array. Performance is critical due to the number of array accesses that will be required. I need fast code, but would like to do the code nicely at the same time. Thanks!

PS: I am a Scala newbie.

+10  A: 

Treating Array[T] as an IndexedSeq[T] could hardly be simpler:

Array(1: Byte): IndexedSeq[Byte] // trigger an Implicit View
wrapByteArray(Array(1: Byte))    // explicitly calling

Unboxing will kill you long before an extra layer of indirection.

C:\>scala -Xprint:erasure -e "{val a = Array(1: Byte); val b1: Byte = a(0); val
b2 = (a: IndexedSeq[Byte])(0)}"
[[syntax trees at end of erasure]]// Scala source: scalacmd5680604016099242427.s
cala

val a: Array[Byte] = scala.Array.apply((1: Byte), scala.this.Predef.
wrapByteArray(Array[Byte]{}));
val b1: Byte = a.apply(0);
val b2: Byte = scala.Byte.unbox((scala.this.Predef.wrapByteArray(a): IndexedSeq).apply(0));

To avoid this, the Scala collections library should be specialized on the element type, in the same style as Tuple1 and Tuple2. I'm told this is planned, but it's a bit more involved than simply slapping @specialized everywhere, so I don't know how long it will take.

UPDATE

Yes, WrappedArray is mutable, although collection.IndexedSeq[Byte] doesn't have methods to mutate, so you could just trust clients not to cast to a mutable interface. The next release of Scalaz will include ImmutableArray which prevents this.

The boxing comes retrieving an element from the collection via this generic method:

trait SeqLike[+A, +Repr] extends IterableLike[A, Repr] { self =>
  def apply(idx: Int): A
}

At the JVM level, this signature is type-erased to:

  def apply(idx: Int): Object

If your collection contains primitives, that is, subtypes of AnyVal, they must be boxed in the corresponding wrapper to be returned from this method. For some applications, this is a major performance concern. Entire libraries have been written in Java to avoid this, notably fastutils.

Annotation directed specialization was added to Scala 2.8 to instruct the compiler to generate various versions of a class or method tailored to the permutations of primitive types. This has been applied to a few places in the standard library already, e.g. TupleN, ProductN, Function{0, 1, 2}. If this was also applied to the collections hierarchy, this performance cost could be alleviated.

retronym
Sorry, as a Scala newbie I don't understand your example. Are you saying that using Array[Byte] will do lots of unboxing? Also I was not sure how to trigger the implicit view. I naively added ":IndexedSeq[Byte]" after my array instantiation but it complained with "type mismatch". Also, wrapByteArray() seems to return a mutable IndexedSeq[Byte], not the immutable version.
Alan Kent
+7  A: 

If you want to work with sequences in Scala, I recommend you choose one of these:

Immutable seqs:

(linked seqs) List, Stream, Queue

(indexed seqs) Vector

Mutable seqs:

(linked seq) ListBuffer

(indexed seq) ArrayBuffer

The new (2.8) Scala collections have been hard to grasp for me, primarily due to shortage of (correct) documentation but also because of the source code (complex hierarchys). To clear my mind I made this pic to visualize the basic structure:

alt text

Also, note that Array is not part of the tree structure, it is a special case, since it wraps the Java array (which is a special case in Java).

olle kullberg
In Scala 2.8, its `Array` is to Java's arrays as Scala's `String` is to Java's `String`: One and the same, not wrapped. There are wrappers with corresponding implicit conversions that allow things like Scala's HOFs for sequences to be used with these "borrowed" types, but the basic `Array` (and `String`) are the Java entities.
Randall Schulz
Thanks - but any comment on performance of the approach? That is, if I use IndexedSeq[Byte] in my code, will it be as fast as Array[Byte] (byte[] in Java)? I note another response warns about unboxing overheads - are these going to kill Scala performance for dealing with arrays of bytes?
Alan Kent
`Vector` (the only instantiable subclass of `IndexedSeq` other than the various `Range` classes and `WrappedString`) is not specialized (for any element type) in Scala 2.8.0, so you are going to pay a high memory overhead as well as a marginal access time cost, all due to boxing and unboxing, as you say.
Randall Schulz
@Randall I do not understand this! I mean, for the java.lang.String class there is only the scala.collection.immutable.StringOps wrapper class that java.lang.String is converted to implicitly when need be. There is no Scala class for String, right? If the Java array was used in the same way in Scala as String then we would use the Java array directly, and it would be converted to some ArrayOps when in need, right? But instead we seem to be using scala.Array all the time. When I look in the source code for scala.Array, it looks like a real implementation.
olle kullberg
@olle I'm referring to the situation in Scala 2.8.0. In Scala 2.7, arrays were implemented entirely differently. Which are you using? And you're right, Scala has no String of its own, it uses Java's directly. That did not used to be the way Scala arrays worked (before 2.8.0) but now the same situation holds for arrays as for strings, through arrays are not as simple, since they're not a single class, but rather a family of classes.
Randall Schulz
@Randall I am using 2.8. To me the scala.Array implementation looks strange, just a final class where all methods are deprecated or return errors: final class Array[T](_length: Int) {....}On the other hand, the companion object looks more normal.object Array extends FallbackArrayBuilding { ....}Looks like the companion object is actually using the scala.Array class. What I am trying to say is that I do not see the connection between scala.Array and Java array anywhere in this code. Is it built into the compiler or what?
olle kullberg
@olle Yes. Because all Scala arrays are Java arrays and because arrays are family of classes, not a single class like `String`, compiling array code must be treated specially. I don't understand the details, but I'm pretty sure the class scala.Array is not related to actual arrays at run time. Note that it is a final class and has no public constructor. I don't believe that class is ever instantiated.
Randall Schulz
@Randall ok ,thanks.
olle kullberg
I really should add this to my own answer explaining the new collection library, but there's now `TraversableOnce` which is inherited by both `Traversable` and `Iterator`.
Daniel