views:

3666

answers:

1

Following on from my breathless confusion, what are some good resources which explain how the new Scala 2.8 collections library has been structured. I'm interested to find some information on how the following fit together:

  • The collection classes/traits themselves (e.g. List, Iterable)
  • Why the Like classes exist (e.g. TraversableLike)
  • What the companion methods are for (e.g. List.companion)
  • How I know what implicit objects are in scope at a given point
+76  A: 

There's a 2.8 collection walk-through by Martin Odersky which should probably be your first reference. The rest of this answer was written way before any such thing existed (in fact, before 2.8.0 itself was released).

You can find a paper about it as Scala SID #3. Other papers in that area should be interesting as well to people interested in the differences between Scala 2.7 and 2.8.

I'll try to complement the answer with stuff from that paper.

  • The collection classes/traits themselves (e.g. List, Iterable)

I'll quote from the paper, selectively, and complement with some thoughts of mine.

There are actually three hierarchies of traits for the collections: one for mutable collections, one for immutable collections, and one which doesn't make any assumptions about the collections.

The following image shows the non-specific hierarchy: General collection hierarchy

All elements shown are traits. In the other two hierarchies there are also classes directly inheriting the traits as well as classes which can be viewed as belonging in that hierarchy through implicit conversion to wrapper classes. The legend for these graphs can be found after them.

Graph for immutable hierarchy: Immutable collection hierarchy

Graph for mutable hierarchy: Mutable collection hierarchy

Legend:

Graph legend

Here's an abbreviated ASCII depiction of the collection hierarchy, for those who can't see the images.

                    Traversable
                         |
                         |
                      Iterable
                         |
      +------------------+--------------------+
     Map                Set                  Seq
      |                  |                    |
      |             +----+----+         +-----+------+
    Sorted Map  SortedSet   BitSet   Buffer Vector LinearSeq

Trait Traversable

At the top of this hierarchy is trait Traversable. Its only abstract operation is

def foreach[U](f: Elem => U)

The operation is meant to traverse all elements of the collection, and apply the given operation f to each element. The application is done for its side effect only; in fact any function result of f is discarded by foreach.

Traversible objects can be finite or infinite. An example of an infinite traversable object is the stream of natural numbers Stream.from(0). The method hasDefiniteSize indicates whether a collection is possibly inifinite. If hasDefiniteSize returns true, the collection is certainly finite. If it returns false, the collection has not been not fully elaborated yet, so it might be infinite or finite.

This class defines methods which can be efficiently implemented in terms of foreach (over 40 of them).

Trait Iterable

This trait declares an abstract method iterator that returns an iterator that yields all the collection’s elements one by one. The foreach method in Iterable is implemented in terms of iterator. Subclasses of Iterable often override foreach with a direct implementation for efficiency.

Class Iterable also adds some less-often used methods to Traversable, which can be implemented efficiently only if an iterator is available. They are summarized below.

xs.iterator          An iterator that yields every element in xs, in the same order as foreach traverses elements.
xs takeRight n       A collection consisting of the last n elements of xs (or, some arbitrary n elements, if no order is defined).
xs dropRight n       The rest of the collection except xs takeRight n.
xs sameElements ys   A test whether xs and ys contain the same elements in the same order

Other Classes

After Iterable there come three base classes which inherit from it: Seq, Set, and Map. All three have an apply method and all three implement the PartialFunction trait, but the meaning of apply is different in each case.

I trust the meaning of Seq, Set and Map is intuitive. After that, the classes break up in specific implementations that offer particular guarantees with regards to performance, and the methods it makes available as a result of it.

The listing below may be improved. Leave a comment with suggestions and I'll fix it.

Base Classes and Traits

  • Traversable -- Basic collection class. Can be implemented just with foreach.
    • TraversableProxy -- Proxy for a Traversable. Just point self to the real collection.
    • TraversableView -- A Traversable with some non-strict methods.
    • TraversableForwarder -- Forwards most methods to underlying, except toString, hashCode, equals, stringPrefix, newBuilder, view and all calls creating a new iterable object of the same kind.
    • mutable.Traversable and immutable.Traversable -- same thing as Traversable, but restricting the collection type.
    • Other special-cases Iterable classes, such as MetaData, exists.
    • Iterable -- A collection for which an Iterator can be created (through iterator).
      • IterableProxy, IterableView, mutable and immutable.Iterable.
  • Iterator -- A trait which is not descendant of Traversable. Define next and hasNext.
    • CountedIterator -- An Iterator defining count, which returns the elements seen so far.
    • BufferedIterator -- Defines head, which returns the next element without consuming it.
    • Other special-cases Iterator classes, such as Source, exists.

The Maps

  • Map -- An Iterable of Tuple2, which also provides methods for retrieving a value (the second element of the tuple) given a key (the first element of the tuple). Extends PartialFunction as well.
    • MapProxy -- A Proxy for a Map.
    • DefaultMap -- A trait implementing some of Map's abstract methods.
    • SortedMap -- A Map whose keys are sorted.
      • immutable.SortMap
        • immutable.TreeMap -- A class implementing immutable.SortedMap.
    • immutable.Map
      • immutable.MapProxy
      • immutable.HashMap -- A class implementing immutable.Map through key hashing.
      • immutable.IntMap -- A class implementing immutable.Map specialized for Int keys. Uses a tree based on the binary digits of the keys.
      • immutable.ListMap -- A class implementing immutable.Map through lists.
      • immutable.LongMap -- A class implementing immutable.Map specialized for Long keys. See IntMap.
      • There are additional classes optimized for an specific number of elements.
    • mutable.Map
      • mutable.HashMap -- A class implementing mutable.Map through key hashing.
      • mutable.ImmutableMapAdaptor -- A class implementing a mutable.Map from an existing immutable.Map.
      • mutable.LinkedHashMap -- ?
      • mutable.ListMap -- A class implementing mutable.Map through lists.
      • mutable.MultiMap -- A class accepting more than one distinct value for each key.
      • mutable.ObservableMap -- A mixin which, when mixed with a Map, publishes events to observers through a Publisher interface.
      • mutable.OpenHashMap -- A class based on an open hashing algorithm.
      • mutable.SynchronizedMap -- A mixin which should be mixed with a Map to provide a version of it with synchronized methods.
      • mutable.MapProxy.

The Sequences

  • Seq -- A sequence of elements. One assumes a well-defined size and element repetition. Extends PartialFunction as well.
    • IndexedSeq -- Sequences that support O(1) element access and O(1) length computation.
      • IndexedSeqView
      • immutable.PagedSeq -- An implementation of IndexedSeq where the elements are produced on-demand by a function passed through the constructor.
      • immutable.IndexedSeq
        • immutable.Range -- A delimited sequence of integers, closed on the lower end, open on the high end, and with a step.
          • immutable.Range.Inclusive -- A Range closed on the high end as well.
          • immutable.Range.ByOne -- A Range whose step is 1.
        • immutable.NumericRange -- A more generic version of Range which works with any Integral.
          • immutable.NumericRange.Inclusive, immutable.NumericRange.Exclusive.
          • immutable.WrappedString, immutable.RichString -- Wrappers which enables seeing a String as a Seq[Char], while still preserving the String methods. I'm not sure what the difference between them is.
      • mutable.IndexedSeq
        • mutable.GenericArray -- An Seq-based array-like structure. Note that the "class" Array is Java's Array, which is more of a memory storage method than a class.
        • mutable.ResizableArray -- Internal class used by classes based on resizable arrays.
        • mutable.PriorityQueue, mutable.SynchronizedPriorityQueue -- Classes implementing prioritized queues -- queues where the elements are dequeued according to an Ordering first, and order of queueing last.
        • mutable.PriorityQueueProxy -- an abstract Proxy for a PriorityQueue.
    • LinearSeq -- A trait for linear sequences, with efficient time for isEmpty, head and tail.
      • immutable.LinearSeq
        • immutable.List -- An immutable, singlely-linked, list implementation.
        • immutable.Stream -- A lazy-list. Its elements are only computed on-demand, but memoized (kept in memory) afterwards. It can be theoretically infinite.
      • mutable.LinearSeq
        • mutable.DoublyLinkedList -- A list with mutable prev, head (elem) and tail (next).
        • mutable.LinkedList -- A list with mutable head (elem) and tail (next).
        • mutable.MutableList -- A class used internally to implement classes based on mutable lists.
          • mutable.Queue, mutable.QueueProxy -- A data structure optimized for FIFO (First-In, First-Out) operations.
          • mutable.QueueProxy -- A Proxy for a mutable.Queue.
    • SeqProxy, SeqView, SeqForwarder
    • immutable.Seq
      • immutable.Queue -- A class implementing a FIFO-optimized (First-In, First-Out) data structure. There is no common superclass of both mutable and immutable queues.
      • immutable.Stack -- A class implementing a LIFO-optimized (Last-In, First-Out) data structure. There is no common superclass of both mutable immutable stacks.
      • immutable.Vector -- ?
      • scala.xml.NodeSeq -- A specialized XML class which extends immutable.Seq.
      • immutable.IndexedSeq -- As seen above.
      • immutable.LinearSeq -- As seen above.
    • mutable.ArrayStack -- A class implementing a LIFO-optimized data structure using arrays. Supposedly significantly faster than a normal stack.
    • mutable.Stack, mutable.SynchronizedStack -- Classes implementing a LIFO-optimized data structure.
    • mutable.StackProxy -- A Proxy for a mutable.Stack..
    • mutable.Seq
      • mutable.Buffer -- Sequence of elements which can be changed by appending, prepending or inserting new members.
        • mutable.ArrayBuffer -- An implementation of the mutable.Buffer class, with constant amortized time for the append, update and random access operations. It has some specialized subclasses, such as NodeBuffer.
        • mutable.BufferProxy, mutable.SynchronizedBuffer.
        • mutable.ListBuffer -- A buffer backed by a list. It provides constant time append and prepend, with most other operations being linear.
        • mutable.ObservableBuffer -- A mixin trait which, when mixed to a Buffer, provides notification events through a Publisher interfaces.
        • mutable.IndexedSeq -- As seen above.
        • mutable.LinearSeq -- As seen above.

The Sets

  • Set -- A set is a collection that includes at most one of any object.
    • BitSet -- A set of integers stored as a bitset.
      • immutable.BitSet
      • mutable.BitSet
    • SortedSet -- A set whose elements are ordered.
      • immutable.SortedSet
        • immutable.TreeSet -- An implementation of a SortedSet based on a tree.
    • SetProxy -- A Proxy for a Set.
    • immutable.Set
      • immutable.HashSet -- An implementation of Set based on element hashing.
      • immutable.ListSet -- An implementation of Set based on lists.
      • Additional set classes exists to provide optimized implementions for sets from 0 to 4 elements.
      • immutable.SetProxy -- A Proxy for an immutable Set.
    • mutable.Set
      • mutable.HashSet -- An implementation of Set based on element hashing.
      • mutable.ImmutableSetAdaptor -- A class implementing a mutable Set from an immutable Set.
      • LinkedHashSet -- An implementation of Set based on lists.
      • ObservableSet -- A mixin trait which, when mixed with a Set, provides notification events through a Publisher interface.
      • SetProxy -- A Proxy for a Set.
      • SynchronizedSet -- A mixin trait which, when mixed with a Set, provides notification events through a Publisher interface.

  • Why the Like classes exist (e.g. TraversableLike)

This was done to achieve maximum code reuse. The concrete generic implementation for classes with a certain structure (a traversable, a map, etc) is done in the Like classes. The classes intended for general consumption, then, override selected methods that can be optmized.

  • What the companion methods are for (e.g. List.companion)

The builder for the classes, ie, the object which knows how to create instances of that class in a way that can be used by methods like map, is created by a method in the companion object. So, in order to build an object of type X, I need to get that builder from the companion object of X. Unfortunately, there is no way, in Scala, to get from class X to object X. Because of that, there is a method defined in each instance of X, companion, which returns the companion object of class X.

While there might be some use for such method in normal programs, its target is enabling code reuse in the collection library.

  • How I know what implicit objects are in scope at a given point

You aren't supposed to care about that. They are implicit precisely so that you don't need to figure out how to make it work.

These implicits exists to enable the methods on the collections to be defined on parent classes but still return a collection of the same type. For example, the map method is defined on TraversableLike, but if you used on a List you'll get a List back.

Daniel
Daniel,Thanks for this. You are an asset to the Scala community
agilefall
Agreed. Daniel deserves a special badge for consistently detailed and comprehensive answers. It's just a shame that there aren't many points on offer under the scala tags
oxbow_lakes
Don't worry, I got the Scala badge. At the time I got it, I was the first one too. :-)
Daniel
This answer is epic.
pelotom
Great answer!!!
pedrofurla