I have an iterator of lines from a very large file that need to be put in groups as I move along. I know where each group ends because there is a sentinel value on the last line of each group. So basically I want to write a function that takes an iterator and a sentinel value, and returns an iterator of groups each terminated by the sentinel value. Something like:

scala> groups("abc.defg.hi.jklmn.".iterator, '.')
res1: Iterator[Seq[Char]] = non-empty iterator

scala> groups("abc.defg.hi.jklmn.".iterator, '.').toList
res19: List[Seq[Char]] = List(List(a, b, c, .), List(d, e, f, g, .), List(h, i, .), List(j, k, l, m, n, .))

Note that I want the sentinel items included at the end of each of the groups. Here's my current solution:

def groups[T](iter: Iterator[T], sentinel: T) = new Iterator[Seq[T]] {                   
  def hasNext = iter.hasNext
  def next = iter.takeWhile(_ != sentinel).toList ++ List(sentinel)
}

I think this will work, and I guess it is fine, but having to re-add the sentinel every time feels like a code smell. Is there a better way to do this?

A:

Less readable than yours, but more "correct" when the final group doesn't have a terminating sentinel value:

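def groups[T](iter: Iterator[T], sentinel: T) = new Iterator[Seq[T]] {
  def hasNext = iter.hasNext
  def next(): Seq[T] = {
    val builder = scala.collection.mutable.ListBuffer[T]()
    // Move elements into the current group until the sentinel is seen
    while (iter.hasNext) {
      val x = iter.next()
      builder.append(x)
      if (x == sentinel) return builder.toList // group ends with the sentinel
    }
    builder.toList // input ran out: the last group had no sentinel
  }
}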
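To make the difference concrete, here is a small, self-contained comparison that runs the question's version and the loop above on an input whose last group has no sentinel. The object name `FinalGroupDemo` and the method names `groupsReAdd` and `groupsLoop` are made up for this example; both methods are copies of the code in this thread.

```scala
object FinalGroupDemo {
  // The question's version: takeWhile drops the sentinel, then it is re-added.
  def groupsReAdd[T](iter: Iterator[T], sentinel: T): Iterator[Seq[T]] = new Iterator[Seq[T]] {
    def hasNext: Boolean = iter.hasNext
    def next(): Seq[T] = iter.takeWhile(_ != sentinel).toList ++ List(sentinel)
  }

  // The loop-based version from this answer.
  def groupsLoop[T](iter: Iterator[T], sentinel: T): Iterator[Seq[T]] = new Iterator[Seq[T]] {
    def hasNext: Boolean = iter.hasNext
    def next(): Seq[T] = {
      val builder = scala.collection.mutable.ListBuffer[T]()
      while (iter.hasNext) {
        val x = iter.next()
        builder.append(x)
        if (x == sentinel) return builder.toList
      }
      builder.toList
    }
  }

  def main(args: Array[String]): Unit = {
    // "cd" at the end has no terminating '.'
    println(groupsReAdd("ab.cd".iterator, '.').toList) // List(List(a, b, .), List(c, d, .))
    println(groupsLoop("ab.cd".iterator, '.').toList)  // List(List(a, b, .), List(c, d))
  }
}
```

The re-adding version invents a trailing '.' for the last group, while the loop returns the last group exactly as it appears in the input.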

Or, recursively:

  import scala.collection.mutable.ListBuffer

  def groups[T](iter: Iterator[T], sentinel: T) = new Iterator[Seq[T]] {
    def hasNext = iter.hasNext
    def next(): Seq[T] = {
      // Append elements until the sentinel (inclusive) or the end of the input
      @scala.annotation.tailrec
      def build(accumulator: ListBuffer[T]): Seq[T] = {
        val v = iter.next()
        accumulator.append(v)
        if (v == sentinel || !iter.hasNext) accumulator.toList
        else build(accumulator)
      }
      build(new ListBuffer[T]())
    }
  }
Mitch Blevins
A:

Ugly, but should be more performant than your solution:

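  def groups[T](iter: Iterator[T], sentinel: T) = new Iterator[Seq[T]] {
    def hasNext = iter.hasNext
    def next() = iter.takeWhile {
      // `last` is fresh for each group; the predicate tests the *previous*
      // element, so the sentinel is kept and the group stops right after it.
      // Note: takeWhile also consumes the element that fails the test (the one
      // after the sentinel), so that element is lost from `iter`.
      var last = null.asInstanceOf[T]
      c => { val temp = last; last = c; temp != sentinel }
    }.toList
  }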
Landei
Wow, that's ugly, but cool. =) You can move the "var last" out to a private variable, and then it looks a little less ugly.
Steve
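Iterator.takeWhile can only tell that a run has ended by reading the first element that fails the predicate, and with the Scala 2.13/3 standard library that element is not handed back to the underlying iterator (the Iterator scaladoc also warns against reusing an iterator after calling methods like takeWhile on it). In the question's version the swallowed element is the sentinel, which is why it has to be re-added; with the look-behind predicate it is the first element of the next group. The sketch below runs the look-behind version on the question's input next to a hypothetical variant, `groupsPending`, that saves the swallowed element and puts it back at the front of the next group. The names `LookBehindDemo`, `landei` and `groupsPending` are made up for this example, and like the question's code it assumes takeWhile reads no further than that one element, which holds for the current implementation but is not a documented guarantee.

```scala
object LookBehindDemo {
  // The look-behind version from this answer, with explicit types.
  def landei[T](iter: Iterator[T], sentinel: T): Iterator[Seq[T]] = new Iterator[Seq[T]] {
    def hasNext: Boolean = iter.hasNext
    def next(): Seq[T] = iter.takeWhile {
      var last = null.asInstanceOf[T]
      c => { val temp = last; last = c; temp != sentinel }
    }.toList
  }

  // Hypothetical variant: remember the element takeWhile swallowed and
  // start the next group with it.
  def groupsPending[T](iter: Iterator[T], sentinel: T): Iterator[Seq[T]] = new Iterator[Seq[T]] {
    private var pending: Option[T] = None // at most one swallowed element

    def hasNext: Boolean = pending.isDefined || iter.hasNext

    def next(): Seq[T] = {
      val carried = pending.toList
      pending = None
      if (carried.contains(sentinel)) carried // a carried sentinel closes its own group
      else {
        var previous: Option[T] = carried.headOption
        carried ++ iter.takeWhile { c =>
          val keep = !previous.contains(sentinel) // stop once the previous element was the sentinel
          if (keep) previous = Some(c) else pending = Some(c) // save the swallowed element
          keep
        }.toList
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val input = "abc.defg.hi.jklmn."
    // List(List(a, b, c, .), List(e, f, g, .), List(i, .), List(k, l, m, n, .))
    println(landei(input.iterator, '.').toList)
    // List(List(a, b, c, .), List(d, e, f, g, .), List(h, i, .), List(j, k, l, m, n, .))
    println(groupsPending(input.iterator, '.').toList)
  }
}
```

In the first line of output d, h and j are missing; the variant reproduces the groups from the question. It never wraps `iter` in another iterator, so nothing piles up over a long file, but it is clearly no prettier than the loop-based answer above.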