views:

150

answers:

4

I want a convenient way to generate an Iterable, given a initial object and a function to produce the next object from the current one, that consumes O(1) memory (i.e., it doesn't cache old results; if you want to iterate a second time, the function has to be applied again).

It doesn't seem like there's library support for this. In Scala 2.8, the method scala.collection.Iterable.iterate has signature

def iterate [A] (start: A, len: Int)(f: (A) ⇒ A) : Iterable[A]

so it requires that you specify how many iterated function applications you're interested in ahead of time, and my understanding of the documentation is that Iterable.iterate actually computes all these values immediately. On the other hand, the method scala.collection.Iterator.iterate has signature

def iterate [T] (start: T)(f: (T) ⇒ T) : Iterator[T]

which looks great, but we only get an Iterator which doesn't offer all the convenience of map, filter and friends.

Is there a convenient library method to produce what I want?

and if not,

Can someone suggest the 'colloquial' Scala code for doing this?

To summarise, given an initial object a: A, and a function f: A => A, I'd like a TraversableLike (e.g., probably an Iterable) that generates a, f(a), f(f(a)), ..., and uses O(1) memory, with map, filter etc. functions that also return something that is O(1) in memory.

+9  A: 

Stream will do what you want, just don't hold on to the cells; only iterate over the values.

It is a sadly common misconception floating around that Streams inherently cache every value they compute.

If you write this:

val s1: Stream[Thing] = initialValue #:: «expression computing next value»

then indeed every value produced by the stream is retained, but this is not necessary. If you write:

def s2: Stream[Thing] = initialValue #:: «expression computing next value»

and if the caller just iterates over the stream's values but does not remember the Stream value itself (specifically any of its cons cells), then no unwanted retention will occur. Of course, in this formulation, every call creates a new Stream starting from a fixed initial value. That's not necessary:

def s3(start: Thing): Stream[Thing] = start #:: «expression computing next value»

The one thing you have to watch out for is passing a Stream to a method. Doing so will capture the head of the stream passed in the method parameter. One way around this is to process the stream with tail-recursive code.

Randall Schulz
I don't understand -- I need to be able to pass this object around to other consumers; that is, unknown other code will actually do the iterating. I don't see how I can do this without passing a reference to the head of the `Stream`.
Scott Morrison
It is a limitation. As I said, you'll have to structure the code to pass the `Stream` through tail-call-optimized chains. But that "unknown" code knows it's getting a `Stream` so it knows it cannot retain the references to its (stream-) cons cells.
Randall Schulz
No, this really won't do. Why would the "unknown" code know anything? If someone else calls into my code, why might they not just treat the return value as say an `Iterable`?
Scott Morrison
@Scott Morrison: Is Daniel's answer (the newest one in this question) not suitable?
Randall Schulz
No: see my comment on his answer. Perhaps I didn't sufficiently clearly explain what I'm after, and should try again with a new question.
Scott Morrison
+3  A: 

Iterator.iterate demo with filter:

object I {
  def main(args:Array[String]) {
    val mb = 1024 * 1024
    val gen = Iterator.iterate(new Array[Int](10 * mb)){arr => 
      val res = new Array[Int](10 * mb)
      arr.copyToArray(res)
      println("allocated 10mb")
      res(0) = arr(0) + 1 // store iteration count in first elem of new array
      res
    }
    // take 1 out of 100
    val gen2 = gen filter (arr => arr(0) % 100 == 0) 
    // print first 10 filtered
    gen2.take(10).foreach { arr => println("filtered " + arr(0)) } 
  }
}

(this may not work in the REPL as the PRINT step may mess up with memory management)

JAVA_OPTS="-Xmx128m" scala -cp classes I will show that the filtering works and is lazy. If it wasn't done in constant memory that would cause a heap error (since it's allocating something like 900*10mb).

Use JAVA_OPTS="-Xmx128m -verbose:gc" scala -cp classes I to see the garbage collection events.

huynhjl
Thanks for the details to convince me everything is really O(1). I'll go try this out.
Scott Morrison
+2  A: 

Iterator is the very thing what you want. And iterator do has map,filter,takeWhile and many other mehtods which is O(1) in memory.I don't think there is another collection type with O(1) in memory.

Eastsun
+1  A: 
val it = new Iterable[Int] {
  def iterator = Iterator.iterate(0)(_+1)
  override
  def toString: String = "Infinite iterable"
}

Don't try it out on REPL (except by embedding it inside an object or class), as REPL will try to print it, and it doesn't use toString.

Daniel
That prints "Infinite iterable" in trunk.
extempore
@extempore Yay!
Daniel
At least as far as I understand, `it map { _ + 1 } take 5` will not terminate, however, since `map` will try to force the `Iterable`.
Scott Morrison
@Scott `Iterable` is not necessarily lazy. Unless you take the time to make all methods lazy, that's what is provided. However, `it.view map { _ + 1 } take 5` will work, so I don't see why worry about it.
Daniel