As you noted, the [0 1]
establishes the base cases: The first two values in the sequence are zero, then one. After that, each value is to be the sum of the previous value and the value before that one. Hence, we can't even compute the third value in the sequence without having at least two that come before it. That's why we need two values with which to start off.
Now look at the map
form. It says to take the head items from two different sequences, combine them with the +
function (adding multiple values to produce one sum), and expose the result as the next value in a sequence. The map
form is zipping together two sequences -- presumably of equal length -- into one sequence of the same length.
The two sequences fed to map
are different views of the same basic sequence, shifted by one element. The first sequence is "all but the first value of the base sequence". The second sequence is the base sequence itself, which, of course, includes the first value. But what should the base sequence be?
The definition above said that each new element is the sum of the previous (Z - 1) and the predecessor to the previous element (Z - 2). That means that extending the sequence of values requires access to the previously computed values in the same sequence. We definitely need a two-element shift register, but we can also request access to our previous results instead. That's what the recursive reference to the sequence called fib-seq
does here. The symbol fib-seq
refers to a sequence that's a concatenation of zero, one, and then the sum of its own Z - 2 and Z - 1 values.
Taking the sequence called fib-seq
, drawing the first item yields the first element of the [0 1]
vector -- zero. Drawing the second item yields the second element of the vector -- one. Upon drawing the third item, we consult the map
to generate a sequence and use that as the remaining values. The sequence generated by map
here starts out with the sum of the first item of "the rest of" [0 1]
, which is one, and the first item of [0 1]
, which is zero. That sum is one.
Drawing the fourth item consults map
again, which now must compute the sum of the second item of "the rest of" the base sequence, which is the one generated by map
, and the second item of the base sequence, which is the one from the vector [0 1]
. That sum is two.
Drawing the fifth item consults map
, summing the third item of "the rest of" the base sequence -- again, the one resulting from summing zero and one -- and the third item of the base sequence -- which we just found to be two.
You can see how this is building up to match the intended definition for the series. What's harder to see is whether drawing each item is recomputing all the preceding values twice -- once for each sequence examined by map
. It turns out there's no such repetition here.
To confirm this, augment the definition of fib-seq
like this to instrument the use of function +
:
(def fib-seq
(lazy-cat [0 1]
(map
(fn [a b]
(println (format "Adding %d and %d." a b))
(+ a b))
(rest fib-seq) fib-seq)))
Now ask for the first ten items:
> (doall (take 10 fib-seq))
Adding 1 and 0.
Adding 1 and 1.
Adding 2 and 1.
Adding 3 and 2.
Adding 5 and 3.
Adding 8 and 5.
Adding 13 and 8.
Adding 21 and 13.
(0 1 1 2 3 5 8 13 21 34)
Notice that there are eight calls to +
to generate the first ten values.
Since writing the preceding discussion, I've spent some time studying the implementation of lazy sequences in Clojure -- in particular, the file LazySeq.java -- and thought this would be a good place to share a few observations.
First, note that many of the lazy sequence processing functions in Clojure eventually use lazy-seq
over some other collection. lazy-seq
creates an instance of the Java type LazySeq
, which models a small state machine. It has several constructors that allow it to start in different states, but the most interesting case is the one that starts with just a reference to a nullary function. Constructed that way, the LazySeq
has neither evaluated the function nor found a delegate sequence (type ISeq
in Java). The first time one asks the LazySeq
for its first element -- via first
-- or any successors -- via next
or rest
-- it evaluates the function, digs down through the resulting object to peel away any wrapping layers of other LazySeq
instances, and finally feeds the innermost object through the java function RT#seq()
, which results in an ISeq
instance.
At this point, the LazySeq
has an ISeq
to which to delegate calls on behalf of first
, next
, and rest
. Usually the "head" ISeq
will be of type Cons
, which stores a constant value in its "first" (or "car") slot and another ISeq
in its "rest" (or "cdr") slot. That ISeq
in its "rest" slot can in turn be a LazySeq
, in which case accessing it will again require this same evaluation of a function, peeling away any lazy wrappers on the return value, and passing that value through RT#seq()
to yield another ISeq
to which to delegate.
The LazySeq
instances remain linked together, but having forced one (through first
, next
, or rest
) causes it to delegate straight through to some non-lazy ISeq
thereafter. Usually that forcing evaluates a function that yields a Cons
bound to first value and its tail bound to another LazySeq
; it's a chain of generator functions that each yield one value (the Cons
's "first" slot) linked to another opportunity to yield more values (a LazySeq
in the Cons
's "rest" slot).
Tying this back, in the Fibonacci Sequence example above, map
will take each of the nested references to to fib-seq
and walk them separately via repeated calls to rest
. Each such call will transform at most one LazySeq
holding an unevaluated function into a LazySeq
pointing to something like a Cons
. Once transformed, any subsequent accesses will quickly resolve to the Cons
es -- where the actual values are stored. When one branch of the map
zipping walks fib-seq
one element behind the other, the values have already been resolved and are available for constant-time access, with no further evaluation of the generator function required.
Here are some diagrams to help visualize this interpretation of the code:
+---------+
| LazySeq |
fib-seq | fn -------> (fn ...)
| sv |
| s |
+---------+
+---------+
| LazySeq |
fib-seq | fn -------> (fn ...) -+
| sv <------------------+
| s |
+---------+
+---------+
| LazySeq |
fib-seq | fn |
| sv -------> RT#seq() -+
| s <------------------+
+---------+
+---------+ +------+
| LazySeq | | ISeq |
fib-seq | fn | | |
| sv | | |
| s ------->| |
+---------+ +------+
+---------+ +--------+ +------+
| LazySeq | | Cons | | ISeq |
fib-seq | fn | | first ---> 1 | |
| sv | | more -------->| |
| s ------->| | | |
+---------+ +--------+ +------+
+---------+ +--------+ +---------+
| LazySeq | | Cons | | LazySeq |
fib-seq | fn | | first ---> 1 | fn -------> (fn ...)
| sv | | more -------->| sv |
| s ------->| | | s |
+---------+ +--------+ +---------+
As map
progresses, it hops from LazySeq
to LazySeq
(and hence Cons
to Cons
), and the rightmost edge only expands the first time one call first
, next
, or rest
on a given LazySeq
.