tags:

views:

97

answers:

1

Reading a recent question I identified the function being discussed

(def fib-seq
    (lazy-cat [0 1] (map + (rest fib-seq) fib-seq)))

as holding onto the head of a sequence, but it occurred to me re-reading my answer that I had glossed over the details like they were obvious, so I went back to clarify and came up short. I know that fib-seq is a var and that as long as it's around it will hold all the elements in the sequence, but I'm not clear at all on the exact mechanics of how exactly the sequence is getting held onto. Any clarifications would be appreciated.

+7  A: 

Basically, regular GC rules apply... A sequence is just an object and holding onto its head means holding onto a reference to this object. This entails holding as much of the sequence as has already been realised in memory, because Clojure sequences are cached.

(A more detailed explanation follows -- see the fragment in bold for the gist of it... ;-))

'A sequence' in Clojure is an object which implements the ISeq interface. This provides methods which extract the first element of the sequence and the rest of the sequence (another object implementing ISeq). As a key detail, these take care not only of computing the correct object (first / rest of the sequence) and returning it to the caller, but also of caching the computed value in memory so any subsequent requests are faster -- and, more importantly, so that all requests for the same element of the sequence are guaranteed to return the same value, even if the ISeq is being generated on top of a mutable Java object which changes at some point. (Note this is absolutely crucial to the immutable semantics of Clojure sequences.)

A Var, on the other hand, is a container which holds, in rough terms, a 'pointer' to some Java object. If this happens to be an ISeq, then as long as the Var itself is not garbage collected (which it obviously won't ever be if it's a top level var in a currently existing namespace) or rebound, the ISeq itself will not be garbage collected and, in particular, the memory it uses for caching first / rest of sequence will not be released.

As for the other elements of the sequence: the 'rest' of the ISeq bound to the Var is an ISeq itself. Also, it gets cached by the first ISeq. Thus the first element of the 'rest' ISeq of an ISeq bound to a Var will never be garbage collected, because a reference to it is being held by the 'rest' ISeq of the ISeq bound to the Var and this ISeq won't be GC'd because it is being cached as the 'rest' component by the ISeq bound to the Var, which in turn won't be GC'd as long as it is bound to the Var, which in turn will normally never be GC'd because it's a top-level Var in a namespace.

Clearly the Var will be GC'd if it ceases to be held onto by its namespace (ns-unmap) or the namespace itself gets tossed (remove-ns). If it happens to have held an ISeq, that ISeq will be GC'd if and only if it isn't being held by some other bit of code -- the usual GC rules apply, of course. For bindings introduced with binding and local bindings introduced with let, all the above applies modulo lifetime issues of the bindings. (Which are not a subject of this Q.)

Michał Marczyk
What is the var that is holding onto the sequence? Does defining the function fib-seq create a var pointing to the function, and when that function returns a sequence the function keeps a reference to that sequence? Is any function that returns a sequence going to hold onto it?
Nathan Hughes
The example in your question does not define any function at all! Note that you're using `def` and not `defn`, and the thing you define `fib-seq` to refer to is a sequence, not a function. So, in this particular case, the Var which the symbol `fib-seq` resolves to in the current namespace after your example `def` form is evaluated is what holds onto the sequence of Fibonacci numbers produces by the `lazy-cat` form. The fact that this doesn't go into an infinite loop is due to `lazy-cat` producing a *lazy* sequence, to be realised as you request elements from it.
Michał Marczyk
Your answer from that other question you link to in this Q contains Christophe Grand's function which produces the sequence of all Fibonacci numbers in a lazy fashion and without holding onto the head -- note that in that function *no name is ever assigned to the entirety of the sequence*. If you did something like `(def fib-seq (fibo))`, then surely enough you'd hold onto as much of the sequence as you'd have realised for as long as this binding wasn't changed with another `(def fib-seq ...)` expression or `alter-var-root`. Also see http://clojure.org/special_forms for info on `def` / `defn`.
Michał Marczyk
Oh, one more thing... If you did something like `(doseq [fib (fibo)] (println fib))`, then you'd go on printing Fibonacci numbers forever, but the sequence produced by the initial call to `fibo` wouldn't be kept around, basically because it never gets bound to any name. Does this make it any clearer...?
Michał Marczyk
Yes, it took a while for your initial answer to sink in, I just realized what you were talking about. Thanks for the clarifications.
Nathan Hughes
Sure thing! :-)
Michał Marczyk