views:

157

answers:

1

I have a question about Clojure: I am trying to learn the language by going through Project Euler and I don't understand what is going on under the hood: The following code is designed to use return a list of all prime numbers up to lim. I would think that it should be O(n) in heap-space because I make a list of all the numbers up to lim, and then filter them away one by one while moving the first to a new list. (I know that I am actually making new lists each recur but I didn't think they would take more memory?) Anyway I am running this with

(defn getAllPrimes [lim] 
  (defn getPrimes [primes numlist]
    (if (not-empty numlist) ;base case;
    (recur (cons (first numlist) primes) ;put the prime on to the prime list 
           (filter
        (fn [x] (not (div? x (first numlist)))) ;remove the prime and all its multiples from the numlist
        (rest numlist)))
      primes)); return the primes
  (getPrimes () (range 2 lim))) ;call the recursive function with and empty prime list to be filled up and a full numlist to be emptied

And I keep running out of heap space, when I call

(apply + (getAllPrimes 2000000))

, but I don't run out of space on

(apply + (filter even? (range 2000000)))

So I think that I must not understand how lists are garbage collected in calls to recur and am actually using O(n*n) heap or something.

+5  A: 

I believe the problem is that with each recur, you're creating a new lazy sequence referring to the last one, so after a few iterations you're holding a seq that holds the head of a seq that holds the head of a seq that holds the head of a seq. ... All the intermediate seqs are filling up your heap.

Though writing a prime sieve is a worthwhile exercise, if you want to get to the answer, Clojure does include the sequence of prime numbers in its standard library: clojure.contrib.lazy-seqs/primes. The standard solution to this paricular Euler problem is a one-liner.

As a style point, an inner defn is not a good idea. The practical effect is the same as if the defn were at the top level, but if I'm not mistaken, the var gets reassigned every time getAllPrimes is called, and redefining vars at runtime is very strongly discouraged. Since the code is just defining a var, getPrimes is still just as visible as getAllPrimes. In this case, getPrimes could easily be rewritten as a loop/recur with no inner function, anonymous or named. That doesn't help your chain-of-lazy-seqs problem, but it does make the code a little more standard-looking:

(defn getAllPrimes [lim]
  (loop [primes ()
         numlist (range 2 lim)]
    (if (not-empty numlist)
      (recur (cons (first numlist) primes)
             (filter (fn [x] (not (div? x (first numlist))))
                     (rest numlist)))
      primes)))

I would also avoid the use of camelCase. The Clojure standard name for this function would be get-all-primes.

Getting back to the practical problem, though, the least work you could do to get your code working would be to force each seq on each iteration, i.e., wrap your filter call in a doall. I tried this, and while it still runs slowly, it at least does run to completion without running out of heap:

(defn get-all-primes [lim]
  (loop [primes ()
         numlist (range 2 lim)]
    (if (not-empty numlist)
      (recur (cons (first numlist) primes)
             (doall (filter #(not (div? % (first numlist)))
                            (rest numlist))))
      primes)))
dreish
Thanks, I appreciate the answer and also the style pointers. That was very, very helpful. I've been writing a lot of functions with the inner defn and I'll use loop in the future.
Benjie Holson
Also I knew about the primes function in the library but then I wouldn't have learned about how filter works and why I need a do-all :).
Benjie Holson