views:

2548

answers:

2

I've been working on solving Project Euler problems in Clojure to get better, and I've already run into prime number generation a couple of times. My problem is that it is just taking way too long. I was hoping someone could help me find an efficient way to do this in a Clojure-y way.

When I fist did this, I brute-forced it. That was easy to do. But calculating 10001 prime numbers took 2 minutes this way on a Xeon 2.33GHz, too long for the rules, and too long in general. Here was the algorithm:

(defn next-prime-slow
    "Find the next prime number, checking against our already existing list"
    ([sofar guess]
        (if (not-any? #(zero? (mod guess %)) sofar)
            guess                         ; Then we have a prime
            (recur sofar (+ guess 2)))))  ; Try again          

(defn find-primes-slow
    "Finds prime numbers, slowly"
    ([]
        (find-primes-slow 10001 [2 3]))   ; How many we need, initial prime seeds
    ([needed sofar]
        (if (<= needed (count sofar))
            sofar                         ; Found enough, we're done
            (recur needed (concat sofar [(next-prime-slow sofar (last sofar))])))))

By replacing next-prime-slow with a newer routine that took some additional rules into account (like the 6n +/- 1 property) I was able to speed things up to about 70 seconds.

Next I tried making a sieve of Eratosthenes in pure Clojure. I don't think I got all the bugs out, but I gave up because it was simply way too slow (even worse than the above, I think).

(defn clean-sieve
    "Clean the sieve of what we know isn't prime based"
    [seeds-left sieve]
    (if (zero? (count seeds-left))
        sieve              ; Nothing left to filter the list against
        (recur
            (rest seeds-left)    ; The numbers we haven't checked against
            (filter #(> (mod % (first seeds-left)) 0) sieve)))) ; Filter out multiples

(defn self-clean-sieve  ; This seems to be REALLY slow
    "Remove the stuff in the sieve that isn't prime based on it's self"
    ([sieve]
        (self-clean-sieve (rest sieve) (take 1 sieve)))
    ([sieve clean]
        (if (zero? (count sieve))
            clean
            (let [cleaned (filter #(> (mod % (last clean)) 0) sieve)]
                (recur (rest cleaned) (into clean [(first cleaned)]))))))

(defn find-primes
    "Finds prime numbers, hopefully faster"
    ([]
        (find-primes 10001 [2]))
    ([needed seeds]
        (if (>= (count seeds) needed)
            seeds        ; We have enough
            (recur       ; Recalculate
                needed
                (into
                    seeds    ; Stuff we've already found
                    (let [start (last seeds)
                            end-range (+ start 150000)]   ; NOTE HERE
                        (reverse               
                            (self-clean-sieve
                            (clean-sieve seeds (range (inc start) end-range))))))))))

This is bad. It also causes stack overflows if the number 150000 is smaller. This despite the fact I'm using recur. That may be my fault.

Next I tried a sieve, using Java methods on a Java ArrayList. That took quite a bit of time, and memory.

My latest attempt is a sieve using a Clojure hash-map, inserting all the numbers in the sieve then dissoc'ing numbers that aren't prime. At the end, it takes the key list, which are the prime numbers it found. It takes about 10-12 seconds to find 10000 prime numbers. I'm not sure it's fully debugged yet. It's recursive too (using recur and loop), since I'm trying to be Lispy.

So with these kind of problems, problem 10 (sum up all primes under 2000000) is killing me. My fastest code came up with the right answer, but it took 105 seconds to do it, and needed quite a bit of memory (I gave it 512 MB just so I wouldn't have to fuss with it). My other algorithms take so long I always ended up stopping them first.

I could use a sieve to calculate that many primes in Java or C quite fast and without using so much memory. I know I must be missing something in my Clojure/Lisp style that's causing the problem.

Is there something I'm doing really wrong? Is Clojure just kinda slow with large sequences? Reading some of the project Euler discussions people have calculated the first 10000 primes in other Lisps in under 100 miliseconds. I realize the JVM may slow things down and Clojure is relatively young, but I wouldn't expect a 100x difference.

Can someone enlighten me on a fast way to calculate prime numbers in Clojure?

+1  A: 

Here's a nice and simple implementation:

http://clj-me.blogspot.com/2008/06/primes.html

... but it is written for some pre-1.0 version of Clojure. See lazy_seqs in Clojure Contrib for one that works with the current version of the language.

Jouni K. Seppänen
Doesn't work anymore with clojure 1.2
Julien Grenier
+8  A: 

What about the following one : http://paste.lisp.org/+1HZ4

killy971
Thank you, exactly what I'm looking for. A simple little program that generates primes. It took ~350ms to make every prime under 2000000. That's the kind of performance I was expecting. Now I need to figure out why it is so much faster than my code. Thanks.
MBCook
This fails for `(sieve 10)` through `(sieve 15)` (9 is not a prime number).
Tyler