views:

455

answers:

6

This is not a homework, I am just curious.

INFINITE is the key word here.

I wish to use it as for p in primes(). I believe that this is a built-in function in Haskell.

So, the answer cannot be as naive as "Just do a Sieve".

First of all, you do not know how many consecutive primes will be consumed. Well, suppose you could concoct 100 of them at a time. Would you use the same Sieve approach as well as the frequency of prime numbers formula?

I prefer non-concurrent approach.

Thank you for reading (and writing ;) )!

+3  A: 

This isn't originally my code, however, it's worth posting. The original can be found here: http://code.activestate.com/recipes/117119/

def gen_primes():
  D = {}
  q = 2  # first integer to test for primality.

  while True:
    if q not in D:
      # not marked composite, must be prime  
      yield q 

      #first multiple of q not already marked
      D[q * q] = [q] 
    else:
      for p in D[q]:
        D.setdefault(p + q, []).append(p)
      # no longer need D[q], free memory
      del D[q]

    q += 1

It's a generator, so use it like any other.

primes = gen_primes()
for p in primes:
  print p

It takes 1.62s to generate and put into a set, 1 million primes, on my desktop.

Dominic Bou-Samra
Edited in some explanation
Dominic Bou-Samra
@Hamish: you want him to paste the first *million* primes here? Will a comment work for you? Hell, I'll do it just so you have to go through and check them.
Roger Pate
How does it scale? Please paste the first trillion primes here please.
Beska
@Beska: I'm more interested in the primes between two trillion and three trillion myself. Who would like to check em for me?
D.Shawley
10 million primes takes 19.26s. 100 million primes took 293.61s. Can someone else check - I might be doing it wrong :S
Dominic Bou-Samra
Does anyone else here get the feeling that something fishy is going on here? "Post the primes man...it's cool...I don't want any trouble...just post the primes man..."
Beska
@Hamish: why don't you just run it yourself, take the primes and look at them at your leisure? (Rather than clogging up this question/answer with an enormous amount of senseless data.)
Beska
@Beska, do you mind giving my question right before this one a shot? Thank you!
Hamish Grubijan
+1  A: 

Here's a generator that's a little truer to how it's done in Haskell: filtering against composites of known primes, then adding the remaining primes to the list.

def gen_primes():
    primes = []
    i = 2
    while True:
        prime = True
        for p in primes:
            if not (i % p):
                prime = False
                break
        if prime:
            yield i
            primes.append(i)
        i += 1
avpx
This isn't necessarily efficient, but it's much like the Haskell one-line implementation of the Sieve of Eratosthenes. It is my code, and I just wrote it, so it may not work exactly as intended, but a quick test of it *does* generate the right sequence of primes.
avpx
It did hang for me. What is the code to generate the first 100?
Hamish Grubijan
That's weird. Works fine for me. Try this: `primes = gen_primes()` and then `for i in xrange(100): print primes.next()`
avpx
+5  A: 

I still like what I wrote up here (a Cookbook recipe with many other authors) -- it shows how a Sieve of Eratosthenes has no intrinsic limits, and the comments and discussion, I believe, make it quite clear. This was recently discussed on Stack Overflow (search for the authors' names, I guess) and somebody proposed a substantially faster (but IMHO less clear) version;-).

Alex Martelli
Neato..........................
Hamish Grubijan
+2  A: 

Do a sliced sieve, where the size of a slice is determined by available memory or the maximal size of a bitset.

For each slice represent the numbers in some interval [n; n + slice_size) as a bit set and sieve with all prime numbers below the square root of the upper bound.

Using a bit set uses less memory than a hash table or tree data structure, because you are working with dense sets of numbers.

starblue
Got an implementation?
Hamish Grubijan
+1  A: 

I wrote an article about an infinite primes generator some times ago:

http://stacktrace.it/2008/01/progetto-eulero-problema-3/

It's in Italian but you may have a pesky translation using Google: http://tinyurl.com/yzpyeom

piro
+2  A: 

“If I have seen further…”

The erat2 function from the cookbook can be further sped up (by about 20-25%):

erat2a

import itertools as it
def erat2a( ):
    D = {  }
    yield 2
    for q in it.islice(it.count(3), 0, None, 2):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            # old code here:
            # x = p + q
            # while x in D or not (x&1):
            #     x += p
            # changed into:
            x = q + 2*p
            while x in D:
                x += 2*p
            D[x] = p

The not (x&1) check verifies that x is odd. However, since both q and p are odd, by adding 2*p half of the steps are avoided along with the test for oddity.

erat3

If one doesn't mind a little extra fanciness, erat2 can be sped up by 35-40% with the following changes (NB: needs Python 2.7+ or Python 3+ because of the itertools.compress function):

import itertools as it
def erat3( ):
    D = { 9: 3, 25: 5 }
    yield 2
    yield 3
    yield 5
    MASK= 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0,
    MODULOS= frozenset( (1, 7, 11, 13, 17, 19, 23, 29) )

    for q in it.compress(
            it.islice(it.count(7), 0, None, 2),
            it.cycle(MASK)):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = q + 2*p
            while x in D or (x%30) not in MODULOS:
                x += 2*p
            D[x] = p

The erat3 function takes advantage of the fact that all primes (except for 2, 3, 5) modulo 30 result to only eight numbers: the ones included in the MODULOS frozenset. Thus, after yielding the initial three primes, we start from 7 and work only with the candidates.
The candidate filtering uses the itertools.compress function; the “magic” is in the MASK sequence; MASK has 15 elements (there are 15 odd numbers in every 30 numbers, as chosen by the itertools.islice function) with a 1 for every possible candidate, starting from 7. The cycle repeats as specified by the itertools.cycle function.
The introduction of the candidate filtering needs another modification: the or (x%30) not in MODULOS check. The erat2 algorithm processed all odd numbers; now that the erat3 algorithm processes only r30 candidates, we need to make sure that all D.keys() can only be such —false— candidates.

Benchmarks

Results

On an Atom 330 Ubuntu 9.10 server, versions 2.6.4 and 3.1.1+:

$ testit
up to 8192
==== python2 erat2 ====
100 loops, best of 3: 18.6 msec per loop
==== python2 erat2a ====
100 loops, best of 3: 14.5 msec per loop
==== python2 erat3 ====
Traceback (most recent call last):
…
AttributeError: 'module' object has no attribute 'compress'
==== python3 erat2 ====
100 loops, best of 3: 19.2 msec per loop
==== python3 erat2a ====
100 loops, best of 3: 14.1 msec per loop
==== python3 erat3 ====
100 loops, best of 3: 11.7 msec per loop

On an AMD Geode LX Gentoo home server, Python 2.6.5 and 3.1.2:

$ testit
up to 8192
==== python2 erat2 ====
10 loops, best of 3: 104 msec per loop
==== python2 erat2a ====
10 loops, best of 3: 81 msec per loop
==== python2 erat3 ====
Traceback (most recent call last):
…
AttributeError: 'module' object has no attribute 'compress'
==== python3 erat2 ====
10 loops, best of 3: 116 msec per loop
==== python3 erat2a ====
10 loops, best of 3: 82 msec per loop
==== python3 erat3 ====
10 loops, best of 3: 66 msec per loop

Benchmark code

A primegen.py module contains the erat2, erat2a and erat3 functions. Here follows the testing script:

#!/bin/sh
max_num=${1:-8192}
echo up to $max_num
for python_version in python2 python3
do
    for function in erat2 erat2a erat3
    do
        echo "==== $python_version $function ===="
        $python_version -O -m timeit -c \
        -s  "import itertools as it, functools as ft, operator as op, primegen; cmp= ft.partial(op.ge, $max_num)" \
            "next(it.dropwhile(cmp, primegen.$function()))"
    done
done
ΤΖΩΤΖΙΟΥ
This is an impressive, albeit belated answer. I would encourage others to up-vote as well.
Hamish Grubijan
Thanks; I usually catch up from the RSS feed, and I see questions about 3-4 weeks later :)
ΤΖΩΤΖΙΟΥ