views: 3628
answers: 17
This is the best algorithm I could come up with after struggling with a couple of Project Euler questions.

def get_primes(n):
    numbers = set(range(n, 1, -1))
    primes = []
    while numbers:
        p = numbers.pop()
        primes.append(p)
        numbers.difference_update(set(range(p*2, n+1, p)))
    return primes

>>> timeit.Timer(stmt='get_primes.get_primes(1000000)', setup='import get_primes').timeit(1)
1.1499958793645562

Can it be made even faster?


EDIT: This code has a flaw: Since numbers is an unordered set, there is no guarantee that numbers.pop() will remove the lowest number from the set. Nevertheless, it works (at least for me) for some input numbers:

>>> sum(get_primes(2000000))
142913828922L
#That's the correct sum of all numbers below 2 million
>>> 529 in get_primes(1000)
False
>>> 529 in get_primes(530)
True

EDIT: The rank so far (pure python, no external sources, all primes below 1 million):

  1. Sundaram's Sieve implementation by myself: 327ms
  2. Daniel's Sieve: 435ms
  3. Alex's recipe from the Cookbook: 710ms

EDIT: ~unutbu is leading the race.

+18  A: 

There's a pretty neat sample from the Python Cookbook here -- the fastest version proposed on that URL is:

import itertools
def erat2( ):
    D = {  }
    yield 2
    for q in itertools.islice(itertools.count(3), 0, None, 2):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = p + q
            while x in D or not (x&1):
                x += p
            D[x] = p

so that would give

def get_primes_erat(n):
    return list(itertools.takewhile(lambda p: p < n, erat2()))

Measuring at the shell prompt (as I prefer to do) with this code in pri.py, I observe:

$ python2.5 -mtimeit -s'import pri' 'pri.get_primes(1000000)'
10 loops, best of 3: 1.69 sec per loop
$ python2.5 -mtimeit -s'import pri' 'pri.get_primes_erat(1000000)'
10 loops, best of 3: 673 msec per loop

so it looks like the Cookbook solution is over twice as fast.

Alex Martelli
Wow! That's really fast. Thank you, super Alex! :-)
jbochi
@jbochi, you're welcome -- but do look at that URL, including the credits: it took **ten** of us to collectively refine the code to this point, including Python-performance luminaries such as Tim Peters and Raymond Hettinger (I wrote the final text of the recipe since I edited the printed Cookbook, but in terms of coding my contribution was on a par with the others') -- in the end, it's really subtle and finely tuned code, and that's not surprising!-)
Alex Martelli
@Alex: Knowing that your code is "only" twice as fast as mine makes me pretty proud then. :) The URL was also very interesting to read. Thanks again.
jbochi
And it can be made even faster with a minor change: see http://stackoverflow.com/questions/2211990/how-to-implement-an-efficient-infinite-generator-of-prime-numbers-in-python/3796442#3796442
ΤΖΩΤΖΙΟΥ
A: 

What size of primes do you want? There are some very fast statistical algorithms out there (eg Miller-Rabin), but their overheads mean they turn out slower unless dealing with very big (over 2^64) numbers..

Remy
I am doing this just for fun. I'd say no more than 10 million.
jbochi
A: 

In general, if you need fast number computation, Python is not the best choice. Today there are a lot of faster (and more complex) algorithms. For example, on my computer I got 2.2 seconds for your code; with Mathematica I got 0.088005.

First of all: do you need set?

wiso
If I'm not mistaken, this makes the Numpy version faster than Mathematica. :)
EOL
How can you say that? Have you tried the two solutions on the same machine?
wiso
@wiso: He can use your proportions to extrapolate how long Mathematica will take on his machine. If his extrapolation is wrong, then that could mean that the two algorithms are more or less efficient than usual depending on machine setup.
Brian
A: 

The easiest optimization to implement is that if you want to check whether n is prime, you only have to check whether n is divisible by a number up to sqrt(n). Is this for Project Euler?
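The square-root cutoff can be sketched like this (a minimal illustration of trial division for a single candidate, not one of the sieves benchmarked in this thread):

```python
def is_prime_trial(n):
    # trial division: any divisor pair (a, b) with a*b == n must have
    # min(a, b) <= sqrt(n), so testing candidates up to sqrt(n) suffices
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:          # i <= sqrt(n), without computing a root
        if n % i == 0:
            return False
        i += 2                 # even divisors were already ruled out
    return True
```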

Matt
I think you misunderstood the problem. jbochi wants to find the first N primes, not test an arbitrary N for primality. Trial division would probably be just about the *slowest* way of doing it -- the posted answers seem to be variations of the "Sieve of Eratosthenes" algorithm, which doesn't need to do division at all.
Jim Lewis
Indeed, I did misread the question.
Matt
+9  A: 

The algorithm is fast, but it has a serious flaw:

>>> sorted(get_primes(530))
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163,
167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251,
257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349,
353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443,
449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 527, 529]
>>> 17*31
527
>>> 23*23
529

You assume that numbers.pop() would return the smallest number in the set, but this is not guaranteed at all. Sets are unordered and pop() removes and returns an arbitrary element, so it cannot be used to select the next prime from the remaining numbers.
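A corrected sketch of the original approach that always removes the smallest remaining candidate (illustrative only -- calling min() on every iteration makes this far slower than a real sieve):

```python
def get_primes_fixed(n):
    # pop the smallest remaining number explicitly instead of relying
    # on set.pop(), whose choice of element is arbitrary
    numbers = set(range(2, n + 1))
    primes = []
    while numbers:
        p = min(numbers)                 # always the smallest candidate
        numbers.remove(p)
        primes.append(p)
        # cross off multiples; starting at p*p is enough, since smaller
        # multiples were removed by smaller primes already
        numbers.difference_update(range(p * p, n + 1, p))
    return primes
```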

sth
Thanks for pointing this out... I was using lists and I moved to sets to make it faster, but I didn't realize it broke the algorithm.
jbochi
+28  A: 
unutbu
That's not pure Python, but it's the fastest version so far. thanks!
jbochi
If you're interested in non-pure-Python code, then you should check out `gmpy` -- it has pretty good support for primes, via the `next_prime` method of its `mpz` type.
Alex Martelli
just for correctness, the code example should have `import numpy as np`
Kimvais
@Kimvais: True, thanks.
unutbu
`int(n ** 0.5)` should be `int(math.ceil(n ** 0.5))` or `int(n ** 0.5) + 1`. `ambi_sieve(10)` gives wrong answer otherwise.
Alok
@Alok: Very true. Thanks for catching this.
unutbu
You might want to add wheel factorization to your comparison. The version on http://zerovolt.com/?p=88 (which happened to be the first that I found googling) is faster on my machine than the other pure Python versions you list.
stephan
@stephan: Thank you for the link. Are you sure you would not like to throw your hat into the ring?
unutbu
@~unutbu: go ahead if you want. To me, your comparison of various very fast algorithms seems to be the perfect place to add this specific implementation. I have used your script to compare the speed and verify correctness.
stephan
@~unutbu: `sieveOfEratosthenes()` is faster if we allow only pure Python than any other variant (without psyco, numpy). Python3: sieveOfEratosthenes: 0.055, sieve_wheel_30: 0.061 ambi_sieve_plain: 0.130, sieveOfAtkin: 0.270; Python2: sieveOfEratosthenes: 0.061, sieve_wheel_30: 0.086, ambi_sieve_plain: 0.137, sieveOfAtkin: 0.216. If we allow C extensions then gmpy2-based variant (`0.041`) is the fastest in Python3 http://stackoverflow.com/questions/2897297/speed-up-bitstring-bit-operations-in-python/2901856#2901856 `ambi_sieve: 0.015` is the fastest on Python2 if C extensions are allowed.
J.F. Sebastian
@J.F. Sebastian: Thanks for the info. I've rechecked my timeit runs and at least on my machine (Intel E2160 running Ubuntu 9.10) `sieveOfEratosthenes` is definitely not the fastest. Do you think our differing results could be due to hardware? Also, have you checked Robert William Hanks' primesfrom3to1? http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188 Using numpy I'm getting primesfrom3to1 (19.6) versus ambi_sieve (29.4).
unutbu
@~unutbu: Indeed `primesfrom3to1()` (corrected to return `r_[...`) takes 17 seconds for n=1e9 vs. 22.2 seconds for `ambi_sieve()` (for n=1e6 both are almost the same (15ms) `ambi_sieve()` being slightly faster). Among pure Python versions `sieveOfEratosthenes` is the fastest on my machine (Intel i7, Ubuntu 10.04)
J.F. Sebastian
New @Robert William Hanks's `primes()` function http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188 is the fastest (for n=1e6) now among pure Python solutions http://ideone.com/weu23
J.F. Sebastian
@J.F. Sebastian: Neat program. I love the decorators. Using your program on my machine, `primes1` is still sometimes faster than `primes` (well, on 2 out of 3 runs...). Without `psyco` the difference between those two seems to be on the order of the standard error between runs.
unutbu
Robert William Hanks
@Robert William Hanks: `primesfrom2to` is slightly faster than `ambi_sieve` for n=1e6 http://ideone.com/9MOfq
J.F. Sebastian
@J.F. Sebastian: Sorry to ask, but would you mind testing it again? I made a change in the translation part of primesfrom2to and I am getting better results on my machine.
Robert William Hanks
@Robert William Hanks: new version of `primesfrom2to` is definitely faster than `ambi_sieve` for sufficiently large n. http://ideone.com/ddiU2
J.F. Sebastian
Robert William Hanks
+2  A: 

For the fastest code, the numpy solution is the best. For purely academic reasons, though, I'm posting my pure python version, which is a bit less than 50% faster than the cookbook version posted above. Since I make the entire list in memory, you need enough space to hold everything, but it seems to scale fairly well.

def daniel_sieve_2(maxNumber):
    """
    Given a number, returns all numbers less than or equal to
    that number which are prime.
    """
    allNumbers = range(3, maxNumber+1, 2)
    for mIndex, number in enumerate(xrange(3, maxNumber+1, 2)):
        if allNumbers[mIndex] == 0:
            continue
        # now set all multiples to 0
        for index in xrange(mIndex+number, (maxNumber-3)/2+1, number):
            allNumbers[index] = 0
    return [2] + filter(lambda n: n!=0, allNumbers)

And the results:

>>> mine = timeit.Timer("daniel_sieve_2(1000000)",
...                     "from sieves import daniel_sieve_2")
>>> prev = timeit.Timer("get_primes_erat(1000000)",
...                     "from sieves import get_primes_erat")
>>> print "Mine: {0:0.4f} ms".format(min(mine.repeat(3, 1))*1000)
Mine: 428.9446 ms
>>> print "Previous Best {0:0.4f} ms".format(min(prev.repeat(3, 1))*1000)
Previous Best 621.3581 ms
Daniel G
+10  A: 

For the truly fastest solution with sufficiently large N, download a pre-calculated list of primes, store it as a tuple, and do something like:

for pos, i in enumerate(primes):
    if i > N:
        print primes[:pos]
        break

Only if N > primes[-1] do you need to calculate more primes; save the new list in your code, so next time it is equally fast.

Always think outside the box.
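Assuming `primes` is a sorted, precomputed tuple, the cut point can also be found in O(log n) with the standard-library `bisect` module instead of a linear scan (a sketch with a tiny hypothetical table):

```python
import bisect

# Hypothetical precomputed table -- in practice this tuple would be
# much longer, generated once or downloaded and hard-coded.
PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47)

def primes_below(N):
    # bisect_left returns the index of the first element >= N,
    # so slicing up to it yields exactly the primes strictly below N
    return list(PRIMES[:bisect.bisect_left(PRIMES, N)])
```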

Kimvais
To be fair, though, you'd have to count the time downloading, unzipping, and formatting the primes and compare that with the time to generate primes using an algorithm - any one of these algorithms could easily write the results to a file for later use. I think in that case, given enough memory to actually calculate all primes less than 982,451,653, the numpy solution would still be faster.
Daniel G
@Daniel correct. However the store what you have and continue whenever needed still stands...
Kimvais
+8  A: 

Using Sundaram's Sieve, I think I broke pure-Python's record:

def sundaram3(max_n):
    numbers = range(3, max_n+1, 2)
    half = (max_n)//2
    initial = 4

    for step in xrange(3, max_n+1, 2):
        for i in xrange(initial, half, step):
            numbers[i-1] = 0
        initial += 2*(step+1)

        if initial > half:
            return [2] + filter(None, numbers)

Comparison:

C:\USERS>python -m timeit -n10 -s "import get_primes" "get_primes.get_primes_erat(1000000)"
10 loops, best of 3: 710 msec per loop

C:\USERS>python -m timeit -n10 -s "import get_primes" "get_primes.daniel_sieve_2(1000000)"
10 loops, best of 3: 435 msec per loop

C:\USERS>python -m timeit -n10 -s "import get_primes" "get_primes.sundaram3(1000000)"
10 loops, best of 3: 327 msec per loop
jbochi
I've not come across Sundaram's Sieve before - very cool!
Mike Houston
I managed to speed up your function about 20% by adding "zero = 0" at the top of the function and then replacing the lambda in your filter with "zero.__sub__". Not the prettiest code in the world, but a bit faster :)
truppo
@truppo: Thanks for your comment! I just realized that passing `None` instead of the original function works and it's even faster than `zero.__sub__`
jbochi
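For reference, the `filter(None, ...)` trick works because the zeroed-out composites are falsy (in Python 3, `filter` returns an iterator, hence the `list()` call):

```python
# after a sieve zeroes out composites in-place, the zeros are falsy;
# filtering with None as the predicate keeps only truthy elements
numbers = [0, 3, 5, 0, 0, 11, 13]
survivors = list(filter(None, numbers))
```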
+3  A: 

A deterministic implementation of the Miller-Rabin primality test, under the assumption that N < 9,080,191:

import sys
import random

def miller_rabin_pass(a, n):
    d = n - 1
    s = 0
    while d % 2 == 0:
        d >>= 1
        s += 1

    a_to_power = pow(a, d, n)
    if a_to_power == 1:
        return True
    for i in xrange(s-1):
        if a_to_power == n - 1:
            return True
        a_to_power = (a_to_power * a_to_power) % n
    return a_to_power == n - 1


def miller_rabin(n):
    for a in [2, 3, 37, 73]:
      if not miller_rabin_pass(a, n):
        return False
    return True


n = int(sys.argv[1])
primes = [2]
for p in range(3,n,2):
  if miller_rabin(p):
    primes.append(p)
print len(primes)

According to the article on Wikipedia (http://en.wikipedia.org/wiki/Miller–Rabin_primality_test) testing N < 9,080,191 for a = 2,3,37, and 73 is enough to decide whether N is composite or not.

And I adapted the source code from the probabilistic implementation of original Miller-Rabin's test found here: http://en.literateprograms.org/Miller-Rabin_primality_test_(Python)

Amaç Herdağdelen
Thanks for the Miller-Rabin primality test, but this code is actually slower and does not give correct results: 37 is prime and does not pass the test.
jbochi
I guess 37 is one of the special cases, my bad. I was hopeful about the deterministic version though :)
Amaç Herdağdelen
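For reference, the witness pair the Wikipedia table actually gives for N < 9,080,191 is a = 31 and 73, and a witness that equals (or divides) the candidate must be special-cased -- which is exactly where 37 failed above. A sketch in Python 3 syntax:

```python
def miller_rabin_pass(a, n):
    # one Miller-Rabin round; only valid for odd n > 2
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1:
        return True
    for _ in range(s - 1):
        if x == n - 1:
            return True
        x = (x * x) % n
    return x == n - 1

def is_prime(n):
    # deterministic for n < 9,080,191 with witnesses 31 and 73;
    # a witness equal to or dividing n must be handled up front,
    # which is where picking 37 as a witness went wrong for n == 37
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for a in (31, 73):
        if n == a:
            return True
        if n % a == 0:
            return False
    return all(miller_rabin_pass(a, n) for a in (31, 73))
```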
+2  A: 

If you have control over N, the very fastest way to list all primes is to precompute them. Seriously. Precomputing is a much-overlooked optimization.

Dave W. Smith
Or download them from here http://primes.utm.edu/lists/small/millions/, but the idea is to test Python's limits and see if beautiful code emerges from optimization.
jbochi
A: 

Here's the code I normally use to generate primes in Python:

$ python -mtimeit -s'import sieve' 'sieve.sieve(1000000)' 
10 loops, best of 3: 445 msec per loop
$ cat sieve.py
from math import sqrt

def sieve(size):
    prime = [True]*size
    rng = xrange
    limit = int(sqrt(size))

    for i in rng(3, limit+1, +2):
        if prime[i]:
            prime[i*i::+i] = [False]*len(prime[i*i::+i])

    return [2] + [i for i in rng(3, size, +2) if prime[i]]

if __name__ == '__main__':
    print sieve(100)

It can't compete with the faster solutions posted here, but at least it is pure python.

Thanks for posting this question. I really learnt a lot today.

MAK
A: 

My guess is that the fastest of all ways is to hard-code the primes in your code.

So why not just write a slow script that generates another source file with all the numbers hardwired in it, and then import that source file when you run your actual program?

Of course, this works only if you know the upper bound of N at compile time, but this is the case for (almost) all Project Euler problems.

PS: I might be wrong, though, if parsing the source with hard-wired primes is slower than computing them in the first place, but as far as I know Python runs from compiled .pyc files, so reading a binary array with all primes up to N should be bloody fast in that case.
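A minimal sketch of that idea (the module name `primes_table` and the helper sieve are hypothetical; Python 3 syntax):

```python
def small_sieve(n):
    # any sieve from this page would do; a simple one for illustration
    flags = [True] * max(n, 2)
    flags[0] = flags[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i*i::i] = [False] * len(flags[i*i::i])
    return [i for i, f in enumerate(flags) if f]

def primes_module_source(n):
    # source text for a module that hard-codes all primes below n;
    # write it to e.g. primes_table.py once, then `import primes_table`
    # in the actual program and read primes_table.PRIMES instantly
    return "PRIMES = %r\n" % small_sieve(n)
```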

Adrian
+13  A: 

Related question (dealing with prime generators & including benchmarks):
Speed up bitstring/bit operations in Python?

Faster & more memory-efficient pure Python code:

def primes(n):
    """ Returns  a list of primes < n """
    sieve = [True] * n
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i]:
            sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1)
    return [2] + [i for i in xrange(3,n,2) if sieve[i]]

or starting with half sieve

def primes1(n):
    """ Returns  a list of primes < n """
    sieve = [True] * (n/2)
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i/2]:
            sieve[i*i/2::i] = [False] * ((n-i*i-1)/(2*i)+1)
    return [2] + [2*i+1 for i in xrange(1,n/2) if sieve[i]]

Faster & more memory-efficient numpy code:

import numpy
def primesfrom3to(n):
    """ Returns a array of primes, 3 <= p < n """
    sieve = numpy.ones(n/2, dtype=numpy.bool)
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i/2]:
            sieve[i*i/2::i] = False
    return 2*numpy.nonzero(sieve)[0][1::]+1

a faster variation starting with a third of a sieve:

import numpy
def primesfrom2to(n):
    """ Input n>=6, Returns a array of primes, 2 <= p < n """
    sieve = numpy.ones(n/3 + (n%6==2), dtype=numpy.bool)
    for i in xrange(1,int(n**0.5)/3+1):
        if sieve[i]:
            k=3*i+1|1
            sieve[       k*k/3     ::2*k] = False
            sieve[k*(k-2*(i&1)+4)/3::2*k] = False
    return numpy.r_[2,3,((3*numpy.nonzero(sieve)[0][1:]+1)|1)]

A (hard-to-code) pure-python version of the above code would be:

def primes2(n):
    """ Input n>=6, Returns a list of primes, 2 <= p < n """
    n, correction = n-n%6+6, 2-(n%6>1)
    sieve = [True] * (n/3)
    for i in xrange(1,int(n**0.5)/3+1):
      if sieve[i]:
        k=3*i+1|1
        sieve[      k*k/3      ::2*k] = [False] * ((n/6-k*k/6-1)/k+1)
        sieve[k*(k-2*(i&1)+4)/3::2*k] = [False] * ((n/6-k*(k-2*(i&1)+4)/6-1)/k+1)
    return [2,3] + [3*i+1|1 for i in xrange(1,n/3-correction) if sieve[i]]

Unfortunately pure Python doesn't adopt the simpler and faster numpy way of doing assignment, and calling len() inside the loop as in [False]*len(sieve[((k*k)/3)::2*k]) is too slow. So I had to improvise to correct the input (& avoid more math) and do some extreme (& painful) math-magic.
Personally, I think it is a shame that numpy (which is so widely used) is not part of the Python standard library (2 years after the Python 3 release & no numpy compatibility), and that the improvements in syntax and speed seem to be completely overlooked by Python developers.

Robert William Hanks
+1: `primesfrom3to` (19.6ms) is significantly faster than `ambi_sieve` (29.4ms). Terrific. Thanks for sharing. PS. Since 2 is prime, would you consider changing the return value to `numpy.r_[2, 2*np.nonzero(sieve)[0][1::]+1]`?
unutbu
Robert William Hanks
@Robert: I added your implementations to the list compared in my answer. `primes1` and `primesfrom3to` came out on top! The rankings shown are for n=1e6. The order of the rankings did not change with n=1e7.
unutbu
I've added `primes2` and `primesfrom2to` to the methods tested in my post. Both were significant improvements on the prior fastest. Great work!
unutbu
I don't see any measurable speed-up for the new `primesfrom2to()` http://ideone.com/5YJkw (performance comparison is at the end of the file)
J.F. Sebastian
@J.F. Sebastian: primes2 and primesfrom2to() were only added to ~unutbu's benchmarks yesterday; you had already tested primesfrom2to() a while ago, so there's no difference for you. (Any coding differences are only cosmetic.)
Robert William Hanks
A: 

Sorry to bother, but erat2() has a serious performance flaw in the algorithm.

While searching for the next composite, we need to test odd numbers only: q and p are both odd, so q+p is even and doesn't need to be tested, but q+2*p is always odd. This eliminates the "if even" test in the while loop condition and saves about 30% of the runtime.

While we're at it: instead of the elegant 'D.pop(q, None)' get-and-delete method, use 'if q in D: p = D[q]; del D[q]', which is twice as fast! At least on my machine (P3-1GHz). So I suggest this implementation of this clever algorithm:

def erat3( ):
    from itertools import islice, count

    # q is the running integer that's checked for primeness.
    # yield 2 and no other even number thereafter
    yield 2
    D = {}
    # no need to mark D[4] as we will test odd numbers only
    for q in islice(count(3),0,None,2):
        if q in D:                  #  is composite
            p = D[q]
            del D[q]
            # q is composite. p=D[q] is the first prime that
            # divides it. Since we've reached q, we no longer
            # need it in the map, but we'll mark the next
            # multiple of its witnesses to prepare for larger
            # numbers.
            x = q + p+p        # next odd(!) multiple
            while x in D:      # skip composites
                x += p+p
            D[x] = p
        else:                  # is prime
            # q is a new prime.
            # Yield it and mark its first multiple that isn't
            # already marked in previous iterations.
            D[q*q] = q
            yield q
+1  A: 

A slightly different implementation of a half sieve using Numpy:

http://rebrained.com/?p=458

import math
import numpy
def prime6(upto):
    primes=numpy.arange(3,upto+1,2)
    isprime=numpy.ones((upto-1)/2,dtype=bool)
    for factor in primes[:int(math.sqrt(upto))]:
        if isprime[(factor-2)/2]: isprime[(factor*3-2)/2:(upto-1)/2:factor]=0
    return numpy.insert(primes[isprime],0,2)

Can someone compare this with the other timings? On my machine it seems pretty comparable to the other Numpy half-sieve.

nolfonzo
`upto=10**6`: `primesfrom2to()` - 7 ms; `prime6()` - 12 ms http://ideone.com/oDg2Y
J.F. Sebastian
A: 

The fastest method I've tried so far is based on the Python cookbook erat2 function:

import itertools as it
def erat2a( ):
    D = {  }
    yield 2
    for q in it.islice(it.count(3), 0, None, 2):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = q + 2*p
            while x in D:
                x += 2*p
            D[x] = p

See this answer for an explanation of the speed-up.

ΤΖΩΤΖΙΟΥ