views: 1098
answers: 9

I wrote a prime number generator using the Sieve of Eratosthenes in Python 3.1. The code runs correctly, taking 0.32 seconds on ideone.com to generate prime numbers up to 1,000,000.

# from bitstring import BitString

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...    
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5) 
    flags = [False, False] + [True] * (limit - 2)   
#     flags = BitString(limit)
    # Step through all the odd numbers
    for i in range(3, limit, 2):       
        if flags[i] is False:
#        if flags[i] is True:
            continue
        yield i
        # Exclude further multiples of the current prime number
        if i <= sub_limit:
            for j in range(i*3, limit, i<<1):
                flags[j] = False
#                flags[j] = True

The problem is, I run out of memory when I try to generate numbers up to 1,000,000,000.

    flags = [False, False] + [True] * (limit - 2)   
MemoryError

As you can imagine, allocating 1 billion boolean values (each taking 4 or 8 bytes as a list entry in Python, not 1 byte; see comment) is really not feasible, so I looked into bitstring. I figured that using 1 bit for each flag would be much more memory-efficient. However, the program's performance dropped drastically: 24 seconds runtime for primes up to 1,000,000. This is probably due to the internal implementation of bitstring.
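
For scale, a rough back-of-the-envelope comparison of the two footprints (assuming 8-byte list entries on a 64-bit build):

limit = 10**9
print(limit * 8 / float(2**30))   # list of bools: ~7.5 GiB of pointers alone
print(limit / 8.0 / 2**20)        # one bit per flag: ~119 MiB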

You can comment/uncomment the commented-out lines in the snippet above to see what I changed to use BitString.

My question is, is there a way to speed up my program, with or without bitstring?

Edit: Please test the code yourself before posting. I can't accept answers that run slower than my existing code, naturally.

Edit again:

I've compiled a list of benchmarks on my machine.

+2  A: 

Python's integer types can be of arbitrary size, so you shouldn't need a clever bitstring library to represent a set of bits, just a single number.
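
For illustration (not from the original answer), any Python int can act as an arbitrarily large bit set:

bits = 0
bits |= 1 << 1000000      # set bit one million; the int simply grows to hold it
print(bool(bits & (1 << 1000000)), bool(bits & (1 << 999999)))   # True False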

Here's the code. It handles a limit of 1,000,000 with ease, and can even handle 10,000,000 without complaining too much:

def multiples_of(n, step, limit):
    bits = 1 << n
    old_bits = bits
    max = 1 << limit
    while old_bits < max:
        old_bits = bits
        bits += bits << step
        step *= 2
    return old_bits

def prime_numbers(limit=10000000):
    '''Prime number generator. Yields the series                                                                        
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...                                                                              
    using Sieve of Eratosthenes.                                                                                        
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = ((1 << (limit - 2)) - 1) << 2
    # Step through all the odd numbers                                                                                  
    for i in xrange(3, limit, 2):
        if not (flags & (1 << i)):
            continue
        yield i
        # Exclude further multiples of the current prime number                                                         
        if i <= sub_limit:
            flags &= ~multiples_of(i * 3, i << 1, limit)

The multiples_of function avoids the cost of setting every single multiple individually. It sets the initial bit, shifts it by the initial step (i << 1) and adds the result to itself. It then doubles the step, so that the next shift copies both bits, and so on until it reaches the limit. This sets all the multiples of a number up to the limit in O(log(limit)) time.
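
For instance (an illustrative call, using the multiples_of defined above), marking every second bit starting at bit 3 with a limit of 16:

mask = multiples_of(3, 2, 16)
print([b for b in range(20) if mask >> b & 1])
# [3, 5, 7, 9, 11, 13, 15, 17]; bit 17 lies past the limit, but the sieve
# loop only ever tests bits below the limit, so the overshoot is harmless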

Marcelo Cantos
@Marcelo: Yeah, I'm aware of that. I could also write a few functions that do the bit manipulations and see if it's much faster. It's something I'm working on, at this moment. Edit: The issue is, even performing `2 << 1000000` takes more than 10 seconds.
Xavier Ho
@Xavier: No it doesn't. It might be that printing the result of that operation takes that long. Try `x = 2 << 1000000` instead.
kaizer.se
@kaizer.se: Hm, good point. I'll go and see what I can do with that.
Xavier Ho
This seems unlikely to work well, unless you can find some way of doing a whole bunch of bit-sets together: each bit-operation on a long will generate a whole new long, so is an O(n) operation.
Mark Dickinson
@Mark: Yes, I just tried it. 1.3 seconds to generate up to 100,000, and when I tried 1,000,000 it takes more than 20 seconds (in fact, it's still not finished as I type this). This is not viable by bit shifting, and I need direct access to change the bits.
Xavier Ho
@Marcelo: Thanks for the edit. Why did you change the square root to division by 2, by the way? I'll try your code now. || Edit: `0.992000102997 seconds` for n < 100,000. For n < 1,000,000 it still takes ages. Have you tried it yourself? Nice work, though. Gave me some ideas.
Xavier Ho
That was a silly mistake. I mistook `**` for `*`.
Marcelo Cantos
@Marcelo: Changed it to sqrt, yielding 0.59 seconds for n < 100,000. Still not good enough, sorry. `:]` || Edit: 1 million took `54.7890000343 seconds`.
Xavier Ho
A slowdown is inevitable as the limit grows. Just out of curiosity: is the bitset version faster or slower? What is the fastest time you've seen for 1,000,000?
Marcelo Cantos
@Marcelo: What bitset version? If you meant Scott's answer, yes, it took 11 seconds in that version, for n < 1,000,000. That's still 10 times slower than my code, though.
Xavier Ho
@Marcelo: I wonder whether you can create the values for `multiples_of` with some clever divisions. E.g., `(1<<(2*n-1))//3` has every other bit set; `(1<<(3*n-1))//7` has every 3rd bit set, etc.
Mark Dickinson
@Marcelo: The fastest time for 1,000,000 in Python? My version, currently (that I've seen). In C/C++? I recall code that ran in 100x the speed, somewhere on ProjectEuler. I'll check.
Xavier Ho
scratch that last comment; that scheme quickly gets out of hand when the quotient of the division becomes too large (i.e., multi-limb, internally).
Mark Dickinson
Sorry, I can't get it any faster (I didn't notice the times you posted in your question; sorry). I think bitstring wins because it lets you mutate the one bitmap, whereas the integer arithmetic does a lot of copying.
Marcelo Cantos
@Marcelo: Yeah, but neither of them are faster than a list of booleans. I'm sad. `=/` || Although, my primary issue is that I run out of memory at n < 1 billion.
Xavier Ho
@Marcelo: I found some C# code here: http://ideone.com/1GMyI || It claims to run in 190 ms and finds the sum of all prime numbers under 1 million. I haven't tested it but I believe it; it's the exact same algorithm as mine.
Xavier Ho
Certainly, compiled languages will solve this much faster than Python can. You might have the best of both worlds if there was a bitstring library written in C (and, ideally, supported setting a stripe of bits).
Marcelo Cantos
@Marcelo: Yeah. But scroll down and look at Mark's answer. `=]`
Xavier Ho
+3  A: 

One speed improvement you can make using bitstring is to use the 'set' method when you exclude multiples of the current number.

So the vital section becomes

for i in range(3, limit, 2):
    if flags[i]:
        yield i
        if i <= sub_limit:
            flags.set(1, range(i*3, limit, i*2))    

On my machine this runs about 3 times faster than the original.

My other thought was to use the bitstring to represent only the odd numbers. You could then find the unset bits using flags.findall([0]) which returns a generator. Not sure if that would be much faster (finding bit patterns isn't terribly easy to do efficiently).
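
To make the odds-only idea concrete, here is a rough sketch (an illustrative helper, not Scott's code) that stores flags only for the odd numbers in a plain bytearray; index k stands for the odd number 2*k + 3:

def primes_odds_only(limit=1000000):
    yield 2
    size = limit // 2 - 1              # flags for the odd numbers 3, 5, ..., below limit
    flags = bytearray(size)            # 0 = possibly prime, 1 = composite
    for k in range(size):
        if flags[k]:
            continue
        p = 2 * k + 3
        yield p
        # the first multiple worth crossing off is p*p, at index 2*k*k + 6*k + 3
        for j in range(2 * k * k + 6 * k + 3, size, p):
            flags[j] = 1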

[Full disclosure: I wrote the bitstring module, so I've got some pride at stake here!]


As a comparison I've also taken the guts out of the bitstring method so that it's doing it in the same way, but without range checking etc. I think this gives a reasonable lower limit for pure Python that works for a billion elements (without changing the algorithm, which I think is cheating!)

def prime_pure(limit=1000000):
    yield 2
    flags = bytearray((limit + 7) // 8)
    sub_limit = int(limit**0.5)
    for i in xrange(3, limit, 2):
        byte, bit = divmod(i, 8)
        if not flags[byte] & (128 >> bit):
            yield i
            if i <= sub_limit:
                for j in xrange(i*3, limit, i*2):
                    byte, bit = divmod(j, 8)
                    flags[byte] |= (128 >> bit)

On my machine this runs in about 0.62 seconds for a million elements, which means it's about a quarter of the speed of the bitarray answer. I think that's quite reasonable for pure Python.

Scott Griffiths
@Scott: Ah cool, nice to have the author of bitstring here! I'll try that too, will let you know shortly of the result. And yes, I am using 2.0.0 beta 1, as I only downloaded the library an hour ago.
Xavier Ho
@Scott: Just did a test. 11.2 seconds with your fix. Still a bit slow. Got any more ideas?
Xavier Ho
Couple more ideas above. I'm guessing that should bring your time down to about 7 seconds.
Scott Griffiths
@Scott: Thanks. But my original code runs at 0.32 seconds. See: http://ideone.com/wCJm5. But thanks for the idea and input! I'll be watching this topic for a while.
Xavier Ho
I thought the challenge was for the fastest sieve code to produce primes up to a billion, not a million. The bitstring code will work for a billion, whereas your original code doesn't, so I think bitstring is winning so far! As an aside, using Python 2 I get the million case down to 4.5 seconds.
Scott Griffiths
@Scott: Yes, the memory constraint is something even Mark's answer can't solve. However, even 4.5 seconds is too slow compared to my original code. It does solve the big number issue, but the time also blows up in a linear fashion. Hence the reason I accepted Mark's answer (0.11 seconds up to 1 million.)
Xavier Ho
@Scott: I wonder if we can combine bitstring with the wheel..
Xavier Ho
Using the wheel feels like cheating to me - it's not quite the same algorithm any more. By stripping away some internal bitstring validation code I got down to 2.3 seconds for primes up to a million. I'll tell you how long it takes for a billion as soon as it finishes :)
Scott Griffiths
Primes up to a billion in a little under 38 minutes. But at least it worked!
Scott Griffiths
@Scott: Nice! That's really good to know. I didn't have the patience to wait =/
Xavier Ho
@Scott: Thanks for the side-by-side comparison. Yes, you did a good job. `:3`.
Xavier Ho
`prime_pure()` is two times slower than the slowest function I've compared http://ideone.com/F0RBo (comparison is at the end of the file)
J.F. Sebastian
+2  A: 

One option you may want to look at is just compiling a C/C++ module so you have direct access to the bit-twiddling features. AFAIK you could write something of that nature and only return the results on completion of the calculations performed in C/C++. Now that I'm typing this, you might also look at numpy, which stores its arrays as native C data for speed. I don't know whether that will be any faster than the bitstring module, though!
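
For what it's worth, here is a minimal numpy sketch of that idea (an illustrative example, not Wayne's code): the flags live in a contiguous native array and multiples are crossed off with slice assignment.

import numpy as np

def primes_numpy(limit=1000000):
    flags = np.ones(limit, dtype=bool)       # one byte per flag, stored as native C data
    flags[:2] = False                        # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            flags[i*i::i] = False            # cross off multiples in one C-level slice
    return np.nonzero(flags)[0]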

Wayne Werner
Thanks, Wayne. You're right that it is an option - just not exactly an easy one. I'll be happy when you write one, of course. `;]`
Xavier Ho
Heh. It's actually not that bad if you already know C/C++ - the biggest pain is figuring out how to tell your scripts the right thing for scons to take care of the build process. Then you just have to deal with ~125 MB worth of bits (1 billion bits / 8 bits per byte == 125 million bytes).
Wayne Werner
@Wayne: True. I know C++, but I know Python is written in C, and I haven't written a Python module in C/C++ myself. So that is a little distant yet. It's okay, though, we're getting some brilliant answers here on SO, as always. `:]`
Xavier Ho
+3  A: 

Here's a version that I wrote a while back; it might be interesting to compare with yours for speed. It doesn't do anything about the space problems, though (in fact, they're probably worse than with your version).

from math import sqrt

def basicSieve(n):
    """Given a positive integer n, generate the primes < n."""
    s = [1]*n
    for p in xrange(2, 1+int(sqrt(n-1))):
        if s[p]:
            a = p*p
            s[a::p] = [0] * -((a-n)//p)
    for p in xrange(2, n):
        if s[p]:
            yield p 

I have faster versions, using a wheel, but they're much more complicated. Original source is here.

Okay, here's the version using a wheel. wheelSieve is the main entry point.

from math import sqrt
from bisect import bisect_left

def basicSieve(n):
    """Given a positive integer n, generate the primes < n."""
    s = [1]*n
    for p in xrange(2, 1+int(sqrt(n-1))):
        if s[p]:
            a = p*p
            s[a::p] = [0] * -((a-n)//p)
    for p in xrange(2, n):
        if s[p]:
            yield p

class Wheel(object):
    """Class representing a wheel.

    Attributes:
       primelimit -> wheel covers primes < primelimit.
       For example, given a primelimit of 6
       the wheel primes are 2, 3, and 5.
       primes -> list of primes less than primelimit
       size -> product of the primes in primes;  the modulus of the wheel
       units -> list of units modulo size
       phi -> number of units

    """
    def __init__(self, primelimit):
        self.primelimit = primelimit
        self.primes = list(basicSieve(primelimit))

        # compute the size of the wheel
        size = 1
        for p in self.primes:
            size *= p
        self.size = size

        # find the units by sieving
        units = [1] * self.size
        for p in self.primes:
            units[::p] = [0]*(self.size//p)
        self.units = [i for i in xrange(self.size) if units[i]]

        # number of units
        self.phi = len(self.units)

    def to_index(self, n):
        """Compute alpha(n), where alpha is an order preserving map
        from the set of units modulo self.size to the nonnegative integers.

        If n is not a unit, the index of the first unit greater than n
        is given."""
        return bisect_left(self.units, n % self.size) + n // self.size * self.phi

    def from_index(self, i):
        """Inverse of to_index."""

        return self.units[i % self.phi] + i // self.phi * self.size

def wheelSieveInner(n, wheel):
    """Given a positive integer n and a wheel, find the wheel indices of
    all primes that are less than n, and that are units modulo the
    wheel modulus.
    """

    # renaming to avoid the overhead of attribute lookups
    U = wheel.units
    wS = wheel.size
    # inverse of unit map
    UI = dict((u, i) for i, u in enumerate(U))
    nU = len(wheel.units)

    sqroot = 1+int(sqrt(n-1)) # ceiling of square root of n

    # corresponding index (index of next unit up)
    sqrti = bisect_left(U, sqroot % wS) + sqroot//wS*nU
    upper = bisect_left(U, n % wS) + n//wS*nU
    ind2 = bisect_left(U, 2 % wS) + 2//wS*nU

    s = [1]*upper
    for i in xrange(ind2, sqrti):
        if s[i]:
            q = i//nU
            u = U[i%nU]
            p = q*wS+u
            u2 = u*u
            aq, au = (p+u)*q+u2//wS, u2%wS
            wp = p * nU
            for v in U:
                # eliminate entries corresponding to integers
                # congruent to p*v modulo p*wS
                uvr = u*v%wS
                m = aq + (au > uvr)
                bot = (m + (q*v + u*v//wS - m) % p) * nU + UI[uvr]
                s[bot::wp] = [0]*-((bot-upper)//wp)
    return s

def wheelSieve(n, wheel=Wheel(10)):
    """Given a positive integer n, generate the list of primes <= n."""
    n += 1
    wS = wheel.size
    U = wheel.units
    nU = len(wheel.units)
    s = wheelSieveInner(n, wheel)
    return (wheel.primes[:bisect_left(wheel.primes, n)] +
            [p//nU*wS + U[p%nU] for p in xrange(bisect_left(U, 2 % wS)
             + 2//wS*nU, len(s)) if s[p]])

As to what a wheel is: well, you know that (apart from 2), all primes are odd, so most sieves miss out all the even numbers. Similarly, you can go a bit further and notice that all primes (except 2 and 3) are congruent to 1 or 5 modulo 6 (== 2 * 3), so you can get away with only storing entries for those numbers in your sieve. The next step up is to note that all primes (except 2, 3 and 5) are congruent to one of 1, 7, 11, 13, 17, 19, 23, 29 (modulo 30) (here 30 == 2*3*5), and so on.
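
As a small illustration of the mod-30 wheel (an added example, not Mark's code): only the residues coprime to 30 need to be stored, which is 8 slots out of every 30.

wheel = [r for r in range(30) if all(r % p for p in (2, 3, 5))]
print(wheel)                                                 # [1, 7, 11, 13, 17, 19, 23, 29]
print(all(p % 30 in wheel for p in (7, 11, 13, 101, 997)))   # True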

Mark Dickinson
Care to explain the wheel? Is it something similar to... the Sieve of Atkin?
Xavier Ho
@Mark: 0.28 seconds! You're our first-up to the finals! =D http://ideone.com/yIENn
Xavier Ho
@Mark: And.. naturally, `MemoryError` @ 1,000,000,000. =/ http://ideone.com/YeBOR. I'm curious about your faster version now.
Xavier Ho
Thanks! I'll read over the code and try to understand it. Will report my status later.
Xavier Ho
No, Sieve of Atkin introduces a fundamentally new idea, which is to use quadratic forms; I think it's only *asymptotically* faster than a basic sieve of eratosthenes + wheel, and the point at which it becomes faster is likely to be > 1000000 (depending on implementations, of course). The idea of using a wheel has been around a good while. I've added a link to the place I first posted this; there's an implementation using a wheel there.
Mark Dickinson
@Mark: Ah I see.
Xavier Ho
@Mark: Nice! The wheel code you wrote a while back took 0.127 seconds. http://ideone.com/qX2cU. It still doesn't handle 1,000,000,000, but I think I've probed enough. I will definitely look into this interesting beast. Accepted!
Xavier Ho
@Mark: I just wanted to let you know that your slicing assignment is a great idea: I was able to optimise my code to 0.21 seconds from 0.32 seconds: http://ideone.com/bIDY4 - 1/2 the speed of your wheel code. `:]`
Xavier Ho
+7  A: 

OK, so this is my second answer, but as speed is of the essence I thought that I had to mention the bitarray module - even though it's bitstring's nemesis :). It's ideally suited to this case as not only is it a C extension (and so faster than pure Python has a hope of being), but it also supports slice assignments. It's not yet available for Python 3 though.

I haven't even tried to optimise this, I just rewrote the bitstring version. On my machine I get 0.16 seconds for primes under a million.

For a billion, it runs perfectly well and completes in 2 minutes 31 seconds.

import bitarray

def prime_bitarray(limit=1000000):
    yield 2
    flags = bitarray.bitarray(limit)
    flags.setall(False)
    sub_limit = int(limit**0.5)
    for i in xrange(3, limit, 2):
        if not flags[i]:
            yield i
            if i <= sub_limit:
                flags[3*i:limit:i*2] = True
Scott Griffiths
@Scott: Aww what, bit array! Exactly what I needed? XD. I'll give it a try after work today.
Xavier Ho
Yes, I've run into the same problem before and was going to suggest bitarray. I hadn't heard of bitstring before, but I'll check it out. I had been using BitVector before I found bitarray (which I found wasn't very optimized), but its good to know of another binary array module to check out.
Justin Peel
@Scott: Just thought to let you know that on my machine, it took 0.45 seconds to run for n < 1,000,000, and so it will probably take 450 seconds to do a billion. I haven't tried it yet, but just to give you some idea about my machine speed compared to my 0.21 seconds version. Perhaps I can use bitarray for a generator that requires more than 100 million or something, heh.
Xavier Ho
+6  A: 

There are a couple of small optimizations for your version. By reversing the roles of True and False, you can change "if flags[i] is False:" to "if flags[i]:". And the starting value for the second range statement can be i*i instead of i*3. Your original version takes 0.166 seconds on my system. With those changes, the version below takes 0.156 seconds on my system.

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = [True, True] + [False] * (limit - 2)
    # Step through all the odd numbers
    for i in range(3, limit, 2):
        if flags[i]:
            continue
        yield i
        # Exclude further multiples of the current prime number
        if i <= sub_limit:
            for j in range(i*i, limit, i<<1):
                flags[j] = True

This doesn't help your memory issue, though.

Moving into the world of C extensions, I used the development version of gmpy. (Disclaimer: I'm one of the maintainers.) The development version is called gmpy2 and supports mutable integers called xmpz. Using gmpy2 and the following code, I have a running time of 0.140 seconds. Running time for a limit of 1,000,000,000 is 158 seconds.

import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    # Actual number is 2*bit_position + 1.
    oddnums = gmpy2.xmpz(1)
    current = 0
    while True:
        current += 1
        current = oddnums.bit_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        # Exclude further multiples of the current prime number
        if prime <= sub_limit:
            for j in range(2*current*(current+1), limit>>1, prime):
                oddnums.bit_set(j)

Pushing optimizations, and sacrificing clarity, I get running times of 0.107 and 123 seconds with the following code:

import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    # Actual number is 2*bit_position + 1.
    oddnums = gmpy2.xmpz(1)
    f_set = oddnums.bit_set
    f_scan0 = oddnums.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        # Exclude further multiples of the current prime number
        if prime <= sub_limit:
            list(map(f_set,range(2*current*(current+1), limit>>1, prime)))

Edit: Based on this exercise, I modified gmpy2 to accept xmpz.bit_set(iterator). Using the following code, the run time for all primes less than 1,000,000,000 is 56 seconds for Python 2.7 and 74 seconds for Python 3.2. (As noted in the comments, xrange is faster than range.)

import gmpy2

try:
    range = xrange
except NameError:
    pass

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    oddnums = gmpy2.xmpz(1)
    f_scan0 = oddnums.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        if prime <= sub_limit:
            oddnums.bit_set(iter(range(2*current*(current+1), limit>>1, prime)))

Edit #2: One more try! I modified gmpy2 to accept xmpz.bit_set(slice). Using the following code, the run time for all primes less than 1,000,000,000 is about 40 seconds for both Python 2.7 and Python 3.2.

from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    # pre-allocate the total length
    flags.bit_set((limit>>1)+1)
    f_scan0 = flags.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        if prime <= sub_limit:
            flags.bit_set(slice(2*current*(current+1), limit>>1, prime))

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)

Edit #3: I've updated gmpy2 to properly support slicing at the bit level of an xmpz. No change in performance, but a much nicer API. I have done a little tweaking and I've got the time down to about 37 seconds.

from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    flags[(limit>>1)+1] = True
    f_scan0 = flags.bit_scan0
    current = 0
    prime = 2
    while prime <= sub_limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        flags[2*current*(current+1):limit>>1:prime] = True
    while prime <= limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)
casevh
What settings did you use to install `gmpy`? It takes 500 seconds for limit=1e9 on my machine (for comparison, `primes_upto2()` from http://rosettacode.org/wiki/Sieve_of_Eratosthenes#Using_numpy takes me 30 seconds). I've just checked out the code and run `python setup.py install`
J.F. Sebastian
@casevh: Thanks for the gmpy code. I'll give it a run after work today - I'm pretty impressed by the 2/3 reduced runtime.
Xavier Ho
That should be all you need to do to install it. I'm using 64-bit Ubuntu 10.04, 2.2Ghz Core2 Duo, GMP 5.01, and Python 3.1. Using the Ubuntu version of numpy, primes_upto2() takes 50 seconds on my computer so something is strange.
casevh
@casevh: I've noticed the 2nd variant of `prime_numbers()` causes my machine to swap. So it might ruin the benchmark. 64-bit Ubuntu 10.04, 2.6GHz i7, GMP 4.3.2 pgmpy 2.0.0a and python 2.6.4.
J.F. Sebastian
Since you are using Python 2.x, change range to xrange. That should fix the extreme memory usage. You might try compiling GMP 5.01 from source. IIRC, it has improved code for the newer Intel processors.
casevh
This answer has been the best so far; I'm giving gmpy a shot now. Would everyone be okay if I put in a comparison list of execution speed on my machine? I think other people might have trouble following this topic.
Xavier Ho
@casevh: I tried to checkout the development build 1.12, but I'm getting the following error when I tried to use `setup.py install` : `c:\users\hp\documents\coding\python\gmpy_1_12\src\gmpy.h(31) : fatal error C1083: Cannot open include file: 'gmp.h': No such file or directory`
Xavier Ho
gmpy is a wrapper for the GNU Multiple Precision library so that needs to be compiled. There are two files included that describe the Windows build process. You will also need to check out the trunk version to get gmpy2. It will be identified as 2.0.0a. 1.12 is just a bug fix that will be released soon. For more assistance with building gmpy2, open an issue at the gmpy site.
casevh
@casevh: I've redone the benchmark: prime_numbers: `169.00` s and prime_numbers2: `95.53` s for `1e9`.
J.F. Sebastian
@casevh: gmp 5.0.1 produces the same results: prime_numbers: `166.93` prime_numbers2: `96.48` (I've double checked the version with `gmpy2.gmp_version()`).
J.F. Sebastian
@casevh: Python 3.2a0 is slightly slower: prime_numbers: `173.18` prime_numbers2: `110.99`
J.F. Sebastian
@casevh: I see, thanks for the explanation. Will try soon.
Xavier Ho
@J.F. Sebastian: my results agree with you on Py3 vs Py2. Even `xrange()` in Python 2.6 is faster than `range` in Python 3. I wonder why.
Xavier Ho
@Xavier Ho: xrange() uses C ints (fast, finite). range() uses Python ints (slow, infinite (while the memory is available)).
J.F. Sebastian
@J.F. Sebastian: Interesting insight. So did they remove xrange() in Python 3 completely, or was it moved somewhere?
Xavier Ho
@J.F. Sebastian: Your performance results with Python 3.2a0 are interesting. I usually get the same, or slightly faster, times with 3.2. To ensure that the configure options are the same, I compile each version of Python that I use for testing. Did you use --with-computed-gotos when you compiled 3.2? IIRC, it is disabled by default.
casevh
@Xavier Ho: Actually my explanation could be wrong. According to `rangeobject.c` `rangeiter_next()` is used for small numbers both by xrange and range. The only difference that I see is type conversions (unsigned/signed in Python 3.x). The time difference might be due to the way I've tested: `python -mtimeit 'for _ in xrange(50*10**6): pass'
J.F. Sebastian
@casevh: I've used default settings. `--with-computed-gotos` on 3.2a0 puts the result in between 2.6.4 and 3.2a0 for a default: prime_numbers: `167.97` prime_numbers2: `105.05`
J.F. Sebastian
@casevh: To sum up: On my machine there is no measurable difference in performance between gmp 4.3.2/5.0.1, Python 2.6.4/3.2a0, /--with-computed-gotos for the prime_numbers() functions.
J.F. Sebastian
@J.F. Sebastian: I've edited my original answer to include a new result based on changes I made to gmpy2. xmpz.bit_set will now accept an iterator directly. Time is now 56 seconds with Python 2.7.
casevh
@casevh: with new xmpz.bit_set(): Python 2.6.4: `47`, Python 3.x `66` seconds. It is close to 55 seconds for http://ideone.com/mxSdH but still slower than 30 seconds for primes_upto2() from http://rosettacode.org/wiki/Sieve_of_Eratosthenes#Using_numpy
J.F. Sebastian
@J.F. Sebastian: I added slice support to xmpz.bit_set. Down to about 40 seconds on my system.
casevh
@casevh: slice-based version is 33 seconds on my machine (it is near 30 seconds for the numpy version). btw, the interface for xmpz.bit_set() is a mess (take a look at list_ass_subscript() (implements L[o]=v) in listobject.c, tuplesubscript() (implements t[o]) in tupleobject.c). Additionally immutable case should have the same capabilities (just use `result` in immutable case everywhere you use `self` in mutable case).
J.F. Sebastian
@J.F. Sebastian: Thanks for continuing to benchmark each version. I agree the slice interface is horrible. It started out as an experiment to see how much overhead was involved in iterating over xrange/range. I was originally thinking of an xrange clone that gmpy2 would recognize and then bypass the iterator protocol. It would be easy to allow xmpz[::]=1 to set bits, but is it worth supporting a=a[::-1] and a[::2]=a[1::2], etc.? Suggestions welcome :)
casevh
Wow. The world is just getting better and better. Nice work, casevh. I should be getting around to install gmp on my computer next week, and I'll let you know the benchmark results.
Xavier Ho
@casevh: I don't know what the design priorities for gmpy2 are: maintainability (hence robustness), performance, or feature set (the reason to bother with the C dependency). For bit_set() I'd keep the integer index and slice. The iterable version is slow and complicates both the code and the interface. gmpy_mpz_bit_set_iterable() could be removed completely http://gist.github.com/416936
J.F. Sebastian
@J.F. Sebastian: The goals are speed and full support of GMP. The goal for gmpy 1.1x was support for Python 3.x and speed but I didn't want to change the API much. With gmpy2, I'm dropping support for old versions of Python (pre 2.6) and I'm willing to change the API; hence the experiments. Thanks for the code!
casevh
@J.F. Sebastian: I've updated gmpy2 to (hopefully) properly support slicing. It's only supported on xmpz for the moment. I made another tweak to prime_numbers() and I'm down to around 37 seconds.
casevh
Timing for edit #3: Python 2.6: 24 seconds, 3.2: 33 seconds, which outperforms the original numpy version http://rosettacode.org/wiki/Sieve_of_Eratosthenes#Using_numpy. The mapping interface for integers might be too much magic for me. I'll open an issue with the problems I encountered.
J.F. Sebastian
+5  A: 

Okay, here's a (nearly complete) set of benchmarks I've run tonight to see which code is the fastest. Hopefully someone will find this list useful. I omitted anything that takes more than 30 seconds to complete on my machine.

I would like to thank everyone who contributed. I've gained a lot of insight from your efforts, and I hope you have too.

My machine: AMD ZM-86, 2.40 GHz dual-core, with 4 GB of RAM. This is an HP TouchSmart Tx2 laptop. Note that while I may have linked to a pastebin, I benchmarked all of the following on my own machine.

I will add the gmpy2 benchmark once I am able to build it.

All of the benchmarks are tested in Python 2.6 x86

Returning a list of prime numbers up to n = 1,000,000 (using Python generators):

Sebastian's numpy generator version (updated) - 121 ms @

Mark's Sieve + Wheel - 154 ms

Robert's version with slicing - 159 ms

My improved version with slicing - 205 ms

Numpy generator with enumerate - 249 ms @

Mark's Basic Sieve - 317 ms

casevh's improvement on my original solution - 343 ms

My modified numpy generator solution - 407 ms

My original method in the question - 409 ms

Bitarray Solution - 414 ms @

Pure Python with bytearray - 1394 ms @

Scott's BitString solution - 6659 ms @

'@' means this method is capable of generating up to n < 1,000,000,000 on my machine setup.

In addition, if you don't need the generator and just want the whole list at once:

numpy solution from RosettaCode - 32 ms @

(The numpy solution is also capable of generating up to 1 billion, which took 61.6259 seconds. I suspect the memory was swapped once, hence the double time.)

Xavier Ho
@Xavier Ho: your numpy version is incorrect: the step should be `n`, not `n*n`, i.e., `isprime[n*n::n]=0`. You don't need to call `numpy.nonzero()` if you'd like a generator version: `primes[:2]=0; return (i for i, p in enumerate(primes) if p)`
J.F. Sebastian
Note: `numpy` generator solution is 3 times slower (100 seconds vs. 30 seconds for 1e9) than the non-generator version.
J.F. Sebastian
@J.F Sebastian: Nice catch. Thought I had fixed it! Thanks.
Xavier Ho
@J.F. Sebastian: Interesting. On my machine it's more than 6 times slower.
Xavier Ho
@Xavier Ho: Here's numpy generator version http://ideone.com/mxSdH (55 seconds for 1e9 (compared to 30 seconds for numpy non-generator version and to 45 seconds for @casevh's new xmpz.bitset()-based version)
J.F. Sebastian
Nice. I'll give it a go tonight. Thanks.
Xavier Ho
I've replaced `isprime[n*n::n]` with `isprime[n*n::2*n]` because this solution already skips all even numbers: http://ideone.com/G9l44. It is slightly faster: `52.72` vs. `47.84` seconds for limit `1e9`.
J.F. Sebastian
@J.F. Sebastian: Haha, good find. I've updated the chart above to reflect the new benchmark.
Xavier Ho
`ambi_sieve()` is faster than the numpy solution from RosettaCode http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/2068548#2068548
J.F. Sebastian
Madness. This is madness! (I've been pretty busy. Did you run some tests on your own machine?)
Xavier Ho
@Xavier Ho: is my new generator any good?
Robert William Hanks
@Robert William Hanks: `prime_numbers4()` is a champion among generator solutions (on my machine and for the code: http://ideone.com/weu23 )
J.F. Sebastian
`primesfrom2to()` takes 10 seconds for n=1e9. It is the best result I know among solutions in Python. http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
J.F. Sebastian
It's nice to see @Robert still going. I admit freely I've been busy with semester exams, and hence the downtime.
Xavier Ho
@Xavier Ho: I added my last attempt at a numpy generator; if you have time, you may want to time it. (I know it's a wiki, but some benchmarks on another machine would be great.)
Robert William Hanks
+3  A: 

Related question.
Fastest way to list all primes below N in python

Hi, I too am looking for Python code to generate primes up to 10**9 as fast as possible, which is difficult because of the memory problem. Up to now I have come up with this to generate primes up to 10**6 and 10**7 (0.171 s and 1.764 s respectively on my old machine), which seems to be slightly faster (at least on my computer) than "My improved version with slicing" from the previous post, probably because I use n//i-i+1 (see below) instead of the analogous len(flags[i2::i<<1]) in the other code. Please correct me if I am wrong. Any suggestions for improvement are very welcome.

def primes(n):
    """ Returns  a list of primes < n """
    sieve = [True] * n
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i]:
            sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1)
    return [2] + [i for i in xrange(3,n,2) if sieve[i]]

In one of his snippets Xavier uses flags[i2::i<<1] and len(flags[i2::i<<1]). I computed the size explicitly instead, using the fact that between i*i and n there are (n-i*i)//(2*i) multiples of 2*i; adding 1 to also count i*i itself gives len(sieve[i*i::2*i]) == (n-i*i-1)//(2*i) + 1, which is what the code uses. This makes the code faster (a quick check of the formula follows the generator below). A basic generator based on the code above would be:

def primesgen(n):
    """ Generates all primes <= n """
    sieve = [True] * n
    yield 2
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i]:
            yield i
            sieve[i*i::2*i] = [False]*((n-i*i-1)/(2*i)+1)
    for i in xrange(i+2,n,2):
        if sieve[i]: yield i
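
As a quick check of the slice-length formula above (an illustrative snippet, not part of the original post):

n, i = 100, 7
sieve = [True] * n
assert len(sieve[i*i::2*i]) == (n - i*i - 1) // (2*i) + 1    # both sides are 4 here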

With a bit of modification one can write a slightly slower version of the code above that starts with a sieve half the size, sieve = [True] * (n//2), and works for the same n. I am not sure how that would affect the memory issue. As an example implementation, here is a modified version of the numpy Rosetta Code sieve (maybe faster) that starts with a sieve half the size.

import numpy
def primesfrom3to(n):
    """ Returns a array of primes, 3 <= p < n """
    sieve = numpy.ones(n/2, dtype=numpy.bool)
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i/2]: sieve[i*i/2::i] = False
    return 2*numpy.nonzero(sieve)[0][1::]+1

A faster and more memory-efficient generator would be:

import numpy
def primesgen1(n):
    """ Input n>=6, Generates all primes < n """
    sieve1 = numpy.ones(n/6+1, dtype=numpy.bool)
    sieve5 = numpy.ones(n/6  , dtype=numpy.bool)
    sieve1[0] = False
    yield 2; yield 3;
    for i in xrange(int(n**0.5)/6+1):
        if sieve1[i]:
            k=6*i+1; yield k;
            sieve1[ ((k*k)/6) ::k] = False
            sieve5[(k*k+4*k)/6::k] = False
        if sieve5[i]:
            k=6*i+5; yield k;
            sieve1[ ((k*k)/6) ::k] = False
            sieve5[(k*k+2*k)/6::k] = False
    for i in xrange(i+1,n/6):
        if sieve1[i]: yield 6*i+1
        if sieve5[i]: yield 6*i+5
    if n%6 > 1:
        if sieve1[i+1]: yield 6*i+1

or with a bit more code:

import numpy
def primesgen2(n):
    """ Input n>=30, Generates all primes < n """
    sieve01 = numpy.ones(n/30+1, dtype=numpy.bool)
    sieve07 = numpy.ones(n/30+1, dtype=numpy.bool)
    sieve11 = numpy.ones(n/30+1, dtype=numpy.bool)
    sieve13 = numpy.ones(n/30+1, dtype=numpy.bool)
    sieve17 = numpy.ones(n/30+1, dtype=numpy.bool)
    sieve19 = numpy.ones(n/30+1, dtype=numpy.bool)
    sieve23 = numpy.ones(n/30+1, dtype=numpy.bool)
    sieve29 = numpy.ones(n/30  , dtype=numpy.bool)
    sieve01[0] = False
    yield 2; yield 3; yield 5;
    for i in xrange(int(n**0.5)/30+1):
        if sieve01[i]:
            k=30*i+1; yield k;
            sieve01[     (k*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+10*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+16*k)/30::k] = False
            sieve19[(k*k+18*k)/30::k] = False
            sieve23[(k*k+22*k)/30::k] = False
            sieve29[(k*k+28*k)/30::k] = False
        if sieve07[i]:
            k=30*i+7; yield k;
            sieve01[(k*k+ 6*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+16*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+ 4*k)/30::k] = False
            sieve19[     (k*k)/30::k] = False
            sieve23[(k*k+22*k)/30::k] = False
            sieve29[(k*k+10*k)/30::k] = False
        if sieve11[i]:
            k=30*i+11; yield k;
            sieve01[     (k*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+20*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+26*k)/30::k] = False
            sieve19[(k*k+18*k)/30::k] = False
            sieve23[(k*k+ 2*k)/30::k] = False
            sieve29[(k*k+ 8*k)/30::k] = False
        if sieve13[i]:
            k=30*i+13; yield k;
            sieve01[(k*k+24*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+ 4*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+16*k)/30::k] = False
            sieve19[     (k*k)/30::k] = False
            sieve23[(k*k+28*k)/30::k] = False
            sieve29[(k*k+10*k)/30::k] = False
        if sieve17[i]:
            k=30*i+17; yield k;
            sieve01[(k*k+ 6*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+26*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+14*k)/30::k] = False
            sieve19[     (k*k)/30::k] = False
            sieve23[(k*k+ 2*k)/30::k] = False
            sieve29[(k*k+20*k)/30::k] = False
        if sieve19[i]:
            k=30*i+19; yield k;
            sieve01[     (k*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+10*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+ 4*k)/30::k] = False
            sieve19[(k*k+12*k)/30::k] = False
            sieve23[(k*k+28*k)/30::k] = False
            sieve29[(k*k+22*k)/30::k] = False
        if sieve23[i]:
            k=30*i+23; yield k;
            sieve01[(k*k+24*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+14*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+26*k)/30::k] = False
            sieve19[     (k*k)/30::k] = False
            sieve23[(k*k+ 8*k)/30::k] = False
            sieve29[(k*k+20*k)/30::k] = False
        if sieve29[i]:
            k=30*i+29; yield k;
            sieve01[     (k*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+20*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+14*k)/30::k] = False
            sieve19[(k*k+12*k)/30::k] = False
            sieve23[(k*k+ 8*k)/30::k] = False
            sieve29[(k*k+ 2*k)/30::k] = False
    for i in xrange(i+1,n/30):
            if sieve01[i]: yield 30*i+1
            if sieve07[i]: yield 30*i+7
            if sieve11[i]: yield 30*i+11
            if sieve13[i]: yield 30*i+13
            if sieve17[i]: yield 30*i+17
            if sieve19[i]: yield 30*i+19
            if sieve23[i]: yield 30*i+23
            if sieve29[i]: yield 30*i+29
    if n%30 > 1:
        if sieve01[i+1]: yield 30*i+1
    if n%30 > 7:
        if sieve07[i+1]: yield 30*i+7
    if n%30 > 11:
        if sieve11[i+1]: yield 30*i+11
    if n%30 > 13:
        if sieve13[i+1]: yield 30*i+13
    if n%30 > 17:
        if sieve17[i+1]: yield 30*i+17
    if n%30 > 19:
        if sieve19[i+1]: yield 30*i+19
    if n%30 > 23:
        if sieve23[i+1]: yield 30*i+23

PS: If you have any suggestions on how to speed up this code, anything from changing the order of operations to pre-computing things, please comment.

Robert William Hanks
Well, it would be faster because you're using a list comprehension and not a generator. Nice one, I'll add the benchmark when I get the time.
Xavier Ho
Just a thought, can you explain how your `m = n // i-i` is analogous to my `flags[i2::i<<1]`? Since I ignored iterating through all the multiples of two, I also avoided it in the 'step' in the slicing syntax. The way you're doing it is still iterating over every multiple of itself.
Xavier Ho
Sorry man, I am new to programming; I don't even know what << means at this point. And I am not sure if my code is faster than yours or exactly why; my guess was your call to len(). Maybe this can help, sorry if it's off the point. Math tells us that the number of multiples of i in range(1,n) is n//i (integer division), and the number of multiples of i in range(1,i*i) is i (1i, 2i, 3i, ..., i*i), so in [i*i::i] we have the multiples of i in range(1,n) minus the multiples of i in range(1,i*i), plus one (because we want to count i*i too); thus len(sieve[i*i::i]) equals n//i-i+1.
Robert William Hanks
In my code I ignore multiples of two completely and I don't flag them out of my sieve, since I rely on the step of the range function being 2 for sieving and for forming the primes list. (I only add [2] to the list at the end.)
Robert William Hanks
On a side note, one can skip the prime 3 completely too if the sieve is initialized with something like [False,True,True] * ((n+1)//3); see my second example primes2(), it's a bit faster. Please ensure the input is one less than a multiple of 3. On my machine the difference is so small that I did not care to write better code.
Robert William Hanks
@Robert: Great idea!
Xavier Ho
@Robert: Have you tried your code? I suspect it generates `8` as a prime number...
Xavier Ho
Not possible. I generate the primes with this list comprehension: [2,3] + [i for i in range(5,n+1,2) if sieve[i]]. Since the index of the range is equal to the value (sieve[i] corresponds to i), I only check odd integers; the even integers in my sieve are all flagged True, and I simply ignore them. By the way, using this fact and the trick you showed me, [i*i::2*i], I think it's possible to make a version of the code that starts with half the values in the sieve, since we only need to care about odd numbers. That should use a bit less memory and maybe it will be faster.
Robert William Hanks
I updated the code in response to Xavier's comment about "how your m = n // i-i is analogous to my flags[i2::i<<1]"; hope it helps.
Robert William Hanks
@Robert: Yeah, changing it to `2*i` should eliminate the incorrect numbers generated.
Xavier Ho
@Robert: And great job! 159 ms on my machine for your generator (2nd code). Your effort will not go unappreciated.
Xavier Ho
@Xavier Ho: is my new generator any good?
Robert William Hanks
@Robert: I'll give that a run tonight.
Xavier Ho
`primesgen2()` shows a tiny improvement compared to `primesgen1()`: 33 ms vs 37 ms for n=1e6 http://ideone.com/xcoii.
J.F. Sebastian
@J.F. Sebastian: Thanks for the feedback, man. primesgen1() uses a combined sieve of size (1/3)*n, primesgen2() one of size (8/30)*n; 33% vs 26% (can we say that is the memory usage of each version?). That is why I tried it.
Robert William Hanks
Results for n=1e9 confirm "memory hypothesis": `primesgen1()` vs. `primesgen2()` is 35 vs. 27 seconds (the time ratio is similar to the memory usage ratio). `primesfrom2to3()` is 9.9 seconds (for comparison).
J.F. Sebastian
A: 

Another really stupid option, but one that can help if you really need a large list of prime numbers available very fast. Say, if you need them as a tool to answer Project Euler problems (if the problem is just optimizing your code as a mind game, it's irrelevant).

Use any slow solution to generate the list and save it to a Python source file, say primes.py, which would just contain:

primes = [ a list of a million prime numbers here ]

Now to use them you just have to do import primes and you get them with a minimal memory footprint at the speed of disk I/O. Complexity is the number of primes :-)

Even if you used a poorly optimized solution to generate this list, it will only be done once and it does not matter much.

You could probably make it even faster using pickle/unpickle, but you get the idea...
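
For instance, a minimal sketch of the cache-to-disk idea with pickle (the file name and helper names are made up for illustration):

import pickle

def save_primes(primes, path="primes.pickle"):
    # dump the precomputed list once, using the fastest protocol available
    with open(path, "wb") as f:
        pickle.dump(primes, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_primes(path="primes.pickle"):
    # reading the list back is just disk I/O plus deserialization
    with open(path, "rb") as f:
        return pickle.load(f)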

kriss