views:

201

answers:

7

I'm trying to implement the sieve of eratosthenes in python, however when trying to find all primes up to the sqare root of for instance 779695003923747564589111193840021 I get an error saying result of range() has too many items. My question is, how do I avoid this problem, if I instantiate the list with a while loop I will get an error saying I'm using too much memory (before it even starts to use the pagefile), the two are listed below:

Using range()

maxnum = 39312312323123123

primes = []
seq = []
i = 0
seq = range(2,maxnum)

for i in seq:
    mul = i * seq
    for j in mul:
        try:
            seq.remove(j)
        except:
            pass
        primes.append(i)

print primes

Using while:

maxnum = 39312312323123123

primes = []
seq = []
i = 0
while i < maxnum:
    seq.append(i)
    i+=1

for i in seq:
    mul = i * seq
    for j in mul:
        try:
            seq.remove(j)
        except:
            pass
        primes.append(i)

print primes
+6  A: 

I would say, "use xrange() instead", but you are actually using the list of ints as the sieve result..... So an integer generator is not a correct solution.

I think it will be difficult to materialize a list with 39312312323123123 elements in it, no matter what function you use to do so.... That is, after all, 279 petabytes of 64-bit integers.

Try this.

class FoundComposite(Exception): pass

primes = [2]

seq = itertools.takewhile(        # Take integers from a list
          lambda x: x<MAXNUM,     #   until we reach MAXNUM
          itertools.count(2)      #   the list of integers starting from 2
          )

#seq = xrange(2, MAXNUM)          # alternatively

for i in seq:
    try:
        for divisor in primes:
            if not (i % divisor):
                # no remainder - thus an even divisor
                # continue to next i in seq
                raise FoundComposite 
        # if this is reached, we have tried all divisors.
        primes.append(i)
    except FoundComposite:
        pass
Joe Koberg
xrange fails with the same MemoryError as range, the point of the sieve is that you need to have all elements, so your other idea wont work either :(
Kristoffer S Hansen
geez i need to read the algorithm a bit more. Sorry. i see you are doing a sieve and need the constructed list of integers.
Joe Koberg
xrange may get around the limitation of instantiating a list with a huge number of elements, but the code as is will fall apart quickly. For example "mul = i * seq" won't work. It would work with a normal list but you are making "i" copies of the list. Not sure what the intention is there. I'd look here for some inspiration: http://code.activestate.com/recipes/117119-sieve-of-eratosthenes/ Those are some pretty smart people.
Brian Luft
You have nested for loops there, I believe the `continue` will only increment the inner for loop. I ended up using a flag and short circuit to do it.
tgray
whoops. corrected with an exception.
Joe Koberg
+1 for pythonicity (python-ness?), much cleaner than a Boolean flag, though it might be a little slower since it gets raised so often.
tgray
A: 

range() returns a list containing all the numbers in the requested range, while xrange is a generator and yields the numbers one after another with a memory consumption close to zero.

Alexander Gessler
I tried xrange but I need to be able to remove from the list which xrange wont let me do.
Kristoffer S Hansen
+2  A: 

Your algorithm is broken. Get it to work for maxnum=100 first.

Once you get it working you will find maxnum=100000000 will takes a long long time to run.

Plot the time it takes to run for maxnum in (10,100,1000,10000,100000,1000000...) you may be able to extrapolate how long 39312312323123123 will take :)

gnibbler
My other posted generator reaches 100000000 in a few minutes, this should be fast, ofcourse I should have checked the algorithm first, but that is the least of my problems at the moment ;) I'll update it till it works
Kristoffer S Hansen
A: 

About the memory limit, how about creating a custom list (class) that internally is a linked list of lists or arrays. Magically traverse from one to the other internally, and add more as needed, as the caller uses your custom list with the external interface you've provided which will be similar to those members needed to facilitate the .append .remove, etc needs of the arrays used in your problem.

Note: I'm not a Python programmer. Not a clue how to implement what I said in Python. Maybe I don't know the context here, so I will understand if I'm voted down.

Maybe use "generators" as they're called in python to yield results of your internal lists as if it were one huge single list. Possibly with linked list.

John K
+1  A: 

There is a third party module for python called gmpy

It has a couple of functions that may be useful to you as they are very fast. The probabilistic stuff kicks in around the 4 billion mark.

next_prime(...)
    next_prime(x): returns the smallest prime number > x.  Note that
    GMP may use a probabilistic definition of 'prime', and also that
    if x<0 GMP considers x 'prime' iff -x is prime; gmpy reflects these
    GMP design choices. x must be an mpz, or else gets coerced to one.

is_prime(...)
    is_prime(x,n=25): returns 2 if x is _certainly_ prime, 1 if x is
    _probably_ prime (probability > 1 - 1/2**n), 0 if x is composite.
    If x<0, GMP considers x 'prime' iff -x is prime; gmpy reflects this
    GMP design choice. x must be an mpz, or else gets coerced to one.
gnibbler
+2  A: 

It's a more complex algorithm, perhaps technically not counting as the sieve, but one approach is to not remove all multiples of a given prime at once, but queue the next multiple (along with the prime). This could be used in a generator implementation. The queue will still end up containing a lot of (multiples of) primes, but not as many as by building then filtering a list.

First few steps done manually, to show the principle...

  • 2 is prime - yield and queue (4, 2)
  • 3 is prime - yield and queue (6, 3)
  • 4 is composite - replace (4, 2) with (6, 2) in the queue
  • 5 is prime - yield and queue (10, 5)
  • 6 is composite - replace (6, 2) with (8, 2) and (6, 3) with (9, 3)

Note - the queue isn't a FIFO. You will always be extracting the tuples with the lowest first item, but new/replacement tuples don't (usually) have the highest first item and (as with 6 above) there will be duplicates.

To handle the queue efficiently in Python, I suggest a dictionary (ie hashtable) keyed by the first item of the tuple. The data is a set of second item values (original primes).

As suggested elsewhere, test with small targets before trying for the big one. And don't be too surprised if you fail. It may still be that you need too many heap-allocated large integers at one time (in the queue) to complete the solution.

Steve314
I did end up using this solution, its incredibly fast and yes, I ended up exceeding the limit again, I guess this just wasn't meant to be in python :(
Kristoffer S Hansen
I wasn't expecting it to be fast, to be honest - just to need a bit less memory.
Steve314
A: 

Try this:

def getPrimes(maxnum):
    primes = []
    for i in xrange(2, maxnum):
        is_mul = False
        for j in primes:         # Try dividing by all previous primes
            if i % j == 0:
                is_mul = True    # Once we find a prime that i is divisible by
                break            # short circuit so we don't have to try all of them
        if not is_mul:           # if we try every prime we've seen so far and `i`
            primes.append(i)     # isn't a multiple, so it must be prime
    return primes

You shouldn't run out of memory until you get to a very large number of primes. This way you don't have to worry about creating a list of multiples. Not sure if this still counts as the sieve though.

Actually, this won't work for maxnum = 39312312323123123. Using the Prime number theorem we can estimate that there will be approximately 1,028,840,332,567,181 prime numbers in that range.

As pointed out in this question the maximum size of a python list on a 32-bit system is 536,870,912. So even if you don't run out of memory, you still won't be able to finish the calculation.

You shouldn't have that problem with a 64-bit system though.

2 ** 64 => 18446744073709551616

Using the math from the aforementioned question (2 ** 64) / 8, the maximum number of elements in a list would be 2,305,843,009,213,693,951 which is greater than the estimated number of primes you will encounter.

Edit:

To avoid memory issues, you could store your list of primes in a file on the hard disk. Store one prime per line and read the file every time you check a new number.

Maybe something like this:

primes_path = r'C:\temp\primes.txt'

def genPrimes():
    for line in open(primes_path, 'r'):
        yield int(line.strip())    

def addPrime(prime):
    primes_file = open(primes_path, 'a')
    primes_file.write('%s\n' % prime)
    primes_file.close()

def findPrimes(maxnum):
    for i in xrange(2, maxnum):
        is_mul = False
        for prime in genPrimes():  # generate the primes from a file on disk
            if i % prime == 0:
                is_mul = True    
                break            
        if not is_mul:           
            addPrime(i)  # append the new prime to the end of your primes file

At the end, you would have a file on your hard drive that contained all your primes.

Ok, so this would be pretty slow, but you wouldn't run out of memory. You could make it faster by increasing the speed at which you can read/write files (like RAID).

tgray
I am actually running on a 64 bit system, I ran into the memory limit using the other algorithm I supplied.
Kristoffer S Hansen