views:

693

answers:

6

For a library, I need to store the first primes numbers up to a limit L. This collection must have a O(1) lookup time (to check whether a number is prime or not) and it must be easy, given a number, to find the next prime number (assuming it is smaller than L).

Given that L is fixed, an Eratostene sieve to generate the list is fine. Right now, I use a packed boolean array to store the list, which contains only entries for odd numbers between 3 and L (inclusive). This takes (L-2)/2 bits of memory. I would like to be able to statically increase L without using more memory.

Is there a data structure using less memory with similar properties? Or with at least the constant lookup time? (odd numbers can then be enumerated until we get a prime)

(the language I wrote this in is Factor but this question would be the same in any language which has built-in or easily programmable packed bit arrays)

A: 

If you can figure out which ones are Mersenne or other easily represented prime numbers, you might be able to save a few bits by using that representation with a flag for applicable numbers.

Also, how about storing the numbers as the difference from the previous number? Then the size shouldn't rise quite as fast (but lookup would be slow). Combining with the approach above, you could store Mersenne primes and the difference from the last Mersenne prime.

l0b0
+10  A: 

You can explicitly check more prime numbers to remove redundancy.

At the moment you do this only for two, by checking for divisibility by two explicitly and then storing only for odd numbers whether they are prime.

For 2 and 3 you get remainders 0 to 5, of which only 1 and 5 are not divisible by two or three and can lead to a prime number, so you are down to 1/3.

For 2, 3, and 5 you get 8 numbers out of 30, which is nice to store in a byte.

This is explained in more detail here.

starblue
Indeed, filtering some more was one of the ideas I had. But I had not realized that modulo 30 gave a packing that was so efficient. I will give it a try!
Samuel Tardieu
That's a great article!
Andy Mikula
aka wheel factorisation http://primes.utm.edu/glossary/page.php?sort=WheelFactorization if you don't want to read as long and metaphorical description.
Pete Kirkham
A: 

Given that memory is so cheap, I don't think you can do much better from a speed perspective than your existing scheme.

If there's a better solution, then I'd assume it'd take advantage of the Prime Number Theorem that shows that as L gets bigger, the limit of

π(L) / (L / ln(L)) approaches 1.

Perhaps a better solution would have an adaptive packing solution in a data structure sort of like a skip list.

Jeff Moser
+1  A: 

Maybe a trie data structure which contains only the primes is what you're looking for. Instead of using characters as indexes you could use the integer digits. An implementation of this are Judy-Arrays.

Altough, they do not meet your O(1) requirement, they are extremely memory-efficient for similar keys (as most parts of numbers are) and pretty fast to look up with an O(m) (m=key-length) at maximum.

If you look up for a prime in the pre-generated tree, you can walk the tree until you find it or you are already at the node which is next to the preceeding and following prime.

Kosi2801
+2  A: 

At the moment you are treating 2 as special case and then having an array where every odd number is mapped to an element in the array (with some odd numbers being prime). You could improve on this by treating 2 and 3 as special cases recognising that the rest of the prime numbers are in the form 6n+1 or 6n-1 (that is for all primes p where p > 3, p mod 6 = 1 or 5). This can be further generalised - see Wikipedia. For all primes p > 5, p mod 30 = 1, 7, 11, 13, 17, 19, 23 or 29. You could keep going with this and reduce the memory needed at the expense of processing time (although it will still be O(1), just a slower O(1)).

David Johnstone
A: 

How about some kind of hash table?

You would need a very good hash function (something like n mod p, where p is not a multiple of any of the q lowest primes - choose q sufficiently high in order to minimise the number of collisions).

jwoolard