views:

386

answers:

11

What's the best data structure (in java) for the task of loading 51 million primes and then iterating over them?

I need to know, for example, the primes that are between 1000000000 and that same number minus 100000.

+3  A: 

Why store them in a map at all? Is this so you have fast lookup to see if any given number is a prime? That would make sense and give you fast access. The cost of adding them can be mitigated (but not eliminated) by setting the initial capacity of the TreeMap. This will still incur tree rebalancing costs however.

An alternative storage might be simply to sort them and put them in an array. This will give you O(log n) lookup with a bisection search but will make getting ranges trivial. You can use Arrays.binarySearch().

cletus
when using a TreeMap, who would be the key and who would be the value?
omgzor
+1  A: 

Seems to me that a simple array (or ArrayList since it's easier to work with) would be fine. Adding elements is O(1) and you can get all the primes between x and y by doing a binary search for the first prime >= x (see http://java.sun.com/j2se/1.5.0/docs/api/java/util/Collections.html#binarySearch%28java.util.List,%20T%29 ), and then just go through the list until you get to a prime > y.

(I realize cletus beat me to it, but hopefully the extra detail is of some use.)

MatrixFrog
+4  A: 

A binary search isn't going to be wonderful for this data, since the first half of the primes are going to be closer to each other than the last half of them.

You might be able to improve on your search by knowing how many primes there are under x. Maybe skew the cut by using the approximation mentioned in the link.


My first try would be this. I'd have two arrays.

  1. An array of all the primes.
  2. An array that tells me where in the first array the first prime above 1000*n was. So if I wanted to find the first prime with a value of 5000 or more, I'd look at secondArray[5000/1000-1].

I'd get a rough position with array 2 before doing anything with array 1.

Nosredna
thanks, I didn't know I was actually working with 51 million primes =o!
omgzor
Dang it, you changed the question on me! :-)
Nosredna
+1  A: 

The n'th prime is about p(n) ~ n ln(n), ie

p(51E6) ~ 905114146 < 2147483647 = Integer.MAX_VALUE

This means the most efficient way to store the first 51 million primes is an int[].

Christoph
+1  A: 

It depends on exactly the balance of operations, and usage. A simple sorted array will be best for storing the primes.

Now, if performance is really at a premium and memory cost is insignificant then you could augment this with an index of indices. e.g.

int MAX_NUM_PRIMES =    ...   // the maximum number of primes to be stored
int MAX_PRIME = ....          // the largest prime to be stored
int primes[MAX_NUM_PRIMES]    // array of prime numbers, sorted
int nextPrime[MAX_PRIME]      // nextPrime[i] is the index of the next prime >= i

where nextPrime[i] is the starting point in the array primes for the first prime > i.

then, to iterate over e.g.   2000 primes from   3456, you would do

int j = nextPrime[3456]
for (i = j; i < j + 2000; i++) {
    int x = prime[i];
    ... do whatever with x ...
}
Larry Watanabe
+3  A: 

Since you can precalculate all the primes, and (by the Prime Number Theorem that Nosredna and others have mentioned) you know about how many there will be, you can use a fixed structure (int[]) and one-time in-order insertion cost shouldn't be a concern.

Binary search (As Arrays.binarySearch()) will be so fast you probably don't need to consider optimizations. But, you could also use the Prime Number Theorem's predictions of roughly where the Nth prime is to find the endpoints of ranges even more quickly.

Just to be different, I'll point out that at this scale you could also store the primes as set bits in a large bit field, where if N is prime, bit #N is set to 1. The structure would actually be smaller than the int[] -- 1 billion bits is ~110MiB, while 51 million ints is ~200MiB. See the class BitSet. Since no even indexes are prime, you could subclass or wrap BitSet to give the trivial answer for all even indexes and half/double values as appropriate before passing to/from BitSet, and thus store the whole field in ~55MiB.

Testing a prime with such a structure is O(1), but iterating over all the set bits (primes) depends on the density of primes in the range you've targeted. It still should be pretty fast though.

gojomo
A: 

If you want the best data structure for quickly finding the number of primes between x and y (as in your example) you want a Binary Indexed Tree.

There is a good description here.

Rasmus Faber
+1  A: 

I need to know, for example, the primes that are between 1000000000 and that same number minus 100000.

Then build a sieve for exactly those numbers you are interested in. Computing all primes below is a waste, unless you want to know exactly how many primes there are below 999900000.

A good data structure for this size of numbers is a bit set. Because about one in 21 numbers is a prime, it takes less memory than storing the numbers explicitly, and it is fast enough for iterating through ranges.

Edit: To be concrete, on my laptop in Java sieving the whole range takes a little over a minute, sieving the last 100000 about 30 milliseconds.

starblue
what sieve algorithm are you using? Atkin or Eratosthenes?
omgzor
Eratosthenes. See http://stackoverflow.com/questions/1042902/most-elegant-way-to-generate-prime-numbers/1043247#1043247
starblue
A: 

this java applet seems fairly fast: Table of Primes from 1 to 1 000 000 000 000 http://www.walter-fendt.de/m14e/primes.htm (no source though, but you might try the author)

Ray Tayek
A: 

An array of numbers will probably do fine :)

The problem might be generating the array? In that case create an object containing the array and populate it (by generating them or reading from a list of primes). When done, serialize it to disk so the program can read the binary stream fast in the future to load the array.

See this question for variations on how to generate the prime array: http://stackoverflow.com/questions/288200/prime-number-calculation-fun

Thorbjørn Ravn Andersen
A: 

By your requirement, you should use the Segmented Sieve of Eratosthenes. it won't require a large amount of memory..

Find all primes upto the square root of 999900000. (~31,621) which can easily be stored in an array.

Now, Carry out the Sieve-ing process over an 100000-length array. with these prime numbers.

Quite Efficient, for large numbers.

st0le