What's the best data structure (in java) for the task of loading 51 million primes and then iterating over them?
I need to know, for example, the primes that are between 1000000000 and that same number minus 100000.
Why store them in a map at all? Is this so you have fast lookup to see if any given number is prime? That would make sense and give you fast access. Note, though, that a TreeMap (unlike HashMap) has no initial-capacity setting, so inserting 51 million entries will incur tree rebalancing costs throughout.
An alternative storage might be simply to sort them and put them in an array. This gives you O(log n) lookup with a bisection search and makes getting ranges trivial. You can use Arrays.binarySearch().
Seems to me that a simple array (or ArrayList, since it's easier to work with) would be fine. Adding elements is amortized O(1), and you can get all the primes between x and y by doing a binary search for the first prime >= x (see http://java.sun.com/j2se/1.5.0/docs/api/java/util/Collections.html#binarySearch%28java.util.List,%20T%29 ), then just walking the list until you reach a prime > y.
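That range scan can be sketched like this (a minimal, self-contained example; the tiny array and the bounds 6 and 20 are made up for illustration, and Arrays.binarySearch on an int[] is used rather than the Collections variant):

```java
import java.util.Arrays;

public class PrimeRange {
    // First index i with primes[i] >= x in a sorted array. Arrays.binarySearch
    // returns -(insertionPoint) - 1 when x is absent, so decode that case.
    static int firstIndexAtLeast(int[] primes, int x) {
        int i = Arrays.binarySearch(primes, x);
        return i >= 0 ? i : -i - 1;
    }

    public static void main(String[] args) {
        int[] primes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
        // All primes in [6, 20]: start at the first prime >= 6 and
        // stop as soon as a prime exceeds 20.
        StringBuilder sb = new StringBuilder();
        for (int i = firstIndexAtLeast(primes, 6); i < primes.length && primes[i] <= 20; i++) {
            sb.append(primes[i]).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints: 7 11 13 17 19
    }
}
```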
(I realize cletus beat me to it, but hopefully the extra detail is of some use.)
A binary search isn't going to be wonderful for this data, since the first half of the primes are going to be closer to each other than the last half of them.
You might be able to improve on your search by knowing how many primes there are under x. Maybe skew the cut by using the approximation mentioned in the link.
My first try would be this. I'd have two arrays.
I'd get a rough position with array 2 before doing anything with array 1.
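One way to sketch that skewed starting guess (the pi(x) ~ x / ln(x) estimate and the one-step correction walk are my own illustrative choices, not from the answer; a real version would gallop or binary-search outward from the guess):

```java
public class SkewedSearch {
    // Guess the index of the first prime >= x from pi(x) ~ x / ln(x), then
    // correct the guess with a short walk. Assumes a non-empty sorted array;
    // a real implementation would gallop or binary-search from the guess.
    static int firstIndexAtLeast(int[] primes, int x) {
        int guess = (int) (x / Math.log(Math.max(x, 3)));
        guess = Math.min(Math.max(guess, 0), primes.length - 1);
        while (guess > 0 && primes[guess - 1] >= x) guess--;        // drift left
        while (guess < primes.length && primes[guess] < x) guess++; // drift right
        return guess; // == primes.length if every stored prime is < x
    }

    public static void main(String[] args) {
        int[] primes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
        System.out.println(firstIndexAtLeast(primes, 10)); // prints: 4 (primes[4] == 11)
    }
}
```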
The n-th prime is about p(n) ~ n ln(n), i.e.

p(51e6) ~ 905114146 < 2147483647 = Integer.MAX_VALUE

(n ln(n) is an underestimate; the sharper n (ln(n) + ln ln(n)) gives about 1.05e9, still comfortably below Integer.MAX_VALUE.) This means the most efficient way to store the first 51 million primes is an int[].
It depends on exactly the balance of operations, and usage. A simple sorted array will be best for storing the primes.
Now, if performance is really at a premium and memory cost is insignificant then you could augment this with an index of indices. e.g.
int MAX_NUM_PRIMES = ...; // the maximum number of primes to be stored
int MAX_PRIME = ...;      // the largest prime to be stored
int[] primes = new int[MAX_NUM_PRIMES];   // the prime numbers, sorted ascending
int[] nextPrime = new int[MAX_PRIME + 1]; // nextPrime[i] is the index of the first prime >= i
where nextPrime[i] is the starting point in the primes array for the first prime >= i.
then, to iterate over e.g. 2000 primes from 3456, you would do
int j = nextPrime[3456];
for (int i = j; i < j + 2000; i++) {
    int x = primes[i];
    // ... do whatever with x ...
}
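Building that nextPrime index can be done in one forward pass over 0..MAX_PRIME (a sketch with illustrative names; nextPrime[i] == primes.length serves as an "out of range" sentinel):

```java
public class NextPrimeIndex {
    // Build nextPrime so that nextPrime[i] is the index into primes of the
    // first prime >= i; nextPrime[i] == primes.length means "no such prime".
    static int[] buildNextPrime(int[] primes, int maxPrime) {
        int[] nextPrime = new int[maxPrime + 1];
        int j = 0;
        for (int i = 0; i <= maxPrime; i++) {
            while (j < primes.length && primes[j] < i) j++;
            nextPrime[i] = j;
        }
        return nextPrime;
    }

    public static void main(String[] args) {
        int[] primes = {2, 3, 5, 7, 11, 13};
        int[] nextPrime = buildNextPrime(primes, 13);
        System.out.println(nextPrime[6]); // prints: 3 (first prime >= 6 is primes[3] == 7)
    }
}
```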
Since you can precalculate all the primes, and (by the Prime Number Theorem that Nosredna and others have mentioned) you know about how many there will be, you can use a fixed structure (int[]) and one-time in-order insertion cost shouldn't be a concern.
Binary search (As Arrays.binarySearch()) will be so fast you probably don't need to consider optimizations. But, you could also use the Prime Number Theorem's predictions of roughly where the Nth prime is to find the endpoints of ranges even more quickly.
Just to be different, I'll point out that at this scale you could also store the primes as set bits in a large bit field, where if N is prime, bit #N is set to 1. The structure would actually be smaller than the int[] -- 1 billion bits is ~119MiB, while 51 million ints is ~195MiB. See the class BitSet. Since no even index other than 2 is prime, you could subclass or wrap BitSet to give the trivial answer for all even indexes, halving/doubling the values as appropriate on the way in and out, and thus store the whole field in ~60MiB.
Testing a prime with such a structure is O(1), but iterating over all the set bits (primes) depends on the density of primes in the range you've targeted. It still should be pretty fast though.
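A sketch of that halved BitSet, where bit k stands for the odd number 2k+1 and 2 is special-cased (the wrapper class and its method names are mine, not a standard API; iteration uses BitSet.nextSetBit):

```java
import java.util.BitSet;

// Half-size sieve: bit k in the BitSet stands for the odd number 2k+1,
// so a field covering 1..limit needs only about limit/2 bits.
public class OddPrimeBits {
    private final BitSet bits; // bit k set <=> 2k+1 is prime

    OddPrimeBits(int limit) {
        bits = new BitSet((limit + 1) / 2);
        bits.set(1, (limit + 1) / 2);          // tentatively mark 3, 5, 7, ... prime
        for (int p = 3; (long) p * p <= limit; p += 2) {
            if (bits.get(p / 2)) {
                for (long m = (long) p * p; m <= limit; m += 2L * p) {
                    bits.clear((int) (m / 2)); // odd multiples of p are composite
                }
            }
        }
    }

    boolean isPrime(int n) {
        if (n == 2) return true;
        if (n < 2 || n % 2 == 0) return false; // evens (and 0, 1) answered for free
        return bits.get(n / 2);
    }

    public static void main(String[] args) {
        OddPrimeBits sieve = new OddPrimeBits(30);
        StringBuilder sb = new StringBuilder("2");
        // Iterate the primes via nextSetBit, mapping bit k back to 2k+1.
        for (int k = sieve.bits.nextSetBit(0); k >= 0; k = sieve.bits.nextSetBit(k + 1)) {
            sb.append(' ').append(2 * k + 1);
        }
        System.out.println(sb); // prints: 2 3 5 7 11 13 17 19 23 29
    }
}
```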
If you want the best data structure for quickly finding the number of primes between x and y (as in your example) you want a Binary Indexed Tree.
There is a good description here.
I need to know, for example, the primes that are between 1000000000 and that same number minus 100000.
Then build a sieve for exactly those numbers you are interested in. Computing all the primes below that range is a waste, unless you want to know exactly how many primes there are below 999900000.
A good data structure for this size of numbers is a bit set. Because about one in 21 numbers is a prime, it takes less memory than storing the numbers explicitly, and it is fast enough for iterating through ranges.
Edit: To be concrete, on my laptop in Java sieving the whole range takes a little over a minute, sieving the last 100000 about 30 milliseconds.
This Java applet seems fairly fast: Table of Primes from 1 to 1 000 000 000 000 http://www.walter-fendt.de/m14e/primes.htm (no source code though, but you might try asking the author)
An array of numbers will probably do fine :)
The problem might be generating the array. In that case, create an object containing the array and populate it (by generating the primes or reading them from a list). When done, serialize it to disk so the program can load the array quickly from the binary stream in the future.
See this question for variations on how to generate the prime array: http://stackoverflow.com/questions/288200/prime-number-calculation-fun
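For the save-to-disk step, a raw length-prefixed stream of ints is usually faster and smaller than Java object serialization (a sketch; the class name and file handling are illustrative):

```java
import java.io.*;

public class PrimeCache {
    // Write the primes as a length-prefixed stream of raw ints.
    static void save(int[] primes, File f) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(f)))) {
            out.writeInt(primes.length);
            for (int p : primes) out.writeInt(p);
        }
    }

    // Read them back in one pass.
    static int[] load(File f) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(f)))) {
            int[] primes = new int[in.readInt()];
            for (int i = 0; i < primes.length; i++) primes[i] = in.readInt();
            return primes;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("primes", ".bin");
        save(new int[]{2, 3, 5, 7, 11}, f);
        System.out.println(java.util.Arrays.toString(load(f))); // prints: [2, 3, 5, 7, 11]
        f.delete();
    }
}
```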
By your requirement, you should use the Segmented Sieve of Eratosthenes; it won't require a large amount of memory.

Find all primes up to the square root of 1000000000 (~31,623), which can easily be stored in an array.

Now carry out the sieving process over a 100000-length array representing the target range, crossing off multiples of those primes.

Quite efficient for large numbers.
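Those two steps can be sketched as follows (demonstrated on a small range so the output is easy to check; for the question's range you would pass lo = 999900000 and hi = 1000000000, and the helper names here are mine):

```java
import java.util.BitSet;

public class SegmentedSieve {
    // Step 1: a plain sieve up to n, returning the primes found.
    static int[] smallPrimes(int n) {
        BitSet composite = new BitSet(n + 1);
        int count = 0;
        for (int i = 2; i <= n; i++) {
            if (!composite.get(i)) {
                count++;
                for (long j = (long) i * i; j <= n; j += i) composite.set((int) j);
            }
        }
        int[] primes = new int[count];
        for (int i = 2, k = 0; i <= n; i++) if (!composite.get(i)) primes[k++] = i;
        return primes;
    }

    // Step 2: sieve only the window [lo, hi] using the primes up to sqrt(hi).
    static BitSet sieveRange(long lo, long hi) {
        int[] primes = smallPrimes((int) Math.sqrt((double) hi) + 1);
        BitSet composite = new BitSet((int) (hi - lo + 1)); // bit k <=> lo + k
        for (int p : primes) {
            long start = Math.max((long) p * p, ((lo + p - 1) / p) * p);
            for (long m = start; m <= hi; m += p) composite.set((int) (m - lo));
        }
        if (lo <= 1) composite.set(0, (int) Math.min(2 - lo, hi - lo + 1)); // 0, 1 aren't prime
        return composite; // clear bits mark the primes in [lo, hi]
    }

    public static void main(String[] args) {
        long lo = 990, hi = 1010;
        BitSet composite = sieveRange(lo, hi);
        StringBuilder sb = new StringBuilder();
        for (long n = lo; n <= hi; n++) {
            if (!composite.get((int) (n - lo))) sb.append(n).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints: 991 997 1009
    }
}
```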