views: 1769
answers: 14

I'm trying to find a counterexample to the Pólya Conjecture, which will be somewhere in the 900 millions. I'm using a very efficient algorithm that doesn't even require any factorization (similar to a Sieve of Eratosthenes, but with even more information), so a large array of ints is required.

The program is efficient and correct, but requires an array as large as the x I want to check (it checks all numbers from 2 to x). So, if the counterexample is in the 900 millions, I need an array that is just as large. Java won't allow me anything over about 20 million. Is there anything I can possibly do to get an array that large?

+1  A: 

What do you mean by "won't allow"? You're probably getting an OutOfMemoryError, so add more memory with the -Xmx command line option.

Tom Hawtin - tackline
Yeah, I run out of heap space.
+8  A: 

You may want to extend the maximum size of the JVM heap. You can do that with a command line option.

I believe it is -Xmx3600m (3600 megabytes).
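
For example (the class name here is just a placeholder for whatever runs the sieve):

    java -Xmx3600m PolyaSieve

Note that a heap this size requires a 64-bit JVM, as a later answer points out.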

jjnguy
But note that 2000 MB won't be enough if you want 900 million ints; you'd need about 4 bytes × 900 million = 3600 MB.
schnaader
A: 

Use a memory mapped file (Java 5 NIO package) instead. Or move the sieve into a small C library and use Java JNI.
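
A minimal sketch of the memory-mapped route (file name and sizes are just placeholders). A single MappedByteBuffer is limited to 2 GB, so a full 900-million-int sieve would need more than one mapping; this shows one window:

    import java.io.RandomAccessFile;
    import java.nio.IntBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappedSieve {
        public static void main(String[] args) throws Exception {
            long entries = 500000000L;  // ints in this one mapping (2 GB of file)
            RandomAccessFile file = new RandomAccessFile("sieve.dat", "rw");
            FileChannel channel = file.getChannel();
            MappedByteBuffer map =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, entries * 4);
            IntBuffer ints = map.asIntBuffer();  // view the mapped bytes as ints
            ints.put(0, 2);                      // write one entry
            System.out.println(ints.get(0));     // read it back
            channel.close();
        }
    }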

Aaron Digulla
I am using JVMs now with -mx1024m and they are most certainly using all of it.
Eric Petroelje
A memory mapped file still needs to be small enough to fit into the physical address space.
Mike Houston
@Mike: You mean the mapped view has to be small enough to fit into physical address space (in Win32, this means contiguous non-committed process RAM). The file itself can grow unbounded to the size of the disk's free space.
codekaizen
@codekaizen I see, I didn't realise that. Guess I've never needed just a window onto a file before.
Mike Houston
Perhaps you meant to say 512 GB; Azul supports up to 670 GB heap sizes. http://www.azulsystems.com/products/compute_appliance.htm
Peter Lawrey
There is an artificial limit, but probably not at 512 MB. Sorry for that. Please reconsider the downvotes, because NIO is the way to go.
Aaron Digulla
+4  A: 

If you don't need it all loaded in memory at once, you could segment it into files and store on disk.

sfossen
+5  A: 

Java will allow up to 2 billion array entries. It's your machine (and your limited memory) that cannot handle such a large amount.

Bombe
+1  A: 

You could define your own class which stores the data in a 2D array with dimensions close to sqrt(n) by sqrt(n), then use an index function to map a single index onto the two array indices. This can be extended to more dimensions, as needed.

The main problem you will run into is running out of RAM. If you approach this limit, you'll need to rethink your algorithm or consider external storage (i.e. a file or database).
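
A rough sketch of a variant of this idea, using fixed-size blocks instead of sqrt(n)-by-sqrt(n) dimensions (all names hypothetical). Total memory use is the same; it just avoids one giant contiguous allocation and accepts long indices:

    public class TwoLevelArray {
        private static final int BLOCK = 1 << 20;  // 2^20 ints per block
        private final int[][] blocks;

        public TwoLevelArray(long size) {
            blocks = new int[(int) ((size + BLOCK - 1) / BLOCK)][BLOCK];
        }

        public int get(long i) {
            return blocks[(int) (i / BLOCK)][(int) (i % BLOCK)];
        }

        public void set(long i, int value) {
            blocks[(int) (i / BLOCK)][(int) (i % BLOCK)] = value;
        }
    }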

Phil H
+4  A: 

900 million 32-bit ints with no further overhead (and there will always be more overhead) would require a little over 3.35 GiB. The only way to get that much memory is with a 64-bit JVM. They tend to be a bit flaky in my experience, but you'll either need to use one (on a machine with at least 8 GB of RAM) or use some disk-backed cache.

Kris
+1  A: 

If your algorithm allows it:

  • Compute it in slices which fit into memory.

    You will have to redo the computation for each slice, but it will often be fast enough.

  • Use an array of a smaller numeric type such as byte.
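
A quick illustration of the second point, assuming one byte of state per number is enough (it may not be, if each entry has to hold more than a flag): 900 million bytes is about 858 MiB, versus roughly 3.4 GiB for an int[].

    public class ByteSieve {
        public static void main(String[] args) {
            byte[] sieve = new byte[900000000];  // ~858 MiB; run with e.g. -Xmx1200m
            sieve[42] = 1;                       // one byte of state per number
            System.out.println(sieve[42]);
        }
    }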

starblue
A: 

I wrote a version of the Sieve of Eratosthenes for Project Euler which worked on chunks of the search space at a time. It processes the first 1M integers (for example), keeping each prime it finds in a table. Once a chunk has been fully sieved, the array is re-initialised and the primes found so far are used to mark it before searching for new primes in the next chunk.

The table maps a prime to its 'offset' from the start of the array for the next processing iteration.

This is similar in concept (if not in implementation) to the way functional programming languages perform lazy evaluation of lists (although in larger steps). Allocating all the memory up-front isn't necessary, since you're only interested in the parts of the array that pass your test for primeness. Keeping the non-primes hanging around isn't useful to you.

This method also provides memoisation for later iterations over prime numbers: it's faster than scanning your sparse sieve data structure looking for the 1s every time.
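
A minimal sketch of this chunked scheme, as a plain Sieve of Eratosthenes (without the extra per-number information the question needs); the "next" table carries each prime's offset into the following window:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class SegmentedSieve {
        public static void main(String[] args) {
            final int CHUNK = 1000000;      // size of each window
            final long LIMIT = 10000000L;   // sieve everything below this
            List<Long> primes = new ArrayList<Long>();
            List<Long> next = new ArrayList<Long>();  // next multiple per prime
            boolean[] composite = new boolean[CHUNK];

            for (long base = 2; base < LIMIT; base += CHUNK) {
                Arrays.fill(composite, false);
                long end = Math.min(base + CHUNK, LIMIT);
                // cross off multiples of primes found in earlier windows,
                // resuming from each prime's remembered offset
                for (int i = 0; i < primes.size(); i++) {
                    long p = primes.get(i), m = next.get(i);
                    for (; m < end; m += p) composite[(int) (m - base)] = true;
                    next.set(i, m);  // carry the offset into the next window
                }
                // anything still unmarked is a new prime
                for (long n = base; n < end; n++) {
                    if (!composite[(int) (n - base)]) {
                        primes.add(n);
                        long m = n * n;  // smaller multiples are already crossed off
                        for (; m < end; m += n) composite[(int) (m - base)] = true;
                        next.add(m);
                    }
                }
            }
            System.out.println(primes.size() + " primes below " + LIMIT);
        }
    }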

Mike Houston
In my case, the non-primes are not just 0s, but contain the largest prime factors. That way, I can find the number of prime factors without actually factoring the numbers. Composites are almost as important as the primes in this case.
I see, that's a lot of information to store then. The same method might still apply, but you'd need to spool each completed chunk out to disk as you go. I can see why you'd want to keep it all in memory.
Mike Houston
A: 

I second @sfossen's idea and @Aaron Digulla's. I'd go for disk access. If your algorithm can take a List interface rather than a plain array, you could write an adapter from the List to the memory-mapped file.
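
A hypothetical adapter along those lines, wrapping an IntBuffer (e.g. one obtained from a memory-mapped file, as sketched above) in the List interface; note that List itself is limited to int indices:

    import java.nio.IntBuffer;
    import java.util.AbstractList;

    public class IntBufferList extends AbstractList<Integer> {
        private final IntBuffer buf;

        public IntBufferList(IntBuffer buf) { this.buf = buf; }

        public Integer get(int index) { return buf.get(index); }

        public Integer set(int index, Integer value) {
            int old = buf.get(index);  // List's set contract: return the old value
            buf.put(index, value);
            return old;
        }

        public int size() { return buf.capacity(); }
    }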

Jason S
A: 

Use Tokyo Cabinet, Berkeley DB, or any other disk-based key-value store. They're faster than any conventional database and let you use the disk instead of memory.

Nikhil Chelliah
A: 

Depending on how you need to access the array, you might find that a RandomAccessFile lets you use a file larger than will fit in memory. However, the performance you get is very dependent on your access pattern.
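
A minimal sketch of that approach (names hypothetical), treating the file as one big int array at 4 bytes per entry; every access is a seek, so sequential scans will perform far better than random hops:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class FileBackedIntArray {
        private final RandomAccessFile file;

        public FileBackedIntArray(String path, long entries) throws IOException {
            file = new RandomAccessFile(path, "rw");
            file.setLength(entries * 4);  // pre-size so reads never hit EOF
        }

        public int get(long index) throws IOException {
            file.seek(index * 4);
            return file.readInt();
        }

        public void set(long index, int value) throws IOException {
            file.seek(index * 4);
            file.writeInt(value);
        }
    }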

Peter Lawrey
A: 

Could you get by with 900 million bits (maybe stored as a byte array)?
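
If one bit per number really is enough, java.util.BitSet already packs the bits for you; 900 million bits is only about 107 MiB:

    import java.util.BitSet;

    public class BitSieve {
        public static void main(String[] args) {
            BitSet sieve = new BitSet(900000000);  // ~107 MiB of bits
            sieve.set(42);                         // mark one number
            System.out.println(sieve.get(42));
        }
    }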

Ray Tayek
+1  A: 

Java arrays are indexed by int, so an array can't have more than 2^31 − 1 entries (there are no unsigned ints). So, the maximum size of an array is 2147483647, which consumes (for a plain int[]) 8589934588 bytes (≈ 8 GB).

Thus, the int index is usually not the limitation, since you would run out of memory first anyway.

In your algorithm, you should use a List (or a Map) as your data structure instead, and choose an implementation of List (or Map) that can grow beyond 2^31 entries. This can get tricky, since the "usual" implementations, ArrayList (and HashMap), use arrays internally. You will have to implement a custom data structure, e.g. a two-level array (a list of arrays). While you are at it, you can also try to pack the bits more tightly.

mfx