I am trying to create an array or list that could handle in theory, given adequate hardware and such, as many as 100^100 BigInteger entries. The problem with using an array or standard list is that they can only hold Integer.MAX_VALUE number of entries. How would you work around this limitations? A whole new class/interface? A wrapper for list? another data type entirely?
views:
160answers:
7100^100 = 10^200. Assuming BigInteger's memory footprint being 28 bytes (it has 7 int
fields) that's 2.8 * 10^201 bytes or 2.8 * 10^192 gigabytes. There's no adequate hardware and there never will be :-)
The first thing that comes to my mind is to create a new ArrayList
type that that supports long
indexes and has an array of multiple ArrayList
's. Then you can implement get/set/etc methods so that when you access an index greater than 2^32, the next ArrayList
in the array is accessed. To determine which array to use, hash the index by (2^32 - 1) mod index
.
To deal with the size constraint problem, you'll have to serialize some of the arrays to disk. If you're in HPC though, this isn't as much of a problem. A shared memory system I have access to has 256GB memory available, per node. How long it will take you to iterate through that list is another problem, but I think the astrophysicists do things close to that scale.
The size of the list looks too large to work with (as other posters have said), so you will have to scale things down to a workable size.
You can use linked list ... but you'll die before list.get(list.size()-1)
returns :-)
Btw, take a look at Fastutil collections library which can work with large amount of data.
You can effectively get a 64-bit int from 2 32-bit ints, or a 128-bit int from 4 32-bit ints, or indeed any size you require from enough ints.
To demonstrate, consider the simplest case of representing a 64-bit int with 2 32bit ints.
Some pseudocode:
int64 c = Get64BitInt(int32 a, int32 b)
{
c = 2^32*a + b
}
You could define a new class to hold big integers using integer arrays to store the number. You'll have to write your own arithmetic methods, but this shouldn't be too taxing.
A 22-dimensional java array would have enough space to hold the data - in theory.
But we should keep in mind, that the number of atoms in the whole universe is estimated to 10^78 (ref in German).
So, before starting to implement, you'd have to think how to store 10^23 bytes on every atom in the universe...
Edit
In general, if you need large datastructures that support access in O(1), you can create multidimensional arrays.
A 2-dimensional array *array[Integer.MAX_VALUE][Integer.MAX_VALUE]* can hold about 4.6x10^18 values. You can address each value ai by *array[ai mod Integer.MAX_VALUE][ai div Integer.MAX_VALUE]*. And of course this works for higher-dimensional arrays aswell.
I would create a new interface type that would allow for larger values. Probably using long for max size and index parameters.
I think for several reasons you are looking at making your own collection. First, the list interface assumes int length. Although you could force a list implementation to not be 0 based in theory, thus doubling your potential capacity, it would still be dicey.
The other reason is you are probably looking at something not stored in memory fully, so caching, indexing, iterating, etc., will have an external resource dependency and you may only want a get by index or an iterator, not both.
This sounds like a massive distributed computing problem, and that is not something Java collections were designed to address.
If however you just need such a large index (because you are plotting a small number of points on a very long line) then a custom interface backed by a Map (containing a BigInteger key and the value representing the list content) might get you what you want. The Map implementation might have to separately track insertion order if you really want List-like behavior.