views:

160

answers:

7

I am trying to create an array or list that could handle in theory, given adequate hardware and such, as many as 100^100 BigInteger entries. The problem with using an array or standard list is that they can only hold Integer.MAX_VALUE number of entries. How would you work around this limitations? A whole new class/interface? A wrapper for list? another data type entirely?

+4  A: 

100^100 = 10^200. Assuming BigInteger's memory footprint being 28 bytes (it has 7 int fields) that's 2.8 * 10^201 bytes or 2.8 * 10^192 gigabytes. There's no adequate hardware and there never will be :-)

ChssPly76
I think the issue is related to long for index and size, not physical memory limitations.
Robin
I understand that, but I think that even "spherical horse in vacuum" type problems should be plausible, e.g. said horse should not be required to move faster than speed of light :-)
ChssPly76
A: 

The first thing that comes to my mind is to create a new ArrayList type that that supports long indexes and has an array of multiple ArrayList's. Then you can implement get/set/etc methods so that when you access an index greater than 2^32, the next ArrayList in the array is accessed. To determine which array to use, hash the index by (2^32 - 1) mod index.

To deal with the size constraint problem, you'll have to serialize some of the arrays to disk. If you're in HPC though, this isn't as much of a problem. A shared memory system I have access to has 256GB memory available, per node. How long it will take you to iterate through that list is another problem, but I think the astrophysicists do things close to that scale.

The size of the list looks too large to work with (as other posters have said), so you will have to scale things down to a workable size.

Dana the Sane
Technically, you cannot override get/set methods with long parameters... these will be new methods, and not part of implementation of List interface. (Well, I guess you know that, but still, it's worth to point out, because this is just one of technical problem you'll run into when dealing with large datasets)
Peter Štibraný
I reworded that sentence to indicate that you're not overriding.
Dana the Sane
A: 

You can use linked list ... but you'll die before list.get(list.size()-1) returns :-)

Btw, take a look at Fastutil collections library which can work with large amount of data.

Peter Štibraný
oh, list.size() can't return big-enough value, so my joke is invalid :-)
Peter Štibraný
A: 

You can effectively get a 64-bit int from 2 32-bit ints, or a 128-bit int from 4 32-bit ints, or indeed any size you require from enough ints.

To demonstrate, consider the simplest case of representing a 64-bit int with 2 32bit ints.

Some pseudocode:

int64 c = Get64BitInt(int32 a, int32 b)
{
   c = 2^32*a + b
}

You could define a new class to hold big integers using integer arrays to store the number. You'll have to write your own arithmetic methods, but this shouldn't be too taxing.

Richie Cotton
It's called BigInteger and is in JDK since 1.1.
Peter Štibraný
Oops. Misread the question. Sorry.
Richie Cotton
+4  A: 

A 22-dimensional java array would have enough space to hold the data - in theory.

But we should keep in mind, that the number of atoms in the whole universe is estimated to 10^78 (ref in German).

So, before starting to implement, you'd have to think how to store 10^23 bytes on every atom in the universe...

Edit

In general, if you need large datastructures that support access in O(1), you can create multidimensional arrays.

A 2-dimensional array *array[Integer.MAX_VALUE][Integer.MAX_VALUE]* can hold about 4.6x10^18 values. You can address each value ai by *array[ai mod Integer.MAX_VALUE][ai div Integer.MAX_VALUE]*. And of course this works for higher-dimensional arrays aswell.

Andreas_D
I like this idea of multi dimensional arrays. I could probably use that as the back end for a new list interface like others have suggested. As for the whole more data than atoms in the universe thing, that was a problem I was just going to ignore for now. ;-)
faceless1_14
+1  A: 

I would create a new interface type that would allow for larger values. Probably using long for max size and index parameters.

Robin
A: 

I think for several reasons you are looking at making your own collection. First, the list interface assumes int length. Although you could force a list implementation to not be 0 based in theory, thus doubling your potential capacity, it would still be dicey.

The other reason is you are probably looking at something not stored in memory fully, so caching, indexing, iterating, etc., will have an external resource dependency and you may only want a get by index or an iterator, not both.

This sounds like a massive distributed computing problem, and that is not something Java collections were designed to address.

If however you just need such a large index (because you are plotting a small number of points on a very long line) then a custom interface backed by a Map (containing a BigInteger key and the value representing the list content) might get you what you want. The Map implementation might have to separately track insertion order if you really want List-like behavior.

Yishai