views: 601
answers: 7

I have a big file; it's expected to be around 12 GB. I want to load it all into memory on a beefy 64-bit machine with 16 GB of RAM, but I think Java does not support byte arrays that big:

File f = new File(file);
long size = f.length();
byte data[] = new byte[size]; // <- does not compile, not even on 64bit JVM

Is it possible with Java?

The compiler error from the Eclipse compiler is:

Type mismatch: cannot convert from long to int

javac gives:

possible loss of precision
found   : long
required: int
         byte data[] = new byte[size];
+7  A: 

Java array indices are of type int, so I'm afraid you're limited to 2^31 - 1, or 2147483647 bytes (2GB). I'd read the data into another data structure, like a 2D array.

Bill the Lizard
Thanks everyone. Bill got the answer first.
Omry
+5  A: 

If necessary, you can load the data into an array of arrays, which will give you a maximum of Integer.MAX_VALUE squared bytes, more than even the beefiest machine could hold in memory.

Jekke
That would be my next step. Since I intend to do a binary search on the data, it will uglify the code, but I'm afraid there is no choice.
Omry
You could make a class that manages an array of arrays but provides an abstraction similar to a regular array, e.g. with get and set methods that take a long index.
Jay Conrod
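
For illustration, a minimal sketch of such a wrapper (the class name, the 1 GB chunk size, and the method names are assumptions here, not taken from any of the answers):

// Sketch of a long-indexed byte array backed by an array of arrays.
public class BigByteArray {
    private static final int CHUNK_SIZE = 1 << 30; // 1 GB per chunk (arbitrary choice)
    private final byte[][] chunks;
    private final long size;

    public BigByteArray(long size) {
        this.size = size;
        int chunkCount = (int) ((size + CHUNK_SIZE - 1) / CHUNK_SIZE);
        chunks = new byte[chunkCount][];
        long remaining = size;
        for (int i = 0; i < chunkCount; i++) {
            chunks[i] = new byte[(int) Math.min(remaining, CHUNK_SIZE)];
            remaining -= chunks[i].length;
        }
    }

    public byte get(long index) {
        return chunks[(int) (index / CHUNK_SIZE)][(int) (index % CHUNK_SIZE)];
    }

    public void set(long index, byte value) {
        chunks[(int) (index / CHUNK_SIZE)][(int) (index % CHUNK_SIZE)] = value;
    }

    public long length() {
        return size;
    }
}

A binary search over the data then only needs the long-indexed get, so the chunking stays hidden behind the class.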
+2  A: 

I suggest you define some "block" objects, each of which holds (say) 1 GB in an array, then make an array of those.

pjc50
+1  A: 

Java arrays use integers for their indices. As a result, the maximum array size is Integer.MAX_VALUE.

(Unfortunately, I can't find any proof from Sun themselves about this, but there are plenty of discussions on their forums about it already.)

I think the best thing you could do in the meantime would be to make a 2D array, i.e.:

byte[][] data;
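
As a rough illustration of filling such an array (the chunk size and stream setup are my own choices, not part of this answer), the file can be read chunk by chunk with readFully:

// Needs java.io: File, FileInputStream, BufferedInputStream, DataInputStream.
int chunkSize = 1 << 30;                       // 1 GB per inner array (arbitrary choice)
File f = new File(file);
long size = f.length();
int chunkCount = (int) ((size + chunkSize - 1) / chunkSize);
byte[][] data = new byte[chunkCount][];

DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(f)));
try {
    long remaining = size;
    for (int i = 0; i < chunkCount; i++) {
        data[i] = new byte[(int) Math.min(remaining, chunkSize)];
        in.readFully(data[i]);                 // fills the whole chunk or throws EOFException
        remaining -= data[i].length;
    }
} finally {
    in.close();
}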
Daniel Lew
+2  A: 

No, arrays are indexed by ints (except some versions of JavaCard that use shorts). You will need to slice it up into smaller arrays, probably wrapping them in a type that gives you get(long), set(long,byte), etc. With sections of data that large, you might want to map the file using java.nio.

Tom Hawtin - tackline
A: 

As others have said, all Java arrays, of all types, are indexed by ints, and so can have at most 2^31 - 1, or 2147483647, elements (2 GB for a byte[]). This is specified by the Java Language Specification, so switching to another operating system or Java Virtual Machine won't help.

If you wanted to write a class to overcome this, as suggested above, you could. It could use an array of arrays (for a lot of flexibility) or change types (a long is 8 bytes, so a long[] can hold 8 times as much data as a byte[]).
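
For example, a rough sketch of the "change types" idea (the helper names and byte ordering within each long are assumptions here), pulling individual bytes out of a long[] with shifts and masks:

// Sketch: 8 bytes packed into each long, so a long[] of length Integer.MAX_VALUE
// can address roughly 16 GB of data.
static byte getByte(long[] data, long index) {
    int shift = (int) (index & 7) << 3;                  // bit offset of the byte within its long
    return (byte) (data[(int) (index >>> 3)] >>> shift);
}

static void setByte(long[] data, long index, byte value) {
    int shift = (int) (index & 7) << 3;
    int word = (int) (index >>> 3);
    data[word] = (data[word] & ~(0xFFL << shift)) | ((value & 0xFFL) << shift);
}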

Nick Fortescue
+1  A: 

You might consider using FileChannel and MappedByteBuffer to memory-map the file:

FileChannel fCh = new RandomAccessFile(file, "rw").getChannel();
long size = fCh.size();
ByteBuffer map = fCh.map(FileChannel.MapMode.READ_WRITE, 0, size); // but see the edit below

Edit:

OK, I'm an idiot; it looks like ByteBuffer only takes a 32-bit index as well, which is odd since the size parameter to FileChannel.map is a long... But if you decide to break up the file into multiple 2 GB chunks for loading, I'd still recommend memory-mapped IO, as there can be pretty large performance benefits. You're basically moving all IO responsibility to the OS kernel.
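
A rough sketch of that chunked approach (the 1 GB window size and variable names are assumptions, not part of the original answer):

// Sketch: map the file as a series of read-only windows, each well under 2 GB.
FileChannel ch = new RandomAccessFile(file, "r").getChannel();
long size = ch.size();
long window = 1L << 30;                                  // 1 GB per mapping (arbitrary choice)
int count = (int) ((size + window - 1) / window);
MappedByteBuffer[] maps = new MappedByteBuffer[count];
for (int i = 0; i < count; i++) {
    long offset = i * window;
    maps[i] = ch.map(FileChannel.MapMode.READ_ONLY, offset, Math.min(window, size - offset));
}
// A byte at absolute position p is then maps[(int) (p / window)].get((int) (p % window)).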

Jeff Mc