Hi all,

I'm trying to create a byte array whose size is of type long. For example, think of it as:

long x = _________;
byte[] b = new byte[x];

Apparently you can only specify an int for the size of a byte array.

Before anyone asks why I would need a byte array so large, I'll say I need to encapsulate data of message formats that I am not writing, and one of these message types has a length of an unsigned int (long in java).

Is there a way to create this byte array?

I am thinking if there's no way around it, I can create a byte array output stream and keep feeding it bytes, but I don't know if there's any restriction on a size of a byte array.

Please let me know.

Thanks, jbu

+5  A: 

A byte[] with size of the maximum 32-bit signed integer would require 2GB of contiguous address space. You shouldn't try to create such an array. Otherwise, if the size is not really that large (and it's just a larger type), you could safely cast it to an int and use it to create the array.
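As a rough sketch of the "safely cast" case: the check below widens an unsigned 32-bit length field to a long and only narrows it to an int when it actually fits. The class and method names (`LengthCheck`, `unsignedLength`, `toArraySize`) are made up for illustration, and it assumes the length arrives as a raw 32-bit field.

```java
class LengthCheck {
    /** Interprets a raw 32-bit field as unsigned by widening it to a long. */
    static long unsignedLength(int rawField) {
        return rawField & 0xFFFFFFFFL;
    }

    /** Returns the length as an int, or throws if it cannot be a Java array size. */
    static int toArraySize(long length) {
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Length does not fit in an int: " + length);
        }
        return (int) length;
    }

    public static void main(String[] args) {
        long x = unsignedLength(1024);          // hypothetical length field
        byte[] b = new byte[toArraySize(x)];    // safe: the range was checked first
        System.out.println(b.length);
    }
}
```

Even when the check passes, the allocation can still fail with OutOfMemoryError if the heap is too small for the requested size.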

Mehrdad Afshari
The original questioner presumably is not using a 32-bit JVM. An array of int[] with 2^32 bytes is constructible...
Tom Hawtin - tackline
actually the max is 31-bit integer since java's types are signed. So 2 gigs roughly.
jbu
jbu: Oops. You're right. Obviously, it's also available in a 64 bit process but I meant to say it's too large and if you are really creating such a large array, you're most probably going the wrong way.
Mehrdad Afshari
mehrdad: I don't know if *I'm* going the wrong way...again it's a message type that I'm handling that can be that big (theoretically). It seems that the one going the wrong way is the guy who created this message type. I do not know whether or not he uses the full size of his message, but I feel like I'd like to support his message and not throw away bytes (even if he isn't using them).
jbu
If you really expect the message to be that large, you should use some kind of buffering mechanism so that you don't load the whole thing at once into memory. I just tried creating an array of 2^30 bytes (Integer.MAX_VALUE/2) in a 64 bit JVM and it throws OutOfMemoryError.
Mehrdad Afshari
yes i believe it's a ... stack overflow :) I guess I need to throw away bytes then, or ask this guy if he intends to use all those bytes.
jbu
jbu: Actually, it's not created on stack. It's lack of enough Java heap space. I could create Integer.MAX_VALUE/4 bytes on 64 bit and much less (nowhere near that) in 32 bit. You should really think about buffering if you expect the message to be larger than a couple hundred megabytes.
Mehrdad Afshari
you're right, heap
jbu
+1  A: 

You should probably be using a stream to read your data in and another to write it out. If you are going to need access to data later on in the file, save it. If you need access to something you haven't run into yet, you need a two-pass system where you run through once and store the "stuff you'll need for the second pass", then run through again.

Compilers work this way.
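A minimal sketch of the streaming idea, assuming the message length is known up front as a long and the payload only needs to be passed through. The `StreamCopy` class and its 8 KB buffer size are illustrative choices, not anything from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    /** Copies exactly `length` bytes from in to out through a small fixed buffer. */
    static void copy(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = length;
        while (remaining > 0) {
            // Never ask for more than the buffer holds or the message has left.
            int toRead = (int) Math.min(buffer.length, remaining);
            int n = in.read(buffer, 0, toRead);
            if (n < 0) {
                throw new EOFException("Stream ended with " + remaining + " bytes missing");
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] message = new byte[20000];   // stand-in for a real input source
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(message), sink, message.length);
        System.out.println(sink.size());
    }
}
```

Because only the buffer is held in memory, the same loop works whether the length is a few bytes or several gigabytes.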

The only case for loading in the entire array at once is if you have to repeatedly randomly access many locations throughout the array. If this is the case, I suggest you load it into multiple byte arrays all stored in a single container class.

The container class would have an array of byte arrays, but from outside all the accesses would seem contiguous. You would just ask for byte 4987432912871439183 and your class would divide your long index by the size of each byte array to calculate which array to access, then use the remainder to determine the byte.
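One possible shape for such a container, as a sketch only. The name `ChunkedByteArray` and the caller-supplied chunk size are assumptions for illustration, and it covers single-byte access only.

```java
class ChunkedByteArray {
    private final int chunkSize;
    private final long length;
    private final byte[][] chunks;

    ChunkedByteArray(long length, int chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("Invalid length or chunk size");
        }
        // Round up so a partial final chunk is counted.
        long chunkCount = length / chunkSize + (length % chunkSize == 0 ? 0 : 1);
        if (chunkCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Too many chunks; use a larger chunk size");
        }
        this.length = length;
        this.chunkSize = chunkSize;
        this.chunks = new byte[(int) chunkCount][];
        for (int i = 0; i < chunks.length; i++) {
            long start = (long) i * chunkSize;
            // The last chunk may be shorter than the others.
            chunks[i] = new byte[(int) Math.min(chunkSize, length - start)];
        }
    }

    long length() {
        return length;
    }

    byte get(long index) {
        checkIndex(index);
        // Quotient picks the chunk, remainder picks the byte inside it.
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void set(long index, byte value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("Index out of range: " + index);
        }
    }

    public static void main(String[] args) {
        ChunkedByteArray bytes = new ChunkedByteArray(10, 4); // chunks of 4, 4 and 2 bytes
        bytes.set(9, (byte) 42);
        System.out.println(bytes.get(9));
    }
}
```

The total still has to fit in the heap; the chunking only removes the need for one contiguous block and for an int-sized index.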

It could also have methods to store and retrieve "chunks" that span byte-array boundaries, which would require creating a temporary copy. The cost of creating a few temporary arrays would be more than made up for by the fact that you don't have a locked 2 GB space allocated, which I think could just destroy your performance.

Edit: ps. If you really need the random access and can't use streams then implementing a containing class is a Very Good Idea. It will let you change the implementation on the fly from a single byte array to a group of byte arrays to a file-based system without any change to the rest of your code.

Bill K
I doubt that even in that case you could allocate such an amount of memory. Let alone `long`, the second line throws an exception on a 64-bit JRE on my machine: "byte[] a1 = new byte[Integer.MAX_VALUE/4]; byte[] a2 = new byte[Integer.MAX_VALUE/4];" He would have to use some kind of in-memory buffer if he's dealing with such a large amount of data.
Mehrdad Afshari
That's why I suggested a small class that could be used to change the implementation on the fly. Of course, streaming should be used if at all possible (and it absolutely should be possible!) but if not, it might be possible to use some kind of caching algorithm with smaller blocks held by soft references.
Bill K
A: 

One way to "store" the array is to write it to a file and then access it (if you need to access it like an array) using a RandomAccessFile. The API for that class uses a long as the index into the file instead of an int. It will be slower, but much less hard on the memory.

This is for when you can't extract what you need during the initial input scan.
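A small sketch of that approach, assuming a temporary file is acceptable as backing storage. The wrapper name `FileBackedBytes` is hypothetical; `seek`, `readByte`, `writeByte` and `setLength` are the standard RandomAccessFile methods.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class FileBackedBytes implements AutoCloseable {
    private final RandomAccessFile file;

    FileBackedBytes(File path, long length) throws IOException {
        file = new RandomAccessFile(path, "rw");
        file.setLength(length); // the length is a long, so it can exceed Integer.MAX_VALUE
    }

    byte get(long index) throws IOException {
        file.seek(index);       // seek takes a long offset
        return file.readByte();
    }

    void set(long index, byte value) throws IOException {
        file.seek(index);
        file.writeByte(value);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("bytes", ".bin");
        tmp.deleteOnExit();
        try (FileBackedBytes bytes = new FileBackedBytes(tmp, 1_000_000L)) {
            bytes.set(999_999L, (byte) 5);
            System.out.println(bytes.get(999_999L));
        }
    }
}
```

Each access here is a separate seek plus a one-byte read or write, so heavy random access would probably want some buffering on top.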

Kathy Van Stone
A: 

It's not of immediate help, but creating arrays with larger sizes (via longs) is a proposed language change for Java 7. Check out the Project Coin proposals for more info.

Brian Agnew