I use a BinaryReader over a MemoryStream wrapped around a byte array (new BinaryReader(new MemoryStream(MyByteArray))) to read variable-sized records and process them all in memory. This works well as long as the byte stream held in the array is less than about 1.7 GB in size. Beyond that you cannot create a larger byte array - even on my 64-bit system with plenty of physical memory - because .NET arrays are indexed with a 32-bit integer and a single object is limited to 2 GB. So my solution has been to read the byte stream and split it into several byte arrays.
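
In outline, my reading loop looks like this (the Int32 length prefix is a simplification - my real format is more involved - and LoadData / ProcessRecord stand in for my actual code):

    using System;
    using System.IO;

    // Simplified outline of my current approach: one big array, one
    // MemoryStream, one BinaryReader. The Int32 length prefix is just
    // illustrative; the real record format is more involved.
    byte[] myByteArray = LoadData();                      // hypothetical loader
    using (var reader = new BinaryReader(new MemoryStream(myByteArray)))
    {
        while (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            int length = reader.ReadInt32();              // record size
            byte[] record = reader.ReadBytes(length);     // variable-sized payload
            ProcessRecord(record);                        // hypothetical handler
        }
    }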

Now, however, I cannot read across the byte-array boundaries, and since my data is in a variable-length format I cannot ensure that the byte arrays always end on a whole record.

This must be a common problem for people who process very large datasets and still need speed.

Any suggestions would be appreciated.

+1  A: 

You should avoid loading a byte array of this size into memory in the first place.

Isn't it possible to implement a streaming solution where you only load parts of the data into memory (a buffer)? Do you need random access to these bytes, or can you use a forward-only solution that reads the stream from beginning to end while processing it, without looking back?
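
For example, a forward-only pass might look like this (a sketch: the Int32 length-prefix framing and the processRecord callback are assumptions, since I don't know your record format):

    using System;
    using System.IO;

    static class ForwardOnlyReader
    {
        // Processes records front to back; only the current record is
        // ever held in memory. Works for any Stream: file, network, ...
        public static void ProcessAll(Stream source, Action<byte[]> processRecord)
        {
            using (var reader = new BinaryReader(source))
            {
                while (true)
                {
                    byte[] prefix = reader.ReadBytes(4);      // Int32 length prefix
                    if (prefix.Length < 4) break;             // end of stream
                    int length = BitConverter.ToInt32(prefix, 0);
                    processRecord(reader.ReadBytes(length));  // one record at a time
                }
            }
        }
    }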

Where does this byte array come from? A file, a web service, ...?

Ronald Wildenberg
+3  A: 

Take a look at memory-mapped files (MemoryMappedFile, new in .NET 4): you map the file into your address space and let the operating system page the data in on demand, so you never have to allocate one giant array.
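
In sketch form, reading the same kind of length-prefixed records through a view stream ("data.bin" and ProcessRecord are placeholders):

    using System.IO;
    using System.IO.MemoryMappedFiles;

    // Map the file instead of loading it: the OS pages data in on demand,
    // so no single huge array is ever allocated.
    long fileLength = new FileInfo("data.bin").Length;    // view may be page-padded
    using (var mmf = MemoryMappedFile.CreateFromFile("data.bin", FileMode.Open))
    using (var view = mmf.CreateViewStream())
    using (var reader = new BinaryReader(view))
    {
        while (view.Position < fileLength)
        {
            int length = reader.ReadInt32();              // record size
            byte[] record = reader.ReadBytes(length);     // variable-sized payload
            ProcessRecord(record);                        // hypothetical handler
        }
    }
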
Rasmus Faber
Yes. I tried that - unfortunately MemoryMappedFiles are very slow indeed.
ManInMoon
A: 

For excessively large streams, you shouldn't try dumping them into a MemoryStream - use something like FileStream instead, and talk directly to disk. The built-in buffering is usually sufficient, or you can tweak it with something like BufferedStream (but I have rarely needed to - then again, I tend to include my own data-processing buffer).
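
Something along these lines, as a sketch (the file name, the 1 MB buffer size and the length-prefix framing are illustrative, not your actual format):

    using System.IO;

    // Talk directly to disk; FileStream's own buffer (here 1 MB) usually
    // suffices. A BufferedStream could be layered on top if it helps.
    using (var file = new FileStream("data.bin", FileMode.Open, FileAccess.Read,
                                     FileShare.Read, 1 << 20))
    using (var reader = new BinaryReader(file))
    {
        while (file.Position < file.Length)
        {
            int length = reader.ReadInt32();              // record size
            byte[] record = reader.ReadBytes(length);
            // ProcessRecord(record);                     // hypothetical handler
        }
    }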

You might also consider compression or densely packed data, and serializers designed to work by streaming records rather than creating an entire object graph at once (although since you mention BinaryReader, you may already be doing this manually, so this might not be an issue).
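
For instance (again a sketch - the file name and framing are illustrative), records can be inflated one at a time straight out of a compressed file:

    using System;
    using System.IO;
    using System.IO.Compression;

    // DeflateStream is forward-only, which fits record-at-a-time reading:
    // each record is decompressed as it is reached, never the whole graph.
    using (var file = File.OpenRead("data.bin.deflate"))
    using (var inflate = new DeflateStream(file, CompressionMode.Decompress))
    using (var reader = new BinaryReader(inflate))
    {
        byte[] prefix;
        while ((prefix = reader.ReadBytes(4)).Length == 4)   // stop on short read
        {
            int length = BitConverter.ToInt32(prefix, 0);
            byte[] record = reader.ReadBytes(length);
            // ProcessRecord(record);                        // hypothetical handler
        }
    }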

Marc Gravell
Yes. It is compressed - quite efficiently - and I deserialise it with my own logic. But reading from disk is too slow. Also, I process this huge data file in parallel, and having it on disk would cause all sorts of contention.
ManInMoon
Using a MemoryStream, all in memory, works perfectly for me - except that my data has now outgrown this arbitrary maximum size of a byte array.
ManInMoon
@Marc: I hope you're done reading your 300 emails first, otherwise no dessert (SO) for you!
John K
@ManInMoon - then (and also taking into account your comment on memory-mapped files) you'll have to split the data into multiple byte arrays, and either write your own memory-backed stream implementation over them (sketched below), or split the data at suitable points that allow multiple independent memory streams.
Marc Gravell
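
A minimal sketch of the first option - a read-only, seekable Stream backed by several byte arrays, so a BinaryReader can read straight across the chunk boundaries (an illustration, not Marc's actual code; argument checks omitted):

    using System;
    using System.Collections.Generic;
    using System.IO;

    // Read-only Stream over several byte[] chunks, so the logical length
    // can exceed what a single array can hold. Sketch only.
    public sealed class ChunkedMemoryStream : Stream
    {
        private readonly IList<byte[]> _chunks;
        private readonly long _length;
        private long _position;

        public ChunkedMemoryStream(IList<byte[]> chunks)
        {
            _chunks = chunks;
            foreach (byte[] c in chunks) _length += c.Length;
        }

        public override bool CanRead { get { return true; } }
        public override bool CanSeek { get { return true; } }
        public override bool CanWrite { get { return false; } }
        public override long Length { get { return _length; } }
        public override long Position
        {
            get { return _position; }
            set { _position = value; }
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            int total = 0;
            long pos = _position;
            // Walk the chunk list to find where _position falls, then copy
            // across chunk boundaries until count bytes are served or EOF.
            foreach (byte[] chunk in _chunks)
            {
                if (pos >= chunk.Length) { pos -= chunk.Length; continue; }
                int n = (int)Math.Min(count - total, chunk.Length - pos);
                Buffer.BlockCopy(chunk, (int)pos, buffer, offset + total, n);
                total += n;
                pos = 0;
                if (total == count) break;
            }
            _position += total;
            return total;
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            switch (origin)
            {
                case SeekOrigin.Begin:   _position = offset; break;
                case SeekOrigin.Current: _position += offset; break;
                case SeekOrigin.End:     _position = _length + offset; break;
            }
            return _position;
        }

        public override void Flush() { }
        public override void SetLength(long value) { throw new NotSupportedException(); }
        public override void Write(byte[] buffer, int offset, int count)
        { throw new NotSupportedException(); }
    }

The existing reading code then sits on top unchanged: new BinaryReader(new ChunkedMemoryStream(myArrays)) never sees the boundaries, so variable-sized records can span two chunks freely.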