+2  Q: 

C# MemoryStream

I have an extremely large 2D byte array in memory.

byte[][] MyBA = new byte[int.MaxValue][]; // each row is a new byte[10]

Is there any way (probably unsafe) that I can fool C# into thinking this is one huge contiguous byte array? I want to do this so that I can pass it to a MemoryStream and then a BinaryReader.

MyReader = new BinaryReader(MemoryStream(*MyBA)) //Syntax obviously made-up here

Moon

A: 

You can create a MemoryStream and then write the array into it row by row using the Write method.

EDIT: The limit of a MemoryStream is essentially the amount of memory available to your application. There may be a lower limit than that, but if you need more memory you should consider modifying your overall architecture: for example, process your data in chunks, or implement a swapping mechanism to a file.
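For example, a minimal sketch (assuming MyBA is the jagged byte[][] from the question):

```csharp
var ms = new MemoryStream();
foreach (byte[] row in MyBA)
{
    ms.Write(row, 0, row.Length); // append each row to the stream
}
ms.Seek(0, SeekOrigin.Begin);     // rewind before reading
var reader = new BinaryReader(ms);
```

Note that a single MemoryStream is still limited to int.MaxValue bytes in total.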

schoetbi
Yes - I do that. But I believe that the MemoryStream still has a maximum limit to how much you can write to it, which is the same as the maximum size of a byte array...
ManInMoon
+1  A: 

I understand your problem, but why are you trying to allocate such a big array? I have no doubt that even if you solve this technical issue, the architecture is wrong and the program you are trying to develop won't work in the end.

Gilad
Gilad, I need to allocate a big array to be able to process it in-memory. Speed is the critical issue for me. Plus I have many threads all processing the same data at once. I understand your concern, but the architecture already works with data up to the arbitrary limit that is the maximum size of a byte array. If I could remove this limit (or fool C#) then I see no reason why it should not continue to work.
ManInMoon
+1  A: 

Agreed. In any case, you are bound by the size limit of the array itself.

If you really need to operate on huge arrays through a stream, write your own custom memory stream class.

Dmitry Karpezo
Yes - I have considered that, but writing a custom stream class brings me a separate issue: reading across byte-array boundaries. I have another question open on that one.
ManInMoon
+4  A: 

I do not believe .NET provides this, but it should be fairly easy to write your own implementation of System.IO.Stream that seamlessly switches between backing arrays. Here are the (untested) basics:

public class MultiArrayMemoryStream : System.IO.Stream
{
    byte[][] _arrays;
    long _position;
    int _arrayNumber;
    int _posInArray;

    public MultiArrayMemoryStream(byte[][] arrays)
    {
        _arrays = arrays;
        _position = 0;
        _arrayNumber = 0;
        _posInArray = 0;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = 0;
        while (read < count)
        {
            // No more backing arrays: return what we have so far.
            if (_arrayNumber >= _arrays.Length)
            {
                return read;
            }
            int remainingInArray = _arrays[_arrayNumber].Length - _posInArray;
            if (count - read <= remainingInArray)
            {
                // The current array can satisfy the rest of the request.
                Buffer.BlockCopy(_arrays[_arrayNumber], _posInArray, buffer, offset + read, count - read);
                _posInArray += count - read;
                _position += count - read;
                read = count;
            }
            else
            {
                // Copy the remainder of the current array and move to the next one.
                Buffer.BlockCopy(_arrays[_arrayNumber], _posInArray, buffer, offset + read, remainingInArray);
                read += remainingInArray;
                _position += remainingInArray;
                _arrayNumber++;
                _posInArray = 0;
            }
        }
        return read;
    }

    public override long Length
    {
        get
        {
            long res = 0;
            for (int i = 0; i < _arrays.Length; i++)
            {
                res += _arrays[i].Length;
            }
            return res;
        }
    }

    public override long Position
    {
        get { return _position; }
        set { throw new NotSupportedException(); }
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }

    public override void Flush() { }

    // Note: Stream.Seek returns long, not void; this stream is not seekable anyway.
    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
}
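Usage would then be straightforward (a sketch, assuming MyBA is the byte[][] from the question):

```csharp
using (var stream = new MultiArrayMemoryStream(MyBA))
using (var reader = new System.IO.BinaryReader(stream))
{
    // reader now reads transparently across the array boundaries
    byte[] chunk = reader.ReadBytes(1024);
}
```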

Another way to work around the size limitation of 2^31 bytes is UnmanagedMemoryStream, which implements System.IO.Stream on top of an unmanaged memory buffer (which can be as large as the OS supports). Something like this might work (untested):

var fileStream = new FileStream("data", 
  FileMode.Open, 
  FileAccess.Read, 
  FileShare.Read, 
  16 * 1024, 
  FileOptions.SequentialScan);
long length = fileStream.Length;
IntPtr buffer = Marshal.AllocHGlobal(new IntPtr(length)); // the IntPtr overload allows allocations larger than int.MaxValue
// Note: the byte* cast requires compiling with /unsafe.
var memoryStream = new UnmanagedMemoryStream((byte*) buffer.ToPointer(), length, length, FileAccess.ReadWrite);
fileStream.CopyTo(memoryStream);
memoryStream.Seek(0, SeekOrigin.Begin);
// work with the UnmanagedMemoryStream
Marshal.FreeHGlobal(buffer);
Rasmus Faber
Rasmus - I hadn't heard of that - I will look it up - thank you
ManInMoon
Rasmus - that looks interesting.
ManInMoon
Could you guide me as to how best to load a bytestream from disk into UnmanagedMemoryStream in the example?
ManInMoon
@ManInMoon: Try this.
Rasmus Faber
Thanks Rasmus - I am working through what you have given me here.
ManInMoon
Rasmus - I can't make this work. There's practically no documentation or examples of using SafeBuffer, and I get a "cannot create an instance of ... SafeBuffer" error.
ManInMoon
Any further clues? Appreciate your help with this.
ManInMoon
@ManInMoon: I did warn that it was untested ;-) Apparently SafeBuffer is abstract, so you cannot use that directly. Instead just allocate the memory directly as seen in the edited answer.
Rasmus Faber
Thanks Rasmus, but here we have a similar problem: AllocHGlobal expects an integer. So the boundary is there again.
ManInMoon
@ManInMoon: AllocHGlobal has an overload which accepts an IntPtr. This can be used to allocate more memory than can be held in an integer. As I write in the updated example above: `Marshal.AllocHGlobal(new IntPtr(fileStream.Length))`.
Rasmus Faber
A: 

I think you can use a linear structure instead of a 2D structure, using the following approach.

Instead of byte[int.MaxValue][10], you can use a single byte[int.MaxValue*10]. With 10 columns, you would address the item at [i,j] as i*10+j (in general, i*numberOfColumns + j for zero-based indices).

Of course you could use the other convention.
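A sketch of the row-major addressing (the row count here is illustrative; as the comments point out, the total length must still fit within the single-array size limit, so int.MaxValue*10 elements would not actually be allocatable):

```csharp
const int Columns = 10;
int rows = 1000; // illustrative; rows * Columns must stay below the array size limit
byte[] flat = new byte[rows * Columns];

// Row-major mapping from 2D coordinates to the flat index.
byte Get(int i, int j) => flat[i * Columns + j];
void Set(int i, int j, byte value) => flat[i * Columns + j] = value;
```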

DaeMoohn
The reason why he's using the 2D structure is that he's going past the size limit of a single byte array.
Dave
good point, int.MaxValue!
DaeMoohn
A: 

If I understand your question correctly, you've got a massive file that you want to read into memory and then process. But you can't do this because the amount of data in the file exceeds the capacity of any single-dimensional array.

You mentioned that speed is important, and that you have multiple threads running in parallel to process the data as quickly as possible. If you're going to have to partition the data for each thread anyway, why not base the number of threads on the number of byte[int.MaxValue] buffers required to cover everything?

Dave
Sorry, I should have made that clear. Each of my threads runs over the whole data set.
ManInMoon
I see. So you're doing something like applying multiple filters on the data, and not using threads to process one set of data more quickly? Just trying to see if there's another approach for you to use to get around this memory limitation.
Dave
A: 

If you are using Framework 4.0, you have the option of working with a MemoryMappedFile. Memory-mapped files can be backed by a physical file or by the Windows swap file. Memory-mapped files act like an in-memory stream, transparently swapping data to/from the backing storage if and when required.

If you are not using Framework 4.0, you can still use this option, but you will need to either write your own wrapper or find an existing one. I expect there are plenty on CodeProject.
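A minimal sketch of the Framework 4.0 API (the file name "data" is illustrative):

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

using (var mmf = MemoryMappedFile.CreateFromFile("data", FileMode.Open))
using (var stream = mmf.CreateViewStream())
using (var reader = new BinaryReader(stream))
{
    // Process serially; the OS pages data in and out on demand,
    // so the whole file never has to fit in a managed array.
    byte[] chunk = reader.ReadBytes(4096);
}
```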

Chris Taylor
Thanks Chris. I tried that option but MemoryMappedFiles are very slow.
ManInMoon
@ManInMoon, that is unfortunate. The performance hit is probably because of the transition from user space to kernel space, since memory-mapped files are kernel objects.
Chris Taylor
@ManInMoon, what is the source of the data? Is it being read from a file into memory?
Chris Taylor
Yes - as a byte stream that I then process serially.
ManInMoon