views: 170
answers: 8

Hi,

I wish to allocate more than MaxInteger bytes of memory.

Marshal.AllocHGlobal() expects an integer for the size, so I cannot use it. Does anyone know of another way?

A: 

This is not possible from managed code without a P/Invoke call, and for good reason. Allocating that much memory is usually a sign of a bad solution that needs revisiting.

Can you tell us why you think you need this much memory?

JaredPar
He's trying to read a file that doesn't fit in memory.
Hans Passant
You should still not do it this way, the file should be read in portions.
Lasse V. Karlsen
It's a long story! If you have the inclination, please have a look at my other questions - they are all on this same subject, and you will find I detail the reason there. Many thanks.
ManInMoon
JaredPar, could you point me at the P/Invoke call you were thinking of, please?
ManInMoon
+5  A: 

That's not possible on current mainstream hardware. Memory buffers are restricted to 2 gigabytes, even on 64-bit machines. Indexed addressing of the buffer is still done with a 32-bit signed offset. It is technically possible to generate machine code that can index more, using a register to store the offset, but that's expensive and slows down all array indexing, even for the ones that aren't larger than 2 GB.

Furthermore, you can't get a buffer larger than about 650MB out of the address space available to a 32-bit process. There aren't enough contiguous memory pages available because virtual memory contains both code and data at various addresses.

Companies like IBM and Sun sell hardware that can do this.
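As a minimal illustration of that limit on the managed side (a sketch assuming a .NET Framework 3.5/4.0-era CLR; the exact threshold varies slightly by element type):

    using System;

    class ObjectSizeLimitDemo
    {
        static void Main()
        {
            try
            {
                // A single managed object cannot exceed roughly 2 GB, so even on a
                // 64-bit machine with plenty of RAM this throws OutOfMemoryException.
                byte[] huge = new byte[int.MaxValue];
                Console.WriteLine(huge.LongLength);
            }
            catch (OutOfMemoryException)
            {
                Console.WriteLine("A single ~2 GB managed allocation failed, as expected.");
            }
        }
    }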

Hans Passant
Are you sure about that? I don't have a machine with more than 2 GB nearby at the moment, so I can't test it, but everything I can find points to VirtualAlloc() being able to allocate more than 2 GB. As for actually addressing the buffer, that is a separate problem, but if it can be done for memory-mapped files, surely it can be done for normal memory too.
Rasmus Faber
Interesting... Just read the Intel manuals, and indeed the displacement and immediate operand both have a maximum of 32 bits, even on x64.
wj32
Any way around it?
ManInMoon
Link: http://blogs.msdn.com/b/joshwil/archive/2005/08/10/450202.aspx
Hans Passant
@Hans Passant: That link actually contradicts you. Quote: "Use native allocations. You can always P/Invoke to NT’s native heap and allocate memory which you can then use unsafe code to access. [...] allocating an 8GB block [...]."
Rasmus Faber
@Rasmus: this isn't about *allocating* it, this is about *indexing* the array after you got it. The 'technically possible' clause in my answer.
Hans Passant
@Hans: So you iterate it using pointer increment instead of index increment. Big deal. Most optimizing compilers will do this transformation for you anyway.
Ben Voigt
This is an array, not a list.
Hans Passant
A: 

Use Marshal.AllocHGlobal(IntPtr). This overload treats the value of the IntPtr as the number of bytes to allocate, and IntPtr can hold a 64-bit value.
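A minimal sketch of how that overload can be used (it assumes an x64 build and enough memory to back the request; in a 32-bit process an allocation of this size would simply fail):

    using System;
    using System.Runtime.InteropServices;

    class AllocHGlobalDemo
    {
        static void Main()
        {
            long bytes = 3L * 1024 * 1024 * 1024; // ~3 GB, larger than int.MaxValue
            IntPtr buffer = Marshal.AllocHGlobal(new IntPtr(bytes));
            try
            {
                // The memory is unmanaged; access it via unsafe pointers or the Marshal helpers.
                Marshal.WriteByte(buffer, 0, 42);
                Console.WriteLine(Marshal.ReadByte(buffer, 0));
            }
            finally
            {
                Marshal.FreeHGlobal(buffer);
            }
        }
    }

The block comes from the native heap, so it is invisible to the garbage collector and must be released with Marshal.FreeHGlobal.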

Rasmus Faber
Thanks Rasmus - see Hans' comment below - pity!
ManInMoon
+2  A: 
Chris Taylor
Thanks Chris, but I effectively do that now, by reading into a 2D byte array - so it's still all in memory - and processing each byte array separately. But it is very cumbersome and I have had to do some circus tricks to make it work. Given that I have 32 GB in this server, I am a bit miffed I can't use it directly...
ManInMoon
Hi Chris, thanks for the code. I had to post an "answer" to show you my test code. Would you mind reading that, please?
ManInMoon
A: 

I don't know the answer, but I would suggest trying another way to achieve what you are after. IMHO, you should consider implementing some kind of data structure over your data instead of allocating one huge buffer - for example, a set of fixed-size chunks, as sketched below. Of course it is more complicated to do so, but sometimes it is inevitable when you are doing non-trivial data processing.
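A minimal sketch of that idea (the ChunkedBuffer class is hypothetical, not from this thread): keep the data in many fixed-size chunks so that no single allocation hits the 2 GB object limit.

    using System;
    using System.Collections.Generic;

    // Hypothetical chunked buffer: many modest arrays instead of one huge one.
    class ChunkedBuffer
    {
        const int ChunkSize = 1 << 24; // 16 MB per chunk
        readonly List<byte[]> chunks = new List<byte[]>();

        public ChunkedBuffer(long totalBytes)
        {
            for (long remaining = totalBytes; remaining > 0; remaining -= ChunkSize)
                chunks.Add(new byte[(int)Math.Min(ChunkSize, remaining)]);
        }

        // Indexed access across chunk boundaries via a 64-bit index.
        public byte this[long index]
        {
            get { return chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)]; }
            set { chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)] = value; }
        }
    }

Skizz's answer further down takes the same idea and adds file-backed paging behind a Stream interface.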

tia
A: 

Hi Chris,

I really appreciate the effort you put into this answer.

I changed the platform to x64 and then ran the code below.

myp appears to have the right length: about 3.0 GB, but "buffer" stubbornly maxes out at 2.1 GB.

Any idea why?

    var fileStream = new FileStream("C:\\big.BC2",
        FileMode.Open,
        FileAccess.Read,
        FileShare.Read,
        16 * 1024,
        FileOptions.SequentialScan);
    Int64 length = fileStream.Length;
    Console.WriteLine(length);
    Console.WriteLine(Int64.MaxValue);
    IntPtr myp = new IntPtr(length);
    //IntPtr buffer = Marshal.AllocHGlobal(myp);
    IntPtr buffer = VirtualAllocEx(
        Process.GetCurrentProcess().Handle,
        IntPtr.Zero,
        new IntPtr(length),
        AllocationType.Commit | AllocationType.Reserve,
        MemoryProtection.ReadWrite);
    unsafe
    {
        // Pointer to the block returned by VirtualAllocEx (not to myp, which only holds the length).
        byte* pBytes = (byte*)buffer.ToPointer();
        var memoryStream = new UnmanagedMemoryStream(pBytes, length, length, FileAccess.ReadWrite);
        fileStream.CopyTo(memoryStream);
    }
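(For completeness: the VirtualAllocEx P/Invoke declaration this snippet relies on is not shown in the post. A typical declaration, using the standard Win32 constant values and placed inside the class with using System.Runtime.InteropServices, would be:)

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAllocEx(IntPtr hProcess, IntPtr lpAddress,
        IntPtr dwSize, AllocationType flAllocationType, MemoryProtection flProtect);

    [Flags]
    enum AllocationType : uint
    {
        Commit = 0x1000,   // MEM_COMMIT
        Reserve = 0x2000   // MEM_RESERVE
    }

    [Flags]
    enum MemoryProtection : uint
    {
        ReadWrite = 0x04   // PAGE_READWRITE
    }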
ManInMoon
@ManInMoon, when you say maxes out, does the VirtualAllocEx fail, or are you not able to add more than 2.1GB to the buffer?
Chris Taylor
@ManInMoon: When you want to provide more information, you should edit the original question instead of adding it in an "answer".
Rasmus Faber
@ManInMoon, I just did a quick test and was able to load a file of 2.5GB into memory.
Chris Taylor
@ManInMoon, in another test I just loaded a 4 GB file into memory. I am using Windows 7 64-bit, .NET Framework 4.0.
Chris Taylor
The code doesn't fail or give an error - but when I look at the length of the buffer, it is less than what I set it to...
ManInMoon
Chris, when you say you loaded a 4 GB file into memory - are you actually achieving that? In the example you sent me, you are only loading a small amount of text. You are creating a big pointer from which to create a buffer, but I wonder if you checked afterwards that the buffer actually was that big? Sorry to doubt - it's just that, as I get no error, it is not obvious when it doesn't work.
ManInMoon
@ManInMoon: What do you mean by "length of buffer"? `fileStream.Length` ? `buffer.Size`? (I hope not...) `memoryStream.Length` ? `RegionSize` from the result of `VirtualQueryEx` ? How many bytes you can actually read from `memoryStream` ? You have to be more precise.
Rasmus Faber
ManInMoon
@ManInMoon: Great you are making progress. Out of curiosity: is it only the `VirtualAllocEx`-call that works or could you use the simpler `Marshal.AllocHGlobal` instead?
Rasmus Faber
The simpler version appears to work too... but I have yet to validate that it's not just rubbish in there.
ManInMoon
@ManInMoon, glad to see you are making progress. Just to clarify: when I tested, I read a complete file, wrote out a new file from the data in memory, and then used 'FC.EXE' to confirm that the complete round trip worked as expected. I am certain the GlobalAlloc route will also work since ultimately it uses HeapAlloc; I just used VirtualAllocEx out of habit.
Chris Taylor
Yes, that works and can load a decent-size file. Now I have a second issue. I need to read through my data in parallel in two threads. In the past I created a BinaryReader(new MemoryStream(byteArray)) for each reader, which gave me two MemoryStreams pointing at the same underlying data that I could read independently. However, with this version we create the UnmanagedMemoryStream directly. If I try to wrap it with two BinaryReaders, they are not independent because the position is kept by the single memory stream. How do I create a second BinaryReader that can read the same memory stream independently?
ManInMoon
@ManInMoon: Just create a second UnmanagedMemoryStream over the original buffer.
Rasmus Faber
But the FileStream.CopyTo(ms) is AFTER the UnmanagedMemoryStream declaration... So would I not be doubling the data in memory?
ManInMoon
@ManInMoon: If you create the second UnmanagedMemoryStream on the _same_ memory that you copied the file into, you get two views of the data without doubling the memory use. See my second answer.
Rasmus Faber
@ManInMoon, as Rasmus says, you can create a second UnmanagedMemoryStream; of course you should probably make these read-only streams. Since this is different from the original question, it might help to post a new question specific to the new problem.
Chris Taylor
Yes. I rearranged it and it works now - thank you very much. It's slower than my old version, which was VS 2008 and .NET 3; this is VS 2010 and .NET 4. But there's no reason why it should be, I suppose?
ManInMoon
Chris/Rasmus. This is working perfectly. Many thanks for all your time and input on this.
ManInMoon
@ManInMoon, wow that came a long way. And I am sure we all learned a few things along the way, I know I did. Let's hope that file does not get so big that no amount of memory will help :)
Chris Taylor
A: 

From a comment:

How do I create a second BinaryReader that can read the same memory stream independently?

    var fileStream = new FileStream("C:\\big.BC2",
        FileMode.Open,
        FileAccess.Read,
        FileShare.Read,
        16 * 1024,
        FileOptions.SequentialScan);
    Int64 length = fileStream.Length;
    IntPtr buffer = Marshal.AllocHGlobal(new IntPtr(length));
    unsafe
    {
        byte* pBytes = (byte*)buffer.ToPointer();
        var memoryStream = new UnmanagedMemoryStream(pBytes, length, length, FileAccess.ReadWrite);
        var binaryReader = new BinaryReader(memoryStream);
        fileStream.CopyTo(memoryStream);
        memoryStream.Seek(0, SeekOrigin.Begin);
        // Create a second UnmanagedMemoryStream on the _same_ memory buffer;
        // each stream keeps its own position, so the two readers are independent.
        var memoryStream2 = new UnmanagedMemoryStream(pBytes, length, length, FileAccess.Read);
        var binaryReader2 = new BinaryReader(memoryStream2);
    }
Rasmus Faber
A: 

If you can't make it work the way you want it to directly, create a class to provide the type of behaviour you want. So, to use big arrays:

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace BigBuffer
{
  class Storage
  {
    public Storage (string filename)
    {
      m_buffers = new SortedDictionary<int, byte []> ();
      m_file = new FileStream (filename, FileMode.Open, FileAccess.Read, FileShare.Read);
    }

    public byte [] GetBuffer (long address)
    {
      int
        key = GetPageIndex (address);

      byte []
        buffer;

      if (!m_buffers.TryGetValue (key, out buffer))
      {
        System.Diagnostics.Trace.WriteLine ("Allocating a new array at " + key);
        buffer = new byte [1 << 24];
        m_buffers [key] = buffer;

        m_file.Seek ((long) key << 24, SeekOrigin.Begin); // read from the start of the page, not from the raw address
        m_file.Read (buffer, 0, buffer.Length);
      }

      return buffer;
    }

    public void FillBuffer (byte [] destination_buffer, int offset, int count, long position)
    {
      do
      {
        byte []
          source_buffer = GetBuffer (position);

        int
          start = GetPageOffset (position),
          length = Math.Min (count, (1 << 24) - start);

        Array.Copy (source_buffer, start, destination_buffer, offset, length);

        position += length;
        offset += length;
        count -= length;
      } while (count > 0);
    }

    public int GetPageIndex (long address)
    {
      return (int) (address >> 24);
    }

    public int GetPageOffset (long address)
    {
      return (int) (address & ((1 << 24) - 1));
    }

    public long Length
    {
      get { return m_file.Length; }
    }

    public int PageSize
    {
      get { return 1 << 24; }
    }

    FileStream
      m_file;

    SortedDictionary<int, byte []>
      m_buffers;
  }

  class BigStream : Stream
  {
    public BigStream (Storage source)
    {
      m_source = source;
      m_position = 0;
    }

    public override bool CanRead
    {
      get { return true; }
    }

    public override bool CanSeek
    {
      get { return true; }
    }

    public override bool CanTimeout
    {
      get { return false; }
    }

    public override bool CanWrite
    {
      get { return false; }
    }

    public override long Length
    {
      get { return m_source.Length; }
    }

    public override long Position
    {
      get { return m_position; }
      set { m_position = value; }
    }

    public override void Flush ()
    {
    }

    public override long Seek (long offset, SeekOrigin origin)
    {
      switch (origin)
      {
      case SeekOrigin.Begin:
        m_position = offset;
        break;

      case SeekOrigin.Current:
        m_position += offset;
        break;

      case SeekOrigin.End:
        m_position = Length + offset;
        break;
      }

      return m_position;
    }

    public override void SetLength (long value)
    {
    }

    public override int Read (byte [] buffer, int offset, int count)
    {
      int
        bytes_read = (int) (m_position + count > Length ? Length - m_position : count);

      m_source.FillBuffer (buffer, offset, bytes_read, m_position);

      m_position += bytes_read;
      return bytes_read;
    }

    public override void  Write(byte[] buffer, int offset, int count)
    {
    }

    Storage
      m_source;

    long
      m_position;
  }

  class IntBigArray
  {
    public IntBigArray (Storage storage)
    {
      m_storage = storage;
      m_current_page = -1;
    }

    public int this [long index]
    {
      get
      {
        int
          value = 0;

        index <<= 2;

        for (int offset = 0 ; offset < 32 ; offset += 8, ++index)
        {
          int
            page = m_storage.GetPageIndex (index);

          if (page != m_current_page)
          {
            m_current_page = page;
            m_array = m_storage.GetBuffer (index); // GetBuffer expects a byte address, not a page index
          }

          value |= (int) m_array [m_storage.GetPageOffset (index)] << offset;
        }

        return value;
      }
    }

    Storage
      m_storage;

    int
      m_current_page;

    byte []
      m_array;
  }

  class Program
  {
    static void Main (string [] args)
    {
      Storage
        storage = new Storage (@"<some file>");

      BigStream
        stream = new BigStream (storage);

      StreamReader
        reader = new StreamReader (stream);

      string
        line = reader.ReadLine ();

      IntBigArray
        array = new IntBigArray (storage);

      int
        value = array [0];

      BinaryReader
        binary = new BinaryReader (stream);

      binary.BaseStream.Seek (0, SeekOrigin.Begin);

      int
        another_value = binary.ReadInt32 ();
    }
  }
}

I split the problem into three classes:

  • Storage - where the actual data is stored, uses a paged system
  • BigStream - a stream class that uses the Storage class for its data source
  • IntBigArray - a wrapper around the Storage type that provides an int array interface

The above can be improved significantly but it should give you ideas about how to solve your problems.

Skizz
Thanks Skizz - appreciate you sharing this
ManInMoon