views: 170
answers: 8

Hi,

I wish to allocate more than MaxInteger bytes of memory.

Marshal.AllocHGlobal() expects an integer for the size, so I cannot use it. Does anyone know of another way?

A: 

This is not possible from managed code without a P/Invoke call, and for good reason. Allocating that much memory is usually a sign of a bad solution that needs revisiting.

Can you tell us why you think you need this much memory?

JaredPar
He's trying to read a file that doesn't fit in memory.
Hans Passant
You should still not do it this way, the file should be read in portions.
Lasse V. Karlsen
It's a long story! If you have the inclination, please have a look at my other questions - they are all on this same subject, and you will find I detail the reason there. Many thanks.
ManInMoon
JaredPar, could you point me at the P/Invoke call you were thinking of, please?
ManInMoon
+5  A: 

That's not possible on current mainstream hardware. Memory buffers are restricted to 2 gigabytes, even on 64-bit machines. Indexed addressing of the buffer is still done with a 32-bit signed offset. It is technically possible to generate machine code that can index more, using a register to store the offset, but that's expensive and slows down all array indexing, even for the ones that aren't larger than 2 GB.

Furthermore, you can't get a buffer larger than about 650MB out of the address space available to a 32-bit process. There aren't enough contiguous memory pages available because virtual memory contains both code and data at various addresses.

Companies like IBM and Sun sell hardware that can do this.
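As a minimal illustration of that limit on the managed side (a sketch assuming a .NET Framework 3.5/4.0-era CLR; the exact threshold varies slightly by element type):

    using System;

    class ObjectSizeLimitDemo
    {
        static void Main()
        {
            try
            {
                // A single managed object cannot exceed roughly 2 GB, so even on a
                // 64-bit machine with plenty of RAM this throws OutOfMemoryException.
                byte[] huge = new byte[int.MaxValue];
                Console.WriteLine(huge.LongLength);
            }
            catch (OutOfMemoryException)
            {
                Console.WriteLine("A single ~2 GB managed allocation failed, as expected.");
            }
        }
    }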

Hans Passant
Are you sure about that? I don't have a machine with more than 2 GB nearby at the moment, so I can't test it, but everything I can find points to VirtualAlloc() being able to allocate more than 2 GB. As for actually addressing the buffer, that is a separate problem, but if it can be done for memory-mapped files, surely it can be done for normal memory too.
Rasmus Faber
Interesting... Just read the Intel manuals, and indeed the displacement and immediate operand both have a maximum of 32 bits, even on x64.
wj32
Any way around it?
ManInMoon
Link: http://blogs.msdn.com/b/joshwil/archive/2005/08/10/450202.aspx
Hans Passant
@Hans Passant: That link actually contradicts you. Quote: "Use native allocations. You can always P/Invoke to NT’s native heap and allocate memory which you can then use unsafe code to access. [...] allocating an 8GB block [...]."
Rasmus Faber
@Rasmus: this isn't about *allocating* it, this is about *indexing* the array after you got it. The 'technically possible' clause in my answer.
Hans Passant
@Hans: So you iterate it using pointer increment instead of index increment. Big deal. Most optimizing compilers will do this transformation for you anyway.
Ben Voigt
This is an array, not a list.
Hans Passant
A: 

Use Marshal.AllocHGlobal(IntPtr). This overload treats the value of the IntPtr as the number of bytes to allocate, and IntPtr can hold a 64-bit value.
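A minimal sketch of how that overload can be used (it assumes an x64 build and enough memory to back the request; in a 32-bit process an allocation of this size would simply fail):

    using System;
    using System.Runtime.InteropServices;

    class AllocHGlobalDemo
    {
        static void Main()
        {
            long bytes = 3L * 1024 * 1024 * 1024; // ~3 GB, larger than int.MaxValue
            IntPtr buffer = Marshal.AllocHGlobal(new IntPtr(bytes));
            try
            {
                // The memory is unmanaged; access it via unsafe pointers or the Marshal helpers.
                Marshal.WriteByte(buffer, 0, 42);
                Console.WriteLine(Marshal.ReadByte(buffer, 0));
            }
            finally
            {
                Marshal.FreeHGlobal(buffer);
            }
        }
    }

The block comes from the native heap, so it is invisible to the garbage collector and must be released with Marshal.FreeHGlobal.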

Rasmus Faber
Thanks Rasmus - see Hans' comment below - pity!
ManInMoon
+2  A: 
Chris Taylor
Thanks Chris, but I effectively do that now, by reading into a 2D byte array - so it's still all in memory - and processing each byte array separately. But it is very cumbersome and I have had to do some circus tricks to make it work. Given that I have 32 GB in this server, I am a bit miffed I can't use it directly...
ManInMoon
Hi Chris, thanks for the code. I had to post an "answer" to show you my test code. Would you mind reading that, please?
ManInMoon
A: 

I don't know the answer, but I would suggest trying another way to achieve what you are after. IMHO, you should consider implementing some kind of data structure over your data instead of allocating one huge buffer - for example, a set of fixed-size chunks, as sketched below. Of course it is more complicated to do so, but sometimes it is inevitable when you are doing non-trivial data processing.
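A minimal sketch of that idea (the ChunkedBuffer class is hypothetical, not from this thread): keep the data in many fixed-size chunks so that no single allocation hits the 2 GB object limit.

    using System;
    using System.Collections.Generic;

    // Hypothetical chunked buffer: many modest arrays instead of one huge one.
    class ChunkedBuffer
    {
        const int ChunkSize = 1 << 24; // 16 MB per chunk
        readonly List<byte[]> chunks = new List<byte[]>();

        public ChunkedBuffer(long totalBytes)
        {
            for (long remaining = totalBytes; remaining > 0; remaining -= ChunkSize)
                chunks.Add(new byte[(int)Math.Min(ChunkSize, remaining)]);
        }

        // Indexed access across chunk boundaries via a 64-bit index.
        public byte this[long index]
        {
            get { return chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)]; }
            set { chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)] = value; }
        }
    }

Skizz's answer further down takes the same idea and adds file-backed paging behind a Stream interface.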

tia
A: 

Hi Chris,

I really appreciate the effort you put into this answer.

I changed the platform to x64 and then ran the code below.

myp appears to have the right length: about 3.0 GB, but "buffer" stubbornly maxes out at 2.1 GB.

Any idea why?

    var fileStream = new FileStream("C:\\big.BC2",
        FileMode.Open,
        FileAccess.Read,
        FileShare.Read,
        16 * 1024,
        FileOptions.SequentialScan);
    Int64 length = fileStream.Length;
    Console.WriteLine(length);
    Console.WriteLine(Int64.MaxValue);
    IntPtr myp = new IntPtr(length);
    //IntPtr buffer = Marshal.AllocHGlobal(myp);
    IntPtr buffer = VirtualAllocEx(
        Process.GetCurrentProcess().Handle,
        IntPtr.Zero,
        new IntPtr(length),
        AllocationType.Commit | AllocationType.Reserve,
        MemoryProtection.ReadWrite);
    unsafe
    {
        // Pointer to the block returned by VirtualAllocEx (not to myp, which only holds the length).
        byte* pBytes = (byte*)buffer.ToPointer();
        var memoryStream = new UnmanagedMemoryStream(pBytes, length, length, FileAccess.ReadWrite);
        fileStream.CopyTo(memoryStream);
    }
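(For completeness: the VirtualAllocEx P/Invoke declaration this snippet relies on is not shown in the post. A typical declaration, using the standard Win32 constant values and placed inside the class with using System.Runtime.InteropServices, would be:)

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAllocEx(IntPtr hProcess, IntPtr lpAddress,
        IntPtr dwSize, AllocationType flAllocationType, MemoryProtection flProtect);

    [Flags]
    enum AllocationType : uint
    {
        Commit = 0x1000,   // MEM_COMMIT
        Reserve = 0x2000   // MEM_RESERVE
    }

    [Flags]
    enum MemoryProtection : uint
    {
        ReadWrite = 0x04   // PAGE_READWRITE
    }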
ManInMoon
@ManInMoon, when you say maxes out, does the VirtualAllocEx fail, or are you not able to add more than 2.1GB to the buffer?
Chris Taylor
@ManInMoon: When you want to provide more information, you should edit the original question instead of adding it in an "answer".
Rasmus Faber
@ManInMoon, I just did a quick test and was able to load a file of 2.5GB into memory.
Chris Taylor
@ManInMoon, in another test I just loaded a 4 GB file into memory. I am using Windows 7 64-bit, .NET Framework 4.0.
Chris Taylor
The code doesn't fail or give an error - but when I look at the length of the buffer, it is less than what I set it to...
ManInMoon
Chris, when you say you loaded a 4 GB file into memory - are you actually achieving that? In the example you sent me, you are only loading a small amount of text. You are creating a big pointer from which to create a buffer, but I wonder if you checked afterwards that the buffer actually was that big? Sorry to doubt - it's just that, as I get no error, it is not obvious when it doesn't work.
ManInMoon
@ManInMoon: What do you mean by "length of buffer"? `fileStream.Length` ? `buffer.Size`? (I hope not...) `memoryStream.Length` ? `RegionSize` from the result of `VirtualQueryEx` ? How many bytes you can actually read from `memoryStream` ? You have to be more precise.
Rasmus Faber
ManInMoon
@ManInMoon: Great you are making progress. Out of curiosity: is it only the `VirtualAllocEx`-call that works or could you use the simpler `Marshal.AllocHGlobal` instead?
Rasmus Faber
The simpler version appears to work too... but I have yet to validate that it's not just rubbish in there.
ManInMoon
@ManInMoon, glad to see you are making progress. Just to clarify: when I tested, I read a complete file, wrote out a new file from the data in memory, and then used 'FC.EXE' to confirm that the complete round trip worked as expected. I am certain the GlobalAlloc route will also work since ultimately it uses HeapAlloc; I just used VirtualAllocEx out of habit.
Chris Taylor
Yes, that works and can load a decent-size file. Now I have a second issue. I need to read through my data in parallel in two threads. In the past I created a BinaryReader(new MemoryStream(byteArray)) for each reader, which gave me two MemoryStreams pointing at the same underlying data that I could read independently. However, with this version we create the UnmanagedMemoryStream directly. If I try to wrap it with two BinaryReaders, they are not independent because the position is kept by the single memory stream. How do I create a second BinaryReader that can read the same memory stream independently?
ManInMoon
@ManInMoon: Just create a second UnmanagedMemoryStream over the original buffer.
Rasmus Faber
But the FileStream.CopyTo(ms) is AFTER the UnmanagedMemoryStream declaration... So would I not be doubling the data in memory?
ManInMoon
@ManInMoon: If you create the second UnmanagedMemoryStream on the _same_ memory that you copied the file into, you get two views of the data without doubling the memory use. See my second answer.
Rasmus Faber
@ManInMoon, as Rasmus says, you can create a second UnmanagedMemoryStream; of course you should probably make these read-only streams. Since this is different from the original question, it might help to post a new question specific to the new problem.
Chris Taylor
Yes. I rearranged it and it works now - thank you very much. It's slower than my old version, which was VS 2008 and .NET 3; this is VS 2010 and .NET 4. But there's no reason why it should be, I suppose?
ManInMoon
Chris/Rasmus. This is working perfectly. Many thanks for all your time and input on this.
ManInMoon
@ManInMoon, wow that came a long way. And I am sure we all learned a few things along the way, I know I did. Let's hope that file does not get so big that no amount of memory will help :)
Chris Taylor
A: 

From a comment:

How do I create a second BinaryReader that can read the same memory stream independently?

    var fileStream = new FileStream("C:\\big.BC2",
        FileMode.Open,
        FileAccess.Read,
        FileShare.Read,
        16 * 1024,
        FileOptions.SequentialScan);
    Int64 length = fileStream.Length;
    IntPtr buffer = Marshal.AllocHGlobal(new IntPtr(length));
    unsafe
    {
        byte* pBytes = (byte*)buffer.ToPointer();
        var memoryStream = new UnmanagedMemoryStream(pBytes, length, length, FileAccess.ReadWrite);
        var binaryReader = new BinaryReader(memoryStream);
        fileStream.CopyTo(memoryStream);
        memoryStream.Seek(0, SeekOrigin.Begin);
        // Create a second UnmanagedMemoryStream on the _same_ memory buffer;
        // each stream keeps its own position, so the two readers are independent.
        var memoryStream2 = new UnmanagedMemoryStream(pBytes, length, length, FileAccess.Read);
        var binaryReader2 = new BinaryReader(memoryStream2);
    }
Rasmus Faber
A: 

If you can't make it work the way you want it to directly, create a class to provide the type of behaviour you want. So, to use big arrays:

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace BigBuffer
{
  class Storage
  {
    public Storage (string filename)
    {
      m_buffers = new SortedDictionary<int, byte []> ();
      m_file = new FileStream (filename, FileMode.Open, FileAccess.Read, FileShare.Read);
    }

    public byte [] GetBuffer (long address)
    {
      int
        key = GetPageIndex (address);

      byte []
        buffer;

      if (!m_buffers.TryGetValue (key, out buffer))
      {
        System.Diagnostics.Trace.WriteLine ("Allocating a new array at " + key);
        buffer = new byte [1 << 24];
        m_buffers [key] = buffer;

        m_file.Seek ((long) key << 24, SeekOrigin.Begin); // read from the start of the page, not from the raw address
        m_file.Read (buffer, 0, buffer.Length);
      }

      return buffer;
    }

    public void FillBuffer (byte [] destination_buffer, int offset, int count, long position)
    {
      do
      {
        byte []
          source_buffer = GetBuffer (position);

        int
          start = GetPageOffset (position),
          length = Math.Min (count, (1 << 24) - start);

        Array.Copy (source_buffer, start, destination_buffer, offset, length);

        position += length;
        offset += length;
        count -= length;
      } while (count > 0);
    }

    public int GetPageIndex (long address)
    {
      return (int) (address >> 24);
    }

    public int GetPageOffset (long address)
    {
      return (int) (address & ((1 << 24) - 1));
    }

    public long Length
    {
      get { return m_file.Length; }
    }

    public int PageSize
    {
      get { return 1 << 24; }
    }

    FileStream
      m_file;

    SortedDictionary<int, byte []>
      m_buffers;
  }

  class BigStream : Stream
  {
    public BigStream (Storage source)
    {
      m_source = source;
      m_position = 0;
    }

    public override bool CanRead
    {
      get { return true; }
    }

    public override bool CanSeek
    {
      get { return true; }
    }

    public override bool CanTimeout
    {
      get { return false; }
    }

    public override bool CanWrite
    {
      get { return false; }
    }

    public override long Length
    {
      get { return m_source.Length; }
    }

    public override long Position
    {
      get { return m_position; }
      set { m_position = value; }
    }

    public override void Flush ()
    {
    }

    public override long Seek (long offset, SeekOrigin origin)
    {
      switch (origin)
      {
      case SeekOrigin.Begin:
        m_position = offset;
        break;

      case SeekOrigin.Current:
        m_position += offset;
        break;

      case SeekOrigin.End:
        m_position = Length + offset;
        break;
      }

      return m_position;
    }

    public override void SetLength (long value)
    {
    }

    public override int Read (byte [] buffer, int offset, int count)
    {
      int
        bytes_read = (int) (m_position + count > Length ? Length - m_position : count);

      m_source.FillBuffer (buffer, offset, bytes_read, m_position);

      m_position += bytes_read;
      return bytes_read;
    }

    public override void  Write(byte[] buffer, int offset, int count)
    {
    }

    Storage
      m_source;

    long
      m_position;
  }

  class IntBigArray
  {
    public IntBigArray (Storage storage)
    {
      m_storage = storage;
      m_current_page = -1;
    }

    public int this [long index]
    {
      get
      {
        int
          value = 0;

        index <<= 2;

        for (int offset = 0 ; offset < 32 ; offset += 8, ++index)
        {
          int
            page = m_storage.GetPageIndex (index);

          if (page != m_current_page)
          {
            m_current_page = page;
            m_array = m_storage.GetBuffer (index); // GetBuffer expects a byte address, not a page index
          }

          value |= (int) m_array [m_storage.GetPageOffset (index)] << offset;
        }

        return value;
      }
    }

    Storage
      m_storage;

    int
      m_current_page;

    byte []
      m_array;
  }

  class Program
  {
    static void Main (string [] args)
    {
      Storage
        storage = new Storage (@"<some file>");

      BigStream
        stream = new BigStream (storage);

      StreamReader
        reader = new StreamReader (stream);

      string
        line = reader.ReadLine ();

      IntBigArray
        array = new IntBigArray (storage);

      int
        value = array [0];

      BinaryReader
        binary = new BinaryReader (stream);

      binary.BaseStream.Seek (0, SeekOrigin.Begin);

      int
        another_value = binary.ReadInt32 ();
    }
  }
}

I split the problem into three classes:

  • Storage - where the actual data is stored, uses a paged system
  • BigStream - a stream class that uses the Storage class for its data source
  • IntBigArray - a wrapper around the Storage type that provides an int array interface

The above can be improved significantly but it should give you ideas about how to solve your problems.

Skizz
Thanks Skizz - appreciate you sharing this
ManInMoon