views:

587

answers:

7

How can I reduce the number of loaded DLLs when debugging in Visual C# 2008 Express Edition?

When running a Visual C# project in the debugger I get an OutOfMemoryException due to fragmentation of the 2 GB virtual address space, and we assume that the loaded DLLs might be the reason for the fragmentation.

Brian Rasmussen, you made my day! :)

His proposal of "disabling the visual studio hosting process" solved the problem.


(for more information, see the history of the question's development below)








I need two big int arrays, each with ~120 million elements (~470 MB), loaded in memory, and both in one Visual C# project.

When I try to instantiate the second array I get an OutOfMemoryException.

I do have enough total free memory, and after a web search I thought my problem was that there weren't big enough contiguous free memory blocks on my system. BUT! - when I instantiate only one of the arrays in one Visual C# instance and then open another Visual C# instance, the second instance can instantiate its own 470 MB array. (Edit for clarification: in the paragraph above I mean running it in the Visual C# debugger.)

And the Task Manager shows the corresponding memory-usage increase, just as you would expect. So a lack of contiguous memory blocks on the whole system isn't the problem. I then tried running a compiled executable that instantiates both arrays, which also works (memory usage 1 GB).
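To make the scenario above concrete, here is a minimal sketch of the allocation pattern described (not code from the question; the element count is taken from the ~120 million figure above):

```csharp
using System;

class Repro
{
    const int Elements = 120000000; // ~470 MB of int per array

    static void Main()
    {
        int[] first = new int[Elements];
        Console.WriteLine("first array allocated");
        // Under the vshost.exe debugger, this second allocation is where
        // the OutOfMemoryException appears; the standalone exe survives it.
        int[] second = new int[Elements];
        Console.WriteLine("second array allocated");
        GC.KeepAlive(first);
        GC.KeepAlive(second);
    }
}
```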

Summary:

OutOfMemoryException in Visual C# using two big int arrays, but running the compiled exe works (memory usage 1 GB), and two separate Visual C# instances can each find a big enough contiguous memory block for one of my arrays - but I need one Visual C# instance to be able to provide the memory for both.


Update:

First of all, special thanks to nobugz and Brian Rasmussen; I think they are spot on with their prediction that fragmentation of the process's 2 GB virtual address space is the problem.

Following their suggestions I used VMMap and ListDLLs for my short amateur analysis, and I get:
* 21 DLLs listed for the standalone exe (the one that works and uses 1 GB of memory).
* 58 DLLs listed for the vshost.exe version (the one that runs when debugging, throws the exception, and only uses 500 MB).

VMMap showed me the biggest free memory blocks for the debugger version to be 262, 175, 167, 155, and 108 MB.
So VMMap says that there is no contiguous 500 MB block. Based on that info about free blocks I added ~9 smaller int arrays, which added up to more than 1.2 GB of memory usage, and that actually did work.
So from that I would say that we can call "fragmentation of the 2 GB virtual address space" guilty.

From the ListDLLs output I created a small spreadsheet, converting the hex numbers to decimal, to check the free areas between DLLs. I did find big free spaces between the (21) DLLs of the standalone version, but not for the vshost debugger version (58 DLLs). I'm not claiming that nothing else can sit in between, and I'm not entirely sure my approach makes sense, but it seems consistent with VMMap's analysis, and it looks as if the DLLs alone already fragment the memory for the debugger version.
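The spreadsheet arithmetic above can also be done in a few lines of code. Here is a sketch (not from the question) that computes the free gaps between loaded modules, given the base addresses and sizes that ListDLLs reports; the module addresses in Main are made-up examples:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DllGapFinder
{
    // Given (base address, size) pairs for loaded modules, report the
    // free gaps between consecutive modules in the address space.
    static IEnumerable<long> FreeGaps(IEnumerable<KeyValuePair<long, long>> modules)
    {
        List<KeyValuePair<long, long>> sorted = modules.OrderBy(m => m.Key).ToList();
        for (int i = 0; i + 1 < sorted.Count; i++)
        {
            long endOfCurrent = sorted[i].Key + sorted[i].Value;
            long gap = sorted[i + 1].Key - endOfCurrent;
            if (gap > 0)
                yield return gap;
        }
    }

    static void Main()
    {
        // Hypothetical (base, size) pairs, as ListDLLs would show in hex:
        var modules = new[]
        {
            new KeyValuePair<long, long>(0x00400000, 0x00100000),
            new KeyValuePair<long, long>(0x10000000, 0x00200000),
            new KeyValuePair<long, long>(0x7C800000, 0x00100000),
        };
        foreach (long gap in FreeGaps(modules))
            Console.WriteLine("free gap: {0} MB", gap / (1024 * 1024));
    }
}
```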

So perhaps a solution would be to reduce the number of DLLs the debugger loads.
1. Is that possible? 2. If so, how would I do that?

+7  A: 

3rd update: You can reduce the number of loaded DLLs significantly by disabling the Visual Studio hosting process (project properties → Debug). Doing so will still allow you to debug the application, but it will get rid of a lot of DLLs and a number of helper threads as well.

On a small test project the number of loaded DLLs went from 69 to 34 when I disabled the hosting process. I also got rid of 10+ threads. All in all a significant reduction in memory usage which should also help reduce heap fragmentation.

Additional info on the hosting process: http://msdn.microsoft.com/en-us/library/ms242202.aspx
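For reference, that checkbox maps to an MSBuild property, so a hand-edited equivalent in the .csproj would look like this (the property name is the real MSBuild one; the exact PropertyGroup placement is illustrative):

```xml
<!-- Project Properties → Debug → uncheck "Enable the Visual Studio
     hosting process" sets this property per build configuration: -->
<PropertyGroup>
  <UseVSHostingProcess>false</UseVSHostingProcess>
</PropertyGroup>
```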


The reason you can load the second array in a new application is that each process gets its own full 2 GB virtual address space. I.e. the OS will swap pages to allow each process to address its total amount of memory. When you try to allocate both arrays in one process, the runtime must be able to allocate two contiguous chunks of the desired size. What are you storing in the array? If you store objects, you need additional space for each of the objects.

Remember, an application doesn't actually request physical memory. Instead, each application is given an address space from which it can allocate virtual memory. The OS then maps the virtual memory to physical memory. It is a rather complex process (Russinovich spends 100+ pages on how Windows handles memory in his Windows Internals book). For more details on how Windows does this, please see http://blogs.technet.com/markrussinovich/archive/2008/11/17/3155406.aspx

Update: I've been pondering this question for a while and it does sound a bit odd. When you run the application through Visual Studio, you may see additional modules loaded depending on your configuration. On my setup I get a number of different DLLs loaded during debug due to profilers and TypeMock (which essentially does its magic via the profiler hooks).

Depending on the size and load address of these they may prevent the runtime from allocating contiguous memory. Having said that, I am still a bit surprised that you get an OOM after allocating just two of those big arrays as their combined size is less than 1 GB.

You can look at the loaded DLLs using the ListDLLs tool from SysInternals. It will show you load addresses and sizes. Alternatively, you can use WinDbg. The lm command shows loaded modules; if you want sizes as well, you need to specify the v option for verbose output. WinDbg will also allow you to examine the .NET heaps, which may help you pinpoint why memory cannot be allocated.

2nd Update: If you're on Windows XP, you can try rebasing some of the loaded DLLs to free up more contiguous space. Vista and Windows 7 use ASLR, so I am not sure you'll benefit from rebasing on those platforms.

Brian Rasmussen
Just to add that a process (on a 32-bit Windows OS) can only address up to 2 GB of memory, even if more is installed.
Oded
It is just an int array, no data attached; it is basically a big lookup table. Note that the executable by itself runs without problems using ~1 GB, but debugging in Visual C# leads to the exception even though it doesn't use anything close to the 2 GB - just 0.5 GB for the first array, and then an exception for the second. So my problem is basically how to make Visual C# more "greedy". Thx for the blog link.
Isn't ASLR just for system DLLs? If he's loading a lot of custom DLLs, shouldn't they be able to be rebased?
Lasse V. Karlsen
@Lasse: It could be, I honestly don't know the details. All I'm saying is that it may not make any difference on Vista and forward.
Brian Rasmussen
Ok, I was hoping you knew :) We have a problem with DLLs overlapping as well, not (yet) producing out-of-memory problems, but I hope that managing to rebase all the DLLs (or at least many of them) will let the terminal server reuse one physical memory block for many clients, instead of rebasing it for each, which it seems to do.
Lasse V. Karlsen
+1  A: 

I had a similar issue once, and what I ended up doing was using a list instead of an array. When creating the lists I set the capacity to the required sizes, and I defined both lists BEFORE I tried adding values to them. I'm not sure if you can use lists instead of arrays, but it might be something to consider. In the end I had to run the executable on a 64-bit OS, because when I added the items to the list the overall memory usage went above 2 GB, but at least I was able to run and debug locally with a reduced set of data.

TskTsk
Thx for the suggestion, but my int array is a big lookup table (all about speed), so unfortunately lists aren't suitable.
Besides, the `List<T>` class in .NET uses arrays internally, so wouldn't help much, unless you mean a different list-type (linked list?)
Lasse V. Karlsen
+9  A: 

You are battling virtual memory address space fragmentation. A process on the 32-bit version of Windows has 2 gigabytes of memory available. That memory is shared by code as well as data. Chunks of code are the CLR and the JIT compiler as well as the ngen-ed framework assemblies. Chunks of data are the various heaps used by .NET, including the loader heap (static variables) and the garbage collected heaps. These chunks are located at various addresses in the memory map. The free memory is available for you to allocate your arrays.

Problem is, a large array requires a contiguous chunk of memory. The "holes" in the address space, between chunks of code and data, are not large enough to allow you to allocate such large arrays. The first hole is typically between 450 and 550 Megabytes, that's why your first array allocation succeeded. The next available hole is a lot smaller. Too small to fit another big array, you'll get OOM even though you've got an easy gigabyte of free memory left.

You can look at the virtual memory layout of your process with SysInternals' VMMap utility. Okay for diagnostics, but it isn't going to solve your problem. There's only one real fix: moving to a 64-bit version of Windows. Perhaps better: rethink your algorithm so it doesn't require such large arrays.
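Not part of the answer, but if you want a quick in-process estimate of the biggest hole before reaching for VMMap, a sketch like the following can help. It binary-searches over trial allocations; the 512 MB cap is an arbitrary choice, and each probe itself briefly disturbs the address-space layout, so treat the result as approximate:

```csharp
using System;

class AddressSpaceProbe
{
    // Binary-search the largest int[] the runtime can currently hand out,
    // which approximates the largest contiguous free block.
    static long LargestAllocatableBytes()
    {
        int low = 0;
        int high = 512 * 1024 * 1024 / sizeof(int); // probe up to 512 MB
        while (low < high)
        {
            int mid = low + (high - low + 1) / 2;
            try
            {
                int[] probe = new int[mid];
                GC.KeepAlive(probe);
                low = mid;                  // succeeded, try bigger
            }
            catch (OutOfMemoryException)
            {
                high = mid - 1;             // failed, try smaller
            }
        }
        return (long)low * sizeof(int);
    }

    static void Main()
    {
        Console.WriteLine("largest contiguous int[] right now: ~{0} MB",
                          LargestAllocatableBytes() / (1024 * 1024));
    }
}
```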

Hans Passant
Thx for the VMMap tip.
A: 

I have experience with two desktop applications and one mobile application hitting out-of-memory limits, so I understand the issues. I do not know your requirements, but I suggest moving your lookup arrays into SQL CE. Performance is good (you will be surprised), and SQL CE runs in-process. With the last desktop application, I was able to reduce my memory footprint from 2.1 GB to 720 MB, which had the benefit of speeding up the application by significantly reducing page faults. (Your problem is fragmentation of the AppDomain's memory, which you have no control over.)

Honestly, I do not think you will be satisfied with performance after squeezing these arrays into memory. Don't forget, excessive page faults have a significant impact on performance.

If you do go with SQL Server CE, make sure to keep the connection open to improve performance. Also, single-row (scalar) lookups may be slower than returning a result set.

If you really want to know what is going on with memory, use CLR Profiler. VMMap is not going to help. The OS does not allocate memory to your application; the Framework does, by grabbing large chunks of OS memory for itself (caching the memory) and then allocating pieces of this memory to applications when needed.

CLR Profiler for the .NET Framework 2.0 at http://www.microsoft.com/downloads/details.aspx?familyid=A362781C-3870-43BE-8926-862B40AA0CD0&amp;displaylang=en

AMissico
A: 

A question: are all elements of your array occupied? If many of them contain some default value, then maybe you could reduce memory consumption using a sparse-array implementation that only allocates memory for the non-default values. Just a thought.
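To illustrate the idea (this is a sketch, not code from the thread): a minimal sparse int array backed by a Dictionary. It only pays off when most elements hold the default value, since each stored entry costs far more than 4 bytes:

```csharp
using System;
using System.Collections.Generic;

// Sparse int array: only non-default values consume memory.
class SparseIntArray
{
    private readonly Dictionary<int, int> _values = new Dictionary<int, int>();
    private readonly int _length;
    private readonly int _defaultValue;

    public SparseIntArray(int length, int defaultValue)
    {
        _length = length;
        _defaultValue = defaultValue;
    }

    public int Length { get { return _length; } }

    public int this[int index]
    {
        get
        {
            if (index < 0 || index >= _length)
                throw new IndexOutOfRangeException();
            int value;
            return _values.TryGetValue(index, out value) ? value : _defaultValue;
        }
        set
        {
            if (index < 0 || index >= _length)
                throw new IndexOutOfRangeException();
            if (value == _defaultValue)
                _values.Remove(index);   // never store defaults
            else
                _values[index] = value;
        }
    }
}

class Demo
{
    static void Main()
    {
        // 120 million logical elements, but almost no memory used.
        SparseIntArray a = new SparseIntArray(120000000, 0);
        a[5] = 42;
        Console.WriteLine(a[5]);          // 42
        Console.WriteLine(a[119999999]);  // 0 (default)
    }
}
```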

Andy Johnson
+2  A: 

This isn't an answer per se, but perhaps an alternative might work.

If the problem is indeed that you have fragmented memory, then perhaps one workaround would be to just use those holes, instead of trying to find a hole big enough for everything consecutively.

Here's a very simple BigArray class that doesn't add too much overhead (some overhead is introduced, especially in the constructor, in order to initialize the buckets).

The statistics for the array are:

  • Main executes in 404ms
  • static Program-constructor doesn't show up

The statistics for the class are:

  • Main took 473ms
  • static Program-constructor takes 837ms (initializing the buckets)

The class allocates a bunch of 8192-element arrays (13-bit indexes), which on 64-bit for reference types will fall below the LOH limit. If you're only going to use this for Int32, you can probably up this to 14 and perhaps even make it non-generic, although I doubt that will improve performance much.

In the other direction, if you're afraid you're going to have a lot of holes smaller than the 8192-element arrays (64 KB on 64-bit or 32 KB on 32-bit), you can just reduce the bit-size for the bucket indexes through its constant. This will add more overhead to the constructor and more memory overhead, since the outermost array will be bigger, but performance should not be affected.

Here's the code:

using System;
using NUnit.Framework;

namespace ConsoleApplication5
{
    class Program
    {
        // static int[] a = new int[100 * 1024 * 1024];
        static BigArray<int> a = new BigArray<int>(100 * 1024 * 1024);

        static void Main(string[] args)
        {
            int l = a.Length;
            for (int index = 0; index < l; index++)
                a[index] = index;
            for (int index = 0; index < l; index++)
                if (a[index] != index)
                    throw new InvalidOperationException();
        }
    }

    [TestFixture]
    public class BigArrayTests
    {
        [Test]
        public void Constructor_ZeroLength_ThrowsArgumentOutOfRangeException()
        {
            Assert.Throws<ArgumentOutOfRangeException>(() =>
            {
                new BigArray<int>(0);
            });
        }

        [Test]
        public void Constructor_NegativeLength_ThrowsArgumentOutOfRangeException()
        {
            Assert.Throws<ArgumentOutOfRangeException>(() =>
            {
                new BigArray<int>(-1);
            });
        }

        [Test]
        public void Indexer_SetsAndRetrievesCorrectValues()
        {
            BigArray<int> array = new BigArray<int>(10001);
            for (int index = 0; index < array.Length; index++)
                array[index] = index;
            for (int index = 0; index < array.Length; index++)
                Assert.That(array[index], Is.EqualTo(index));
        }

        private const int PRIME_ARRAY_SIZE = 10007;

        [Test]
        public void Indexer_RetrieveElementJustPastEnd_ThrowsIndexOutOfRangeException()
        {
            BigArray<int> array = new BigArray<int>(PRIME_ARRAY_SIZE);
            Assert.Throws<IndexOutOfRangeException>(() =>
            {
                array[PRIME_ARRAY_SIZE] = 0;
            });
        }

        [Test]
        public void Indexer_RetrieveElementJustBeforeStart_ThrowsIndexOutOfRangeException()
        {
            BigArray<int> array = new BigArray<int>(PRIME_ARRAY_SIZE);
            Assert.Throws<IndexOutOfRangeException>(() =>
            {
                array[-1] = 0;
            });
        }

        [Test]
        public void Constructor_BoundarySizes_ProducesCorrectlySizedArrays()
        {
            for (int index = 1; index < 16384; index++)
            {
                BigArray<int> arr = new BigArray<int>(index);
                Assert.That(arr.Length, Is.EqualTo(index));

                arr[index - 1] = 42;
                Assert.That(arr[index - 1], Is.EqualTo(42));
                Assert.Throws<IndexOutOfRangeException>(() =>
                {
                    arr[index] = 42;
                });
            }
        }
    }

    public class BigArray<T>
    {
        const int BUCKET_INDEX_BITS = 13;
        const int BUCKET_SIZE = 1 << BUCKET_INDEX_BITS;
        const int BUCKET_INDEX_MASK = BUCKET_SIZE - 1;

        private readonly T[][] _Buckets;
        private readonly int _Length;

        public BigArray(int length)
        {
            if (length < 1)
                throw new ArgumentOutOfRangeException("length");

            _Length = length;
            int bucketCount = length >> BUCKET_INDEX_BITS;
            bool lastBucketIsFull = true;
            if ((length & BUCKET_INDEX_MASK) != 0)
            {
                bucketCount++;
                lastBucketIsFull = false;
            }

            _Buckets = new T[bucketCount][];
            for (int index = 0; index < bucketCount; index++)
            {
                if (index < bucketCount - 1 || lastBucketIsFull)
                    _Buckets[index] = new T[BUCKET_SIZE];
                else
                    _Buckets[index] = new T[(length & BUCKET_INDEX_MASK)];
            }
        }

        public int Length
        {
            get
            {
                return _Length;
            }
        }

        public T this[int index]
        {
            get
            {
                return _Buckets[index >> BUCKET_INDEX_BITS][index & BUCKET_INDEX_MASK];
            }

            set
            {
                _Buckets[index >> BUCKET_INDEX_BITS][index & BUCKET_INDEX_MASK] = value;
            }
        }
    }
}
Lasse V. Karlsen
A: 

Each 32-bit process has a 2 GB address space (unless you ask the user to add /3GB to the boot options), so if you can accept some performance drop-off, you can start a new process to gain almost 2 GB more address space. The new process would still be fragmented by all the CLR DLLs plus the Win32 DLLs they use, so you can avoid the address-space fragmentation caused by the CLR DLLs entirely by writing the new process in a native language, e.g. C++. You can even move some of your calculation into the new process, so your main app gets more address space and the two processes are less chatty with each other.

You can communicate between your processes using any of the interprocess communication methods. You can find many IPC samples in the All-In-One Code Framework.
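As a minimal illustration of that idea (standard-stream IPC; the doubling "lookup" is just a stand-in for a real table query, and the sketch assumes a .NET Framework exe so the process can re-launch its own assembly location):

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

class IpcDemo
{
    static void Main(string[] args)
    {
        if (args.Length > 0 && args[0] == "worker")
        {
            // Worker role: this process would hold the second big table in
            // its own 2 GB address space; answer lookups read from stdin.
            string line;
            while ((line = Console.ReadLine()) != null)
                Console.WriteLine(int.Parse(line) * 2); // stand-in lookup
            return;
        }

        // Parent role: start ourselves again as the worker and talk to it
        // over redirected standard streams.
        ProcessStartInfo psi = new ProcessStartInfo(
            Assembly.GetEntryAssembly().Location, "worker")
        {
            UseShellExecute = false,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
        };
        using (Process worker = Process.Start(psi))
        {
            worker.StandardInput.WriteLine(21);                      // send a key
            string answer = worker.StandardOutput.ReadLine();        // get the value
            Console.WriteLine("worker says: " + answer);
            worker.StandardInput.Close();
            worker.WaitForExit();
        }
    }
}
```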

Sheng Jiang 蒋晟