views:

494

answers:

2

In my job we had a problem with OutOfMemoryExceptions. I've written a simple piece of code to mimic some of the behavior, and I've ended up with the following mystery. Look at this simple code, which blows up when it runs out of memory.

class Program
{
    private static void Main()
    {
        List<byte[]> list = new List<byte[]>(200000);
        int iter = 0;

        try
        {
            for (;;iter++)
            {
                list.Add(new byte[10000]);
            }
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("Iterations: " + iter);
        }
    }
}

On my machine it ended up with

Iterations: 148008

Then I added a GC.Collect call to the loop, executed after every thousand iterations:

            //...
            for (;;iter++)
            {
                list.Add(new byte[10000]);

                if (iter % 1000 == 0)
                    GC.Collect();
            }
            //...

And surprise:

Iterations: 172048

When I called GC.Collect after every 10 iterations, I even got 193716 iterations. There are two strange things:

1) How can a manual call to GC.Collect have such a severe impact (up to 30% more allocated)?

2) What the hell can the GC collect, when there are no "lost" references (I've even preset the List's capacity)?
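For what it's worth, one way to check whether the runtime is already collecting on its own during the loop is GC.CollectionCount (available since .NET 2.0). A minimal sketch, assuming the same allocation pattern as above; the printed counts will vary by machine:

```csharp
// Counts how many gen-0/1/2 collections the runtime performed on its own
// while the allocation loop ran, without any manual GC.Collect calls.
using System;
using System.Collections.Generic;

class CollectionCountDemo
{
    static void Main()
    {
        var list = new List<byte[]>(200000);
        try
        {
            for (;;)
                list.Add(new byte[10000]);
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("Gen0 collections: " + GC.CollectionCount(0));
            Console.WriteLine("Gen1 collections: " + GC.CollectionCount(1));
            Console.WriteLine("Gen2 collections: " + GC.CollectionCount(2));
        }
    }
}
```

If the counts are already high, the framework clearly is collecting; the question is why a manual GC.Collect still changes the outcome.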

+10  A: 

Part of the garbage collection process is the compacting phase. During this phase, blocks of allocated memory are moved around to reduce fragmentation. When memory is allocated, it isn't always placed right where the last chunk of allocated memory left off. So you are able to squeeze a bit more in, because the garbage collector makes more room by making better use of the available space.

I am trying to run some tests, but my machine can't handle them. Give this a try; it tells the GC to pin the objects in memory so they aren't moved around:

// requires: using System.Runtime.InteropServices;
byte[] b = new byte[10000];
GCHandle.Alloc(b, GCHandleType.Pinned); // NB: keep the handle and Free() it in real code
list.Add(b);

As for your comment: when the GC moves things around, it isn't wiping anything out, it is just making better use of all the memory space. Let's oversimplify this. When you allocate your byte array the first time, let's say it occupies memory from spot 0 to 10000. The next time you allocate a byte array, it isn't guaranteed to start at 10001; it may start at 10500. So now you have 499 bytes that aren't being used, and won't be used by your application. When the GC does compacting, it will move the 10500 array to 10001 to be able to use that extra 499 bytes. And again, this is way oversimplified.
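Putting the pinning suggestion above into a complete program, here is a minimal sketch of the experiment with pinned buffers (assumes .NET 2.0+; GCHandle lives in System.Runtime.InteropServices, and each pin is freed at the end):

```csharp
// Variant of the original experiment where every buffer is pinned,
// so the compacting phase of the GC cannot move the arrays around.
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

class PinnedExperiment
{
    static void Main()
    {
        var list = new List<byte[]>(200000);
        var handles = new List<GCHandle>(200000);
        int iter = 0;
        try
        {
            for (;; iter++)
            {
                byte[] b = new byte[10000];
                handles.Add(GCHandle.Alloc(b, GCHandleType.Pinned)); // pin the buffer
                list.Add(b);
            }
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("Iterations with pinning: " + iter);
        }
        finally
        {
            foreach (GCHandle h in handles)
                h.Free(); // release the pins so the heap can be compacted again
        }
    }
}
```

If compaction is what buys the extra iterations, pinning everything should make the GC.Collect calls stop helping.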

Bob
That would make sense, but 1) I still can't see any objects being wiped out (alright, List.Add may add some noise, but a quick check with ReSharper shows that it doesn't); 2) when so much memory is allocated, the GC should be invoked many times by the framework and should do the same.
Elephantik
That's also what I thought (see my comment on the question). However, what doesn't really make sense is that the GC should be invoked in the case of insufficient memory, thus compacting the memory at that time. However, this may be different, since the GC also allocates memory in blocks from the OS, thus not having one large memory block, but rather a series of memory blocks to deal with. Calling GC.Collect may reorganize the blocks more, so that less space is lost (unused memory at the end of an OS block too small to be used for this allocation).
Lucero
I see your point, but as long as there are no dead objects, I can't see a reason for such intentional fragmentation of memory. One of the benefits of a GC should be non-fragmented free memory, so newly created objects don't have to search for free spaces. And, as mentioned before, an automatic call to the GC should do the same.
Elephantik
What is so nonsensical about my answer? Where is your better answer?
Bob
As I've pointed out... 1) memory fragmentation should occur only when something is garbage collected (I may be wrong, but I believe I'm not), and 2) a "normal" GC should address that. So there shouldn't be any difference whether I call it manually or not. But there is a (IMO big) difference.
Elephantik
1) Memory fragmentation can occur even if the GC isn't running. As I mentioned, just because you ask for some memory, that does not mean it comes right out of the next contiguous block. 2) The garbage collector is not always predictable. Just because it should do something doesn't mean it will. It should also be noted that the GC was made to handle real-world situations; this isn't a typical real-world situation, so you are going to see variations.
Bob
+4  A: 

Depending on the CLR you're using, there may be some Large Object Heap issues involved.

Have a look at this article, which explains the issues with large block allocations (the list with 200000 items is certainly a large block; the byte arrays may or may not be, since some arrays seem to be put on the LOH when they reach 8 KB, others only after 85 KB):

http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
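As a quick probe of whether an array lands on the LOH: on the Microsoft CLR, freshly allocated large objects are reported as generation 2 by GC.GetGeneration, while small objects start in generation 0. Treat the exact size threshold and this reporting behavior as assumptions about that particular CLR, not guarantees of the spec. A minimal sketch:

```csharp
// Probes which heap an array lands on, using the (MS CLR specific)
// observation that LOH objects report as generation 2 from birth.
using System;

class LohProbe
{
    static void Main()
    {
        byte[] small = new byte[10000];  // expected: small object heap, gen 0
        byte[] large = new byte[100000]; // expected: large object heap, gen 2

        Console.WriteLine("small array generation: " + GC.GetGeneration(small));
        Console.WriteLine("large array generation: " + GC.GetGeneration(large));
    }
}
```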

Lucero
Good point, the LOH could be involved, but the big list sits there the whole time, so the LOH shouldn't get fragmented.
Elephantik
You could test Lucero's point by reducing the size of the small arrays. Although I only know of the 85000-byte limit.
Henk Holterman
I did a test inserting into an array of smaller arrays to avoid the LOH, and the behaviour is still the same.
Elephantik