I am playing around with the garbage collector in C# (or rather the CLR?) trying to better understand memory management in C#.

I made a small sample program that reads three larger files into a byte[] buffer. I wanted to see whether

  • I actually need to do anything in order to handle memory efficiently
  • it has any impact when setting the byte[] to null at the end of the current iteration
  • and finally whether it would help to force a garbage collection via GC.Collect()

Disclaimer: I measured memory consumption with Windows Task Manager and rounded it. I tried several times, and overall the numbers remained about the same.

Here is my simple sample program:

    static void Main(string[] args)
    {
        Loop();
    }

    private static void Loop()
    {
        var list = new List<string> 
        { 
            @"C:\Users\Public\Music\Sample Music\Amanda.wma",       // Size: 4.75 MB
            @"C:\Users\Public\Music\Sample Music\Despertar.wma",    // Size: 5.92 MB
            @"C:\Users\Public\Music\Sample Music\Distance.wma",     // Size: 6.31 MB
        };

        Console.WriteLine("before loop");
        Console.ReadLine();

        foreach (string pathname in list)
        {
            // ... code here ...

            Console.WriteLine("in loop");
            Console.ReadLine();
        }

        Console.WriteLine(GC.CollectionCount(1));
        Console.WriteLine("end loop");
        Console.ReadLine();
    }

For each test, I changed only the contents of the foreach loop. I then ran the program and, at each Console.ReadLine(), paused and checked the memory usage of the process in Windows Task Manager. I noted the used memory and continued the program by pressing Return (I know about breakpoints ;) ). Just after the end of the loop, I wrote GC.CollectionCount(1) to the console to see how often the GC ran, if at all.
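As a side note, the collection count can be read for every generation, not only generation 1. The following is a minimal sketch and not part of the test program above; the class name `GcCounters` and the method `Snapshot` are hypothetical names chosen for illustration. It only uses `GC.CollectionCount(int)`, `GC.MaxGeneration` and `GC.Collect()`.

```csharp
using System;

public static class GcCounters
{
    // Returns the number of collections so far, indexed by generation.
    public static int[] Snapshot()
    {
        int[] counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            counts[gen] = GC.CollectionCount(gen);
        }
        return counts;
    }

    public static void Main()
    {
        int[] before = Snapshot();

        // A forced full collection covers all generations.
        GC.Collect();

        int[] after = Snapshot();
        for (int gen = 0; gen < after.Length; gen++)
        {
            Console.WriteLine("gen {0}: {1} -> {2}", gen, before[gen], after[gen]);
        }
    }
}
```

Printing all generations at each pause would show whether a given collection was a generation 0, 1 or 2 collection, which a single GC.CollectionCount(1) value cannot tell.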


Results


Test 1:

        foreach ( ... )
        {
            byte[] buffer = File.ReadAllBytes(pathname);

            Console.WriteLine ...
        }

Result (memory used):

before loop:   9.000 K 
1. iteration: 13.000 K
2. iteration: 19.000 K
3. iteration: 25.000 K
after loop:   25.000 K
GC.CollectionCount(1): 2


Test 2:

        foreach ( ... )
        {
            byte[] buffer = File.ReadAllBytes(pathname);
            buffer = null;

            Console.WriteLine ...
        }

Result (memory used):

before loop:   9.000 K 
1. iteration: 13.000 K
2. iteration: 14.000 K
3. iteration: 15.000 K
after loop:   15.000 K
GC.CollectionCount(1): 2


Test 3:

        foreach ( ... )
        {
            byte[] buffer = File.ReadAllBytes(pathname);
            buffer = null;
            GC.Collect();

            Console.WriteLine ...
        }

Result (memory used):

before loop:   9.000 K 
1. iteration:  8.500 K
2. iteration:  8.600 K
3. iteration:  8.600 K
after loop:    8.600 K
GC.CollectionCount(1): 3



What I don't understand:

  • In Test 1, the memory increases with each iteration. Therefore I guess that the memory is NOT freed at the end of the loop. But the GC still says it collected 2 times (GC.CollectionCount). How so?
  • In Test 2, it obviously helps that buffer is set to null. The memory is lower than in Test 1. But why does GC.CollectionCount output 2 and not 3? And why is the memory usage not as low as in Test 3?
  • Test 3 uses the least memory. I would say this is because the reference to the memory is removed (buffer is set to null), so when the garbage collector is called via GC.Collect() it can free the memory. Seems pretty clear.

If anyone with more experience could shed some light on some of the points above, it would really help me. Pretty interesting topic imho.

A: 

The memory usage you're viewing via Task Manager is for the process. Remember that the CLR manages memory on behalf of your application, so you will typically not see the usage of the GC heap reflected directly in the process memory usage.

Allocating and freeing memory is not free, so the CLR will try to optimize this to reduce the cost. Thus, when objects are collected from the heap, you may or may not see memory released to the OS as well.
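To see the two numbers side by side, something like the following sketch could be used. It is only an illustration: the class `MemoryReport` and its methods are hypothetical names, and the 5 MB buffer is an arbitrary stand-in for a file. `GC.GetTotalMemory(false)` reports the bytes the GC currently considers allocated on the managed heap, while `Process.WorkingSet64` is a process-level figure that is similar to, but not necessarily identical with, what Task Manager displays.

```csharp
using System;
using System.Diagnostics;

public static class MemoryReport
{
    // Bytes currently thought to be allocated on the managed heap.
    public static long ManagedBytes()
    {
        return GC.GetTotalMemory(false);
    }

    // Physical memory currently mapped to this process.
    public static long WorkingSetBytes()
    {
        using (Process process = Process.GetCurrentProcess())
        {
            process.Refresh();
            return process.WorkingSet64;
        }
    }

    public static void Print(string label)
    {
        Console.WriteLine("{0}: managed = {1:N0} bytes, working set = {2:N0} bytes",
            label, ManagedBytes(), WorkingSetBytes());
    }

    public static void Main()
    {
        Print("before allocation");

        byte[] buffer = new byte[5 * 1024 * 1024]; // arbitrary size for the demo
        Print("after allocation");
        GC.KeepAlive(buffer); // keep the array reachable up to this point

        buffer = null;
        GC.Collect();
        Print("after GC.Collect()");
    }
}
```

The managed figure should drop after the forced collection, while the working set may or may not follow, depending on whether the CLR hands the memory back to the OS.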

Brian Rasmussen
+4  A: 

Given that you are reading entire WMA files into an array, I'd say those array objects are being allocated in the Large Object Heap. This is a separate heap that's managed in a more malloc-type way (because compacting garbage collection isn't efficient at dealing with large objects).

Space in the Large Object Heap is collected according to different rules and it doesn't count towards the main generation count, and that'll be why you're not seeing a difference in the number of collections between tests 1 and 2 even though the memory is being re-used (all that's being collected there is the Array object, not the underlying bytes). In Test 3 you are forcing a collection each time round the loop; the Large Object Heap is included in that, so the memory usage of the process does not increase.
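Whether an array ended up on the Large Object Heap can be checked roughly with `GC.GetGeneration`. The sketch below is an illustration only; `LohCheck` and `GenerationOfNewArray` are hypothetical names. It assumes the default CLR settings, under which objects of about 85,000 bytes or more are placed on the Large Object Heap and are reported as belonging to the oldest generation, while a freshly allocated small object is reported as generation 0.

```csharp
using System;

public static class LohCheck
{
    // Allocates a byte array of the given length and returns the
    // generation the GC reports for it immediately afterwards.
    public static int GenerationOfNewArray(int length)
    {
        byte[] data = new byte[length];
        return GC.GetGeneration(data);
    }

    public static void Main()
    {
        Console.WriteLine("1,000-byte array:     generation {0}",
            GenerationOfNewArray(1000));
        Console.WriteLine("5,000,000-byte array: generation {0}",
            GenerationOfNewArray(5000000));
        Console.WriteLine("GC.MaxGeneration:     {0}", GC.MaxGeneration);
    }
}
```

Because large objects are treated as part of the oldest generation, they are only reclaimed when a full collection runs, which is what the explicit GC.Collect() in Test 3 triggers on every iteration.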

U62
+1 for noting the malloc heap.
whatnick
Interesting, I did not know about the large object heap. What exactly would a "malloc-type way" be? I never worked with C, so I don't know how malloc behaves.
Max
"all that's being collected there is the Array object, not the underlying bytes" - there's no separation between `Array` objects and their bytes; they're allocated as a single memory block as a form of optimization, i.e. the "underlying bytes" immediately follow `Array` vtable and other internal structures. Same goes for arrays.
Pavel Minaev
A: 

Here is a link that I think may be useful to you:

http://msdn.microsoft.com/en-us/magazine/ee309515.aspx

-Joe Yu

Joe