In our application, we are reading an XPS file using the System.IO.Packaging.Package class. When we read from the stream of a PackagePart, Task Manager shows the application's memory consumption rising. However, when the reading is done, the memory consumption doesn't fall back to what it was before reading from the stream.

To illustrate the problem, I wrote a simple code sample that you can use in a standalone WPF application.

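    using System.IO;
    using System.IO.Packaging;
    using System.Windows;

    public partial class Window1 : Window
    {
        // Kept open for the lifetime of the window.
        private Package _package;

        public Window1()
        {
            InitializeComponent();

            _package = Package.Open(@"c:\test\1000pages.xps", FileMode.Open, FileAccess.ReadWrite, FileShare.None);
        }

        // Reads the full contents of every part into a local, throw-away array.
        private void ReadPackage()
        {
            foreach (PackagePart part in _package.GetParts())
            {
                using (Stream partStream = part.GetStream())
                {
                    byte[] arr = new byte[partStream.Length];
                    int offset = 0;
                    // Stream.Read may return fewer bytes than requested, so loop until the array is full.
                    while (offset < arr.Length)
                    {
                        int read = partStream.Read(arr, offset, arr.Length - offset);
                        if (read == 0)
                            break; // unexpected end of stream
                        offset += read;
                    }
                } // Dispose() closes the stream, so no explicit Close() is needed.
            }
        }

        private void Button_Click(object sender, RoutedEventArgs e)
        {
            ReadPackage();
        }
    }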

The ReadPackage() method reads the stream contents of every PackagePart into a local array. In the sample, I used a 1000-page XPS document as the package source so that the change in the application's memory consumption is easy to see. On my machine, the standalone app's memory consumption starts at 18MB and rises to 100MB after calling the method. Calling the method again can push memory consumption higher still, but it then falls back to about 100MB. However, it never falls back to 18MB.

Has anyone experienced this while using PackagePart? Or am I using it wrong? I suspect that the internal implementation of PackagePart caches the data that was read.

Thank you!

A: 

You do not specify how you measure the "memory consumption" of your application, but perhaps you are using Task Manager? To get a better view of what is going on, I suggest that you examine some performance counters for your application. Both .NET heap and general process memory performance counters are available.
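As a rough sketch of reading similar numbers from code (using the GC and System.Diagnostics.Process APIs rather than performance counters proper), the helper below records the managed heap size from GC.GetTotalMemory next to the process's private bytes and working set. The MemorySnapshot type and the before/after usage are illustrative assumptions, not part of the original question. Comparing the figures helps distinguish memory held by live .NET objects from memory the process simply has not given back to the operating system.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: captures a few memory figures at one point in time.
public readonly struct MemorySnapshot
{
    public long ManagedHeapBytes { get; }  // bytes the GC currently considers allocated
    public long PrivateBytes { get; }      // process private memory (managed and native)
    public long WorkingSetBytes { get; }   // physical memory currently mapped to the process

    private MemorySnapshot(long managed, long privateBytes, long workingSet)
    {
        ManagedHeapBytes = managed;
        PrivateBytes = privateBytes;
        WorkingSetBytes = workingSet;
    }

    // forceFullCollection is passed straight to GC.GetTotalMemory:
    // true lets the GC collect before the managed figure is read.
    public static MemorySnapshot Take(bool forceFullCollection)
    {
        long managed = GC.GetTotalMemory(forceFullCollection);
        using (Process process = Process.GetCurrentProcess())
        {
            return new MemorySnapshot(managed, process.PrivateMemorySize64, process.WorkingSet64);
        }
    }

    public override string ToString() =>
        $"managed={ManagedHeapBytes / 1024} KB, private={PrivateBytes / 1024} KB, workingSet={WorkingSetBytes / 1024} KB";
}
```

For example, one could call MemorySnapshot.Take(true) before and after ReadPackage() and write both results to Debug.WriteLine to see which figure actually stays high.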

If you really want to understand the details of how your application uses memory, you can use Microsoft's CLR Profiler.

What you see may be a result of the .NET heap expanding to accommodate a very large file. Large objects (85,000 bytes or more, such as big byte arrays) are placed on the Large Object Heap (LOH), and even after that memory has been garbage collected, the freed space is not necessarily returned to the operating system. Also, objects on the LOH are normally not moved during garbage collection, which can fragment the LOH and exhaust the available address space even though there is plenty of free memory.
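A small sketch of the LOH behavior described above. By default, arrays whose total size is 85,000 bytes or more are allocated on the LOH, and GC.GetGeneration reports such objects as belonging to the oldest generation (GC.MaxGeneration). The second method is an assumed alternative to the question's per-part byte[]: it processes a stream through one reusable buffer below the threshold, so the reader itself does not allocate a part-sized array on the LOH for every part. It would not change any buffering PackagePart might do internally. The names LohDemo and ProcessInChunks and the 64 KB buffer size are hypothetical.

```csharp
using System;
using System.IO;

public static class LohDemo
{
    // Returns true if a freshly allocated byte[] of the given length is reported
    // as being in the oldest generation, which is how the GC reports LOH objects.
    public static bool LandsOnLargeObjectHeap(int length)
    {
        byte[] array = new byte[length];
        return GC.GetGeneration(array) == GC.MaxGeneration;
    }

    // Hypothetical alternative to allocating new byte[stream.Length] per part:
    // reads the stream sequentially through one 64 KB buffer (below the
    // 85,000-byte LOH threshold) and hands each chunk to the caller.
    // Returns the total number of bytes read.
    public static long ProcessInChunks(Stream stream, Action<byte[], int> onChunk)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (onChunk == null) throw new ArgumentNullException(nameof(onChunk));

        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            onChunk(buffer, read); // only the first 'read' bytes of buffer are valid
            total += read;
        }
        return total;
    }
}
```

Inside the question's ReadPackage() loop, one could call ProcessInChunks(partStream, (buf, n) => { /* inspect bytes */ }) instead of allocating arr, if the parts only need to be processed sequentially.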

Has anyone experienced this while using PackagePart? Or am I using it wrong?

If you want to control the resources used by the package, you are not using it in the best way. Package implements IDisposable, and in general you should use it like this:

using (var package = Package.Open(@"c:\test\1000pages.xps", FileMode.Open, FileAccess.ReadWrite, FileShare.None)) {
  // ... process the package
}

At the end of the using statement all resources consumed by the package should be released.

If you really want to keep the _package member of your window, you should at some point call Close() (or IDisposable.Dispose()) to release the resources. Calling GC.Collect() is not recommended and will not necessarily reclaim the resources used by the package.

Martin Liversage
Thanks for the reply! I was using Task Manager. Yup, I'll be trying the CLR Profiler as another option. What I'm worried about is wasting time looking for a workaround when the problem is actually a bug in the internal PackagePart implementation that only Microsoft can fix. I also tried replacing the PackagePart stream with a FileStream that reads the contents of a 1MB file into the array, the same number of times as in the code above. It's basically the same procedure, only reading from a different stream. In this case the memory is being collected; it didn't even reach 50MB.
bjutus
Hmm. I tried calling GC.Collect() right after calling ReadPackage(), but nothing happened. However, when I called _package.Close() and then GC.Collect() after ReadPackage(), the memory usage fell from 100MB back down to about 20MB. So maybe the Package was holding references to the streams?
bjutus
Saw the updates to your answer, thanks again. I only tried GC.Collect() to check whether the memory was really being held by something. It looks like it is, until the package is closed. Yup, we do call Close() on the package when we don't need it anymore. The problem is that it is the main source of data in our app, so it has to stay open throughout the app's lifetime. The PackagePart streams, though, are something we don't need for that long, which is why we expect the memory to be freed once we leave the scope of the ReadPackage() method. Maybe the package does some kind of internal caching...
bjutus
I used Reflector to decompile the code, and even though I can't say I fully understand what is going on, it seems that the PackagePart objects do a lot of caching.
Martin Liversage
Thanks for trying, Martin. I also inspected it with the CLR Profiler, and it seems that most of the memory is being held by the PackageParts' byte arrays. I'll try to ask Microsoft about this through the forums and update this thread if and when I get an answer.
bjutus