My application does a good deal of binary serialization and compression of large objects. Uncompressed, the serialized dataset is about 14 MB; compressed, it is around 1.5 MB. I find that whenever I call the serialize method on my dataset, the large object heap (LOH) performance counter jumps from under 1 MB to about 90 MB. I also know that on a relatively heavily loaded system, usually after days of running during which this serialization happens a few times, the application has been known to throw out-of-memory exceptions when this serialization method is called, even though there seems to be plenty of memory. I'm guessing that fragmentation is the issue (though I can't say I'm 100% sure, I'm pretty close).

The simplest short-term fix (I guess I'm looking for both a short-term and a long-term answer) I can think of is to call GC.Collect right after I'm done with the serialization process. This, in my opinion, will garbage-collect the object from the LOH, and will likely do so BEFORE other objects can be added to it. This will allow other objects to fit tightly against the remaining objects in the heap without causing much fragmentation.
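In code, that short-term fix would amount to something like the following sketch (SerializeAndCompress is a made-up name for a helper like the one shown further down; assetDS is the dataset being serialized):

    // Hypothetical illustration of the short-term fix: force a full
    // collection (which includes the LOH) as soon as the large temporary
    // buffers from serialization are no longer referenced.
    byte[] compressed = SerializeAndCompress(assetDS);   // helper sketched further down
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();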

Other than this ridiculous 90 MB allocation, I don't think I have anything else that uses a lot of the LOH. This 90 MB allocation is also relatively rare (around every 4 hours). We will of course still have the 1.5 MB array in there, and maybe some other smaller serialized objects.

Any ideas?

Update as a result of good responses

Here is my code which does the work. I've actually tried changing this to compress WHILE serializing, so that serialization writes to the compression stream directly, and I don't get much better results. I've also tried preallocating the memory stream to 100 MB and using the same stream twice in a row; the LOH goes up to 180 MB anyway. I'm using Process Explorer to monitor it. It's insane. I think I'm going to try the UnmanagedMemoryStream idea next.

I would encourage you guys to try it out if you want. It doesn't have to be this exact code. Just serialize a large dataset and you will get surprising results (mine has lots of tables, around 15, and lots of strings and columns).

        // Serialize to an in-memory buffer, then compress the resulting byte array.
        // ToArray() copies the buffer, so the uncompressed data briefly exists twice.
        System.Runtime.Serialization.Formatters.Binary.BinaryFormatter serializer =
            new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        using (System.IO.MemoryStream memStream = new System.IO.MemoryStream())
        {
            serializer.Serialize(memStream, obj);
            return CompressionHelper.CompressBytes(memStream.ToArray());
        }
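For reference, the "compress WHILE serializing" variant mentioned above would look roughly like the sketch below, writing straight into a GZipStream instead of calling CompressionHelper afterwards. The helper name SerializeAndCompress is made up, and as noted above, in practice this did not change the LOH numbers much:

    using System.IO;
    using System.IO.Compression;
    using System.Runtime.Serialization.Formatters.Binary;

    static class SerializationHelper
    {
        // Sketch: serialize directly into the compression stream so the
        // uncompressed 14 MB never has to live in a single MemoryStream
        // buffer; only the compressed output is buffered in memory.
        public static byte[] SerializeAndCompress(object obj)
        {
            BinaryFormatter serializer = new BinaryFormatter();
            using (MemoryStream output = new MemoryStream())
            {
                using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, true))
                {
                    serializer.Serialize(gzip, obj);
                }
                return output.ToArray();
            }
        }
    }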

Update after trying binary serialization with UnmanagedMemoryStream

Even if I serialize to an UnmanagedMemoryStream, the LOH jumps up to the same size. It seems that no matter what I do, calling the BinaryFormatter to serialize this large object will use the LOH. As for pre-allocating, it doesn't seem to help much: say I preallocate 100 MB, then serialize; it will use 170 MB. Here is the code for that, even simpler than the code above:

BinaryFormatter serializer = new BinaryFormatter();
// Preallocate a 100 MB backing buffer up front.
MemoryStream memoryStream = new MemoryStream(1024 * 1024 * 100);
// Only here to refresh the LOH performance counter before serializing.
GC.Collect();
serializer.Serialize(memoryStream, assetDS);

The GC.Collect() in the middle there is just to update the LOH performance counter. You will see that it allocates the expected 100 MB. But then when you call Serialize, you will notice that it seems to add its own allocations on top of the 100 MB you have already allocated.

+2  A: 

90MB of RAM is not much.

Avoid calling GC.Collect unless you have a problem. If you have a problem, and no better fix, try calling GC.Collect and seeing if your problem is solved.

Brad
+3  A: 

Beware of the way collection classes and streams like MemoryStream work in .NET. They have an underlying buffer, a simple array. Whenever the collection or stream buffer grows beyond the allocated size of the array, the array gets re-allocated, now at double the previous size.

This can cause many copies of the array in the LOH. Your 14MB dataset will start using the LOH at 128KB, then take another 256KB, then another 512KB, etcetera. The last one, the one actually used, will be around 16MB. The LOH contains the sum of these, around 30MB, only one of which is in actual use.

Do this three times without a gen2 collection and your LOH has grown to 90MB.

Avoid this by pre-allocating the buffer to the expected size. MemoryStream has a constructor that takes an initial capacity. So do all collection classes. Calling GC.Collect() after you've nulled all references can help unclog the LOH and purge those intermediate buffers, at the cost of clogging the gen1 and gen2 heaps too soon.
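A small sketch that makes the doubling visible; the exact numbers depend on the runtime, but the capacity progression (and the difference pre-allocation makes) is the point:

    using System;
    using System.IO;

    class CapacityDemo
    {
        static void Main()
        {
            // Writing ~14 MB in small pieces: the backing array doubles each
            // time it fills up, and every re-allocation past ~85 KB is a new
            // LOH object until the garbage collector reclaims the old ones.
            MemoryStream grown = new MemoryStream();
            byte[] chunk = new byte[4096];
            int lastCapacity = 0;
            for (int i = 0; i < (14 * 1024 * 1024) / chunk.Length; i++)
            {
                grown.Write(chunk, 0, chunk.Length);
                if (grown.Capacity != lastCapacity)
                {
                    Console.WriteLine("Capacity grew to {0:N0}", grown.Capacity);
                    lastCapacity = grown.Capacity;
                }
            }

            // Pre-allocating the expected size gives a single allocation instead.
            MemoryStream preallocated = new MemoryStream(16 * 1024 * 1024);
            Console.WriteLine("Preallocated capacity: {0:N0}", preallocated.Capacity);
        }
    }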

Hans Passant
A: 

If you really need to use the LOH for something like a service or something that needs to be running for a long time, you need to use buffer pools that are never deallocated and that you can ideally allocate on start-up. This means you'll have to do your 'memory management' yourself for this, of course.
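A minimal sketch of that idea for the serialization buffer, assuming a worst-case size of 100 MB is known up front (the class and member names are made up for illustration; the buffer cannot grow, and this version is not thread-safe):

    using System.IO;

    // Sketch: one buffer allocated at start-up and reused for every
    // serialization, so the LOH sees a single long-lived allocation
    // instead of a fresh one per call.
    static class SerializationBufferPool
    {
        private static readonly byte[] buffer = new byte[100 * 1024 * 1024];

        public static MemoryStream Rent()
        {
            // A MemoryStream over a caller-supplied array never re-allocates;
            // it throws instead if the data outgrows the array, and the caller
            // must track Position/Length to know how much was written.
            return new MemoryStream(buffer, 0, buffer.Length, true, true);
        }
    }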

Depending on what you're doing with this memory, you might also have to p/Invoke over to native code for selected parts to avoid having to call some .NET API that forces you to put the data on newly allocated space in the LOH.

This is a good starting point article about the issues: http://blogs.msdn.com/maoni/archive/2004/12/19/327149.aspx

I'd consider you very lucky if your GC trick worked, and it would really only work if there isn't much going on in the system at the same time. If you have work going on in parallel, this will just slightly delay the inevitable.

Also read up on the documentation about GC.Collect. IIRC, GC.Collect(n) only says that it collects no further than generation n -- not that it actually ever GETS to generation n.

Rovpedal
A: 

Don't worry about the LOH size jumping up. Worry about allocating/deallocating on the LOH. .NET is very dumb about the LOH -- rather than allocating LOH objects far away from the regular heap, it allocates at the next available VM page. I have a 3D app that does a lot of allocation/deallocation of both LOH and regular objects -- the result (as seen in a DebugDiag dump report) is that pages of the small heap and the large heap end up alternating throughout RAM, until there are no large chunks of the application's 2 GB VM space left. The solution, when possible, is to allocate what you need once and then not release it -- re-use it next time.

Use DebugDiag to analyze your process. See how the VM addresses gradually creep up towards the 2 GB address mark. Then make a change that keeps that from happening.

ToolmakerSteve
A: 

I agree with some of the other posters here that you want to find ways to work with the .NET Framework rather than trying to force it to work your way via GC.Collect.

You may find this Channel 9 video helpful; it discusses ways to ease pressure on the garbage collector.

David Silva Smith
A: 

Unfortunately, the only way I could fix this was to break the data up into chunks so as not to allocate large blocks on the LOH. All the proposed answers here were good and were expected to work, but they did not. It seems that binary serialization in .NET (using .NET 2.0 SP2) does its own magic under the hood which prevents users from having control over memory allocation.

The answer to the question, then, is "this is not likely to work". When it comes to using .NET serialization, your best bet is to serialize the large objects in smaller chunks. For all other scenarios, the answers mentioned above are great.
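For example, "smaller chunks" can mean serializing each table of the dataset separately, so no single serialized blob comes close to the sizes that fragment the LOH. A rough sketch (the class and method names are made up; relations between tables are not preserved this way and would have to be re-established on deserialization):

    using System.Collections.Generic;
    using System.Data;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    static class ChunkedSerializer
    {
        // Sketch: serialize a DataSet one table at a time so each temporary
        // buffer is far smaller than the whole 14 MB graph.
        public static List<byte[]> SerializeByTable(DataSet ds)
        {
            BinaryFormatter serializer = new BinaryFormatter();
            List<byte[]> pieces = new List<byte[]>();
            foreach (DataTable table in ds.Tables)
            {
                // Optional in .NET 2.0: makes the per-table payload truly
                // binary (and usually smaller) instead of XML inside binary.
                table.RemotingFormat = SerializationFormat.Binary;
                using (MemoryStream ms = new MemoryStream())
                {
                    serializer.Serialize(ms, table);
                    pieces.Add(ms.ToArray());
                }
            }
            return pieces;
        }
    }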

Mark