This is a C# problem. I have a big object in memory at a certain point in time, and I want to serialize it to a file. There are two steps: first, I need to convert the object to a CSV string; second, I need to serialize the CSV string.

I have a utility that can append strings to a MemoryStream, and I use it to convert the big object to CSV text (held in one big chunk of MemoryStream). After that, I create a StreamReader over the MemoryStream and call StreamReader.ReadToEnd() to turn the MemoryStream into a (long) string. Then I call info.AddValue("BigObject", string); to serialize the string.

As one can see, I actually hold three copies of the big object in memory. The first is the object itself, the second is the MemoryStream holding the CSV text, and the third is the string, which is just a redundant copy of the MemoryStream.

Is there any way to reduce the memory consumption of this procedure? It seems that even without the MemoryStream, I would still need a StringBuilder to hold the CSV text of the big object, and I would still need to call StringBuilder.ToString() to get the final string. The final string and the StringBuilder would then coexist in memory and consume the same amount of memory as the MemoryStream and string do now.

Any ideas are welcome. Thank you.

+1  A: 

If you're worried about peak memory usage, I suppose you could manually force a garbage collection after you're done with the original object and then again after you're done with the memory stream.

(Let me just point out that, while there are a few cases where taking control of garbage collection is necessary, it's generally a bad idea. Usually, it's better to let things get collected in due time.)
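A rough sketch of what that could look like, with the caveat above still applying. `PeakMemoryDemo`, `BuildCsv` and the `string[][]` "big object" are made-up stand-ins for the asker's utility and data, not anything from the question:

```csharp
using System;
using System.IO;
using System.Text;

public static class PeakMemoryDemo
{
    // Hypothetical stand-in for the asker's utility: appends CSV text to a MemoryStream.
    public static MemoryStream BuildCsv(string[][] rows)
    {
        var stream = new MemoryStream();
        // leaveOpen: true so disposing the writer does not close the stream.
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            foreach (string[] row in rows)
                writer.WriteLine(string.Join(",", row));
        }
        stream.Position = 0;
        return stream;
    }

    public static string ToCsvString(ref string[][] bigObject)
    {
        MemoryStream stream = BuildCsv(bigObject);

        // Done with the original object: clear the caller's reference and
        // request a collection. This only helps if no other references remain.
        bigObject = null;
        GC.Collect();

        string csv;
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            csv = reader.ReadToEnd();
        } // disposing the reader also disposes the MemoryStream

        // Done with the stream: drop the reference and request another collection.
        stream = null;
        GC.Collect();
        return csv;
    }
}
```

At the peak this still holds two copies (object plus stream, then stream plus string), just never all three at once.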

Steven Sudit
Implementing the IDisposable interface or a destructor isn't going to help with three copies of the object being allocated on the heap at the same time.
Ty
Actually, this article says it pretty well. If you've ever called GC.Collect(); then give it a read. http://lyontamers.com/blogs/jimlyon/archive/2008/08/29/garbage-collection-finalizers-and-dispose-what-every-c-programmer-should-know.aspx
Ty
No, but nulling out all references and invoking the GC will cause the memory to be released immediately. I'll check out the article, thanks.
Steven Sudit
A: 

You don't have to implement your own serialization. You can leave it to the .NET framework. A good starting point can be found here.

jpoh
+1  A: 

Give the following a try.

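        // Requires: using System.IO; using System.Xml.Serialization;
        // Serializes target as XML straight to the file, with no intermediate string.
        public void SerializeToFile<T>(T target, string filename)
        {
            XmlSerializer serializer = new XmlSerializer(typeof(T));

            // FileMode.Create overwrites any existing file; using closes the stream.
            using (FileStream stream = new FileStream(filename, FileMode.Create, FileAccess.Write))
            {
                serializer.Serialize(stream, target);
            }
        }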

Edit: Assuming you can get your object to implement ISerializable and tie your utility into the GetObjectData method.

Edit2: Missed the CSV part. Icky. Try using an XSLT on the XML after serializing it.

Link to an article about converting xml to csv via an xslt.
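For illustration, here is a minimal sketch of the XSLT step using XslCompiledTransform. The stylesheet and the `<Rows><Row>...</Row></Rows>` shape are assumptions made up for this example (the real XML depends on what XmlSerializer emits for your type), and `XmlToCsv` is a hypothetical name. It does no CSV quoting or escaping.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XmlToCsv
{
    // Hypothetical stylesheet: one CSV line per <Row>, one field per child element.
    private const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:template match='/Rows'>
    <xsl:for-each select='Row'>
      <xsl:for-each select='*'>
        <xsl:value-of select='.'/>
        <xsl:if test='position() != last()'>,</xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    // Reads XML from xml and writes the transformed CSV text to csv.
    public static void Convert(TextReader xml, TextWriter csv)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(styleReader);
        }
        using (XmlReader input = XmlReader.Create(xml))
        {
            transform.Transform(input, null, csv);
        }
    }
}
```

Passing a StreamWriter over a FileStream as the second argument sends the CSV to disk without building it as a string first. Keep in mind that the XSLT processor works on an in-memory representation of the input document, so this route is more about convenience than about saving memory.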

Ty
A: 

What kind of data are we talking about? If it's text data, then you could use in-memory compression and save a lot of memory that way.
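Something along these lines, as a sketch only. It assumes the CSV can be produced one line at a time; `CsvCompression` and its methods are hypothetical names, and GZipStream is just one of the compression streams you could pick:

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CsvCompression
{
    // Compresses lines as they are produced, so the full uncompressed
    // text never has to exist in memory in one piece.
    public static byte[] Compress(IEnumerable<string> lines)
    {
        var buffer = new MemoryStream();
        using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
        using (var writer = new StreamWriter(gzip, new UTF8Encoding(false)))
        {
            foreach (string line in lines)
            {
                writer.Write(line);
                writer.Write('\n');
            }
        }
        // ToArray still works after the MemoryStream has been closed.
        return buffer.ToArray();
    }

    // Expands the compressed bytes back into the original text.
    public static string Decompress(byte[] data)
    {
        using (var buffer = new MemoryStream(data))
        using (var gzip = new GZipStream(buffer, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

How much this saves depends entirely on how repetitive the data is.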

Booji Boy
Want to bet it's a spreadsheet?
Steven Sudit
A: 

Rather than having the intermediate step of converting the object into a CSV string, you may want to try writing the object to the file as you serialize it. Just use a file stream in place of your MemoryStream when building the CSV. Better yet, create a SerializeToStream method on your object that takes any sort of stream as a parameter.
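A minimal sketch of that idea, assuming the big object can be walked row by row. The `BigObject` class and its `Rows` property are placeholders invented for this example, and no CSV quoting or escaping is done:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for the big object: a list of rows of cell values.
public class BigObject
{
    public List<string[]> Rows { get; } = new List<string[]>();

    // Writes the CSV row by row to any writable stream (file, memory, network).
    // Only one row of text is built at a time, not the whole CSV.
    public void SerializeToStream(Stream output)
    {
        // leaveOpen: true so the caller stays in charge of closing the stream.
        using (var writer = new StreamWriter(output, new UTF8Encoding(false), 4096, leaveOpen: true))
        {
            foreach (string[] row in Rows)
                writer.WriteLine(string.Join(",", row));
        }
    }

    // Convenience wrapper for the file case described in the question.
    public void SerializeToFile(string path)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            SerializeToStream(file);
        }
    }
}
```

Because the method takes a plain Stream, the same code can still target a MemoryStream in tests or wherever an in-memory copy really is needed.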

Jacob