Hello,

I'm trying to serialize a very large IEnumerable<MyObject> using an XmlSerializer without keeping all the objects in memory.

The IEnumerable<MyObject> is actually lazy.

I'm looking for a streaming solution that will:

  1. Take an object from the IEnumerable<MyObject>

  2. Serialize it to the underlying stream using the standard serialization (I don't want to handcraft the XML here!)

  3. Discard the in-memory data and move on to the next object

I'm trying with this code:

using (var writer = new StreamWriter(filePath))
{
    var xmlSerializer = new XmlSerializer(typeof(MyObject));
    foreach (var myObject in myObjectsIEnumerable)
    {
        xmlSerializer.Serialize(writer, myObject);
    }
}

but I'm getting multiple XML headers and I cannot specify a root tag <MyObjects> so my XML is invalid.

Any ideas?

Thanks

+1  A: 

Here's what I use:

using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Serialization;
using System.Text;
using System.IO;

namespace Utils
{
    public class XMLSerializer
    {
        public static Byte[] StringToUTF8ByteArray(String xmlString)
        {
            return new UTF8Encoding().GetBytes(xmlString);
        }

        public static String SerializeToXML<T>(T objectToSerialize)
        {
            StringBuilder sb = new StringBuilder();

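            // Note: output written to a StringBuilder is always UTF-16, so the
            // Encoding setting below is ignored and the XML declaration will
            // read encoding="utf-16".
            XmlWriterSettings settings = 
                new XmlWriterSettings {Encoding = Encoding.UTF8, Indent = true};

            using (XmlWriter xmlWriter = XmlWriter.Create(sb, settings))
            {
                // XmlWriter.Create never returns null, so no null check is needed.
                new XmlSerializer(typeof(T)).Serialize(xmlWriter, objectToSerialize);
            }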

            return sb.ToString();
        }

        public static void DeserializeFromXML<T>(string xmlString, out T deserializedObject) where T : class
        {
            XmlSerializer xs = new XmlSerializer(typeof (T));

            // Read from the string directly. Re-encoding it as UTF-8 bytes fails
            // when the XML declaration says encoding="utf-16", which is what
            // SerializeToXML produces.
            using (StringReader reader = new StringReader(xmlString))
            {
                deserializedObject = xs.Deserialize(reader) as T;
            }
        }
    }
}

Then just call:

string xml = Utils.XMLSerializer.SerializeToXML(myObjectsIEnumerable);

I haven't tried it with, for example, an IEnumerable that fetches objects one at a time remotely, or any other weird use cases, but it works perfectly for List<T>s and other collections that are in memory.

EDIT: Based on your comments in response to this, you could serialize each item separately and merge the results with XmlDocument. Use XmlDocument.LoadXml to load the first XML string into an XmlDocument, save it to a file, and treat that as your master XML file. For each remaining item in the IEnumerable, use LoadXml again to create a new in-memory XmlDocument, grab the nodes you want, append them to the master document (importing them first, since a node belongs to the document that created it), save the master again, and discard the temporary document. After you're finished, there may be a way to wrap all of the nodes in your root tag. You could also use XSL and XslCompiledTransform to write another XML file with the objects properly wrapped in the root tag.
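A rough sketch of the merge step described in the EDIT, simplified so that the root tag is created up front instead of being wrapped around the nodes afterwards. The class name XmlMerge, the method Merge, and the root name MyObjects are placeholders for illustration. Note that the master document still grows in memory as items are appended, so this is not a true streaming approach.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XmlMerge
{
    // Builds one document with a <MyObjects> root and appends the root
    // element of every per-item XML string underneath it.
    public static XmlDocument Merge(IEnumerable<string> itemXmlStrings)
    {
        var master = new XmlDocument();
        XmlElement root = master.CreateElement("MyObjects");
        master.AppendChild(root);

        foreach (string itemXml in itemXmlStrings)
        {
            // Temporary document for a single serialized item.
            var fragment = new XmlDocument();
            fragment.LoadXml(itemXml);

            // A node belongs to the document that created it, so it has to
            // be imported into the master before it can be appended.
            XmlNode imported = master.ImportNode(fragment.DocumentElement, true);
            root.AppendChild(imported);
        }

        return master;
    }
}
```

Each string would come from something like SerializeToXML(item) for a single item, and the result can be written out with master.Save(filePath).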

Chris Doggett
The problem here is that I don't want to keep all the objects, or the whole XML document or string, in memory. I really want to serialize one object at a time and append the XML to a FileStream.
Luca Martinetti
A: 

You can do this by implementing the IXmlSerializable interface on a class that wraps the large collection. XmlSerializer writes the wrapper element's start tag itself; the implementation of the WriteXml method can then simply loop over the IEnumerable and serialize each MyObject to the same XmlWriter, one at a time.

In this implementation, there won't be any in-memory data to get rid of (past what the garbage collector will collect).
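As an illustration only, a wrapper along these lines might look like the sketch below. The names MyObjectSequence and the MyObject properties are made up for the example; the real MyObject is whatever type the question uses. ReadXml is left unimplemented because only the writing side is under discussion.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's MyObject type.
public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical wrapper; XmlRoot supplies the <MyObjects> root tag.
[XmlRoot("MyObjects")]
public class MyObjectSequence : IXmlSerializable
{
    private readonly IEnumerable<MyObject> _items;

    // XmlSerializer requires a public parameterless constructor.
    public MyObjectSequence() : this(new MyObject[0]) { }

    public MyObjectSequence(IEnumerable<MyObject> items)
    {
        _items = items;
    }

    public XmlSchema GetSchema()
    {
        return null;
    }

    public void ReadXml(XmlReader reader)
    {
        throw new NotSupportedException("This sketch only supports writing.");
    }

    public void WriteXml(XmlWriter writer)
    {
        // The writer is already positioned inside the <MyObjects> element,
        // so each nested Serialize call emits just one <MyObject> element.
        var itemSerializer = new XmlSerializer(typeof(MyObject));
        foreach (MyObject item in _items)
        {
            // Items are pulled from the sequence one at a time.
            itemSerializer.Serialize(writer, item);
        }
    }
}
```

It would be used as new XmlSerializer(typeof(MyObjectSequence)).Serialize(stream, new MyObjectSequence(myObjectsIEnumerable)). Each item element will carry the default xmlns:xsi and xmlns:xsd attributes unless an XmlSerializerNamespaces instance is passed to the nested Serialize call.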

John Saunders
+1  A: 

The XmlTextWriter class is a fast streaming API for XML generation. It is rather low-level; MSDN has an article on instantiating a validating XmlTextWriter using XmlWriter.Create().
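One way this could be combined with the standard serializer is sketched below, under a few assumptions: the root tag is MyObjects as in the question, MyObject is a placeholder type, and StreamingXml.WriteAll is a name invented for the example. The sketch relies on XmlSerializer not emitting another XML declaration when the writer has already started the document.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's MyObject type.
public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class StreamingXml
{
    // Writes <MyObjects> ... </MyObjects> to the stream, pulling one item
    // at a time from the (possibly lazy) sequence.
    public static void WriteAll(Stream output, IEnumerable<MyObject> items)
    {
        var serializer = new XmlSerializer(typeof(MyObject));

        // An empty namespace set suppresses the xmlns:xsi / xmlns:xsd
        // attributes that would otherwise be repeated on every item.
        var noNamespaces = new XmlSerializerNamespaces();
        noNamespaces.Add("", "");

        var settings = new XmlWriterSettings
        {
            Indent = true,
            Encoding = new UTF8Encoding(false) // UTF-8 without a byte order mark
        };

        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartDocument();           // the single XML declaration
            writer.WriteStartElement("MyObjects"); // root tag from the question

            foreach (MyObject item in items)
            {
                // The writer is already inside the root element, so the
                // serializer emits only the <MyObject> element here.
                serializer.Serialize(writer, item, noNamespaces);
            }

            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

With a file target, the call would be something like using (var file = File.Create(filePath)) { StreamingXml.WriteAll(file, myObjectsIEnumerable); }. Only the item currently being serialized needs to be alive, provided the sequence itself does not cache its elements.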

Dour High Arch