ansaurus

Question

How to write (big) XML to a file in C#?

Answer 1

A:

Why not simply use a TextWriter to write the XML?

erikkallen 2009-05-16 09:10:47

Because XML is not text.

David Schmitt 2009-05-16 09:12:30

"... not *simply* text." ;-)

Cerebrus 2009-05-16 09:24:28

Not at all. See the XML Infoset stuff (http://www.w3.org/TR/xml-infoset/). Do not confuse the data with its representation.

David Schmitt 2009-05-16 09:53:31

"Why not simply use a TextWriter to write the XML?" Yes, that is one possible approach... which was certainly valid back in the good ole days... but I really don't want to write my own XML-escaping functions, etc., if I can avoid it. One of my mottos is "Don't reinvent the wheel, especially if you think a square one is required." Which means I believe in asking around... not being arrogant enough to believe that everyone else has got it all wrong, unless/until I have PROOF to the contrary.

corlettk 2009-05-16 12:55:17

Answer 2

+6 A:

Use an XmlWriter:

[...] a writer that provides a fast, non-cached, forward-only means of generating streams or files containing XML data.
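A minimal sketch of what that can look like, assuming a made-up layout (an `items` root with `item` children and an `id` attribute) and a made-up file name, none of which come from the question. Each element goes to the file as it is written, and the writer takes care of escaping:

```csharp
using System.Xml;

static class XmlWriterSketch
{
    static void Main()
    {
        // Indentation is optional; it only makes the output easier to read.
        XmlWriterSettings settings = new XmlWriterSettings { Indent = true };

        // "items.xml", "items", "item" and "id" are illustrative names only.
        using (XmlWriter writer = XmlWriter.Create("items.xml", settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("items");

            for (int i = 0; i < 3; i++)
            {
                writer.WriteStartElement("item");
                writer.WriteAttributeString("id", i.ToString());
                // WriteString escapes characters such as '<' and '&'.
                writer.WriteString("a < b & c, item " + i);
                writer.WriteEndElement(); // closes <item>
            }

            writer.WriteEndElement(); // closes <items>
            writer.WriteEndDocument();
        }
    }
}
```

Because nothing is cached, memory use should stay roughly flat however many `item` elements the loop produces; the cost is that the writing code has to know the document layout.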

David Schmitt 2009-05-16 09:12:06

Spot on target! +1

Cerebrus 2009-05-16 09:25:13

OK, I'll try both ways (for the key elements only, to save time) in a prototype. XmlWriter looks like "the right answer", but I guess it will involve a LOT more code than my initial XmlSerializer solution, and it also nullifies the "flexibility" benefits of using generated binding classes... because the hand-made writing code must know all about the exact schema. I thank you for your time... Cheers. Keith.

corlettk 2009-05-16 09:45:52

Answer 3

+9 A:

For writing large XML, XmlWriter (used directly) is your friend, but it is harder to use. The other option would be to combine it with a DOM/object-model approach such as XmlSerializer, which is probably doable if you seize control of the XmlWriterSettings, suppress the XML declaration, and get rid of the namespace declarations...

using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Serialization;    
public class Foo {
    [XmlAttribute]
    public int Id { get; set; }
    public string Bar { get; set; }
}
static class Program {
    [STAThread]
    static void Main() {
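        using (XmlWriter xw = XmlWriter.Create("out.xml")) {
            // write the outer (root) element by hand...
            xw.WriteStartElement("xml");
            XmlSerializer ser = new XmlSerializer(typeof(Foo));
            // an empty prefix/namespace pair stops the serializer emitting
            // the xsi/xsd namespace declarations on every <Foo>
            XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
            ns.Add("","");
            // ...then let the serializer write each item in turn
            foreach (Foo foo in FooGenerator()) {
                ser.Serialize(xw, foo, ns);
            }
            xw.WriteEndElement();
        }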
    }    
    // streaming approach; only have the smallest amount of program
    // data in memory at once - in this case, only a single `Foo` is
    // ever in use at a time
    static IEnumerable<Foo> FooGenerator() {
        for (int i = 0; i < 40; i++) {
            yield return new Foo { Id = i, Bar = "Foo " + i };
        }
    }
}

Marc Gravell 2009-05-16 09:13:38

Marc, merci. (As per my comment on David Schmitt's answer) I'll try it both ways and run some performance tests. I thank you, sir, for your thoughtful answer and that example code. Awesome. Ta. ;-) I only hope that I can repay the favour someday. Cheers. Keith.

corlettk 2009-05-16 09:51:05

Answer 4

+1 A:

Did you consider compressing it before writing it to disk? With XML you can reach compression ratios of 10x or more. It will probably take you less time to compress the file and write the compressed version than to read the whole 500 MB version.
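One way this could be done, sketched under assumptions: the file name and element names below are invented for illustration, and `GZipStream` is just one of the compression streams in the framework. The XML is compressed as it is written, so the uncompressed document never exists on disk:

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;

static class CompressedXmlSketch
{
    static void Main()
    {
        // "out.xml.gz", "items" and "item" are illustrative names only.
        using (FileStream file = File.Create("out.xml.gz"))
        using (GZipStream gzip = new GZipStream(file, CompressionMode.Compress))
        using (XmlWriter writer = XmlWriter.Create(gzip))
        {
            writer.WriteStartElement("items");
            for (int i = 0; i < 1000; i++)
            {
                writer.WriteElementString("item", "value " + i);
            }
            writer.WriteEndElement();
        } // disposal runs in reverse order: XML is flushed, then gzip is finalised
    }
}
```

The ratio actually achieved depends on the data, and any consumer that expects a plain .xml file would need to decompress it first.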

shoosh 2009-05-16 09:24:17

The bigger problem is that the in-memory DOM representation is usually more than 10x the size of the underlying XML... and 5 GB is just too big to handle sensibly. And again, it doesn't help if there is an existing API/expectation of an uncompressed file.

Marc Gravell 2009-05-16 09:29:39

It's a good thought. Thank you. I may as well compress it before it hits the disk, saving some time (and memory) when reading it back and sending it as an HttpWebRequest. Our experience is that this XML compresses to about one quarter of its exploded size... saving (3/4) * 500 = 375 MB of RAM.

corlettk 2009-05-16 12:49:20
