views: 590
answers: 3
I'm trying to do a dump to XML of a very large database (many gigabytes). I'm using Linq-to-SQL to get the data out of the database and Linq-to-XML to generate XML. I'm using XStreamingElement to keep memory use low. The job still allocates all available memory, however, before keeling over without having written any XML. The structure looks like this:

var foo =
    new XStreamingElement("contracts",
        contractsQuery.Select(d =>   // contractsQuery: the LINQ-to-SQL query which fetches the data
            new XElement("contract"
                /* ... generate attributes etc. ... */)));

using (StreamWriter sw = new StreamWriter("contracts.xml"))
{
    using (XmlWriter xw = XmlWriter.Create(sw))
    {
        foo.WriteTo(xw);
    }
}

I've also tried saving with:

foo.Save("contracts.xml", SaveOptions.DisableFormatting);

...to no avail.

Any clues?

A: 

Sure, you only need one clue for that: don't do it. :-)

XML is a poor format for database dumps: the markup inflates the data several-fold, and most tooling expects to parse the whole document at once, which doesn't scale to gigabytes.

All databases have some sort of "dump" utility to export their data in a format that can then be read into another database - that would be the way to go.

Ron

Ron Savage
I'm sure there is a valid reason friism is trying to dump to XML.
stimms
There is never a *good* reason to try to dump 1 gig of database data to what would end up as 20 gig of XML. :-)
Ron Savage
It's not a pure dump, only some columns are included and there's a bit of processing involved. The reason I'm doing this is that I want to make the data available to others who don't use SQL Server (right now, it will be imported into MySQL). I acknowledge your point, however; it's a good one. With compression, the overhead should not be disastrous though.
friism
Every database can export to, and import from, flat data files at blazing speeds (several hundred thousand records in a few seconds). Compressed XML still has to be decompressed to process and will be painfully slow compared to a flat-file load. How many related tables / records are you wanting to export?
Ron Savage
There are more than 100,000 rows ("contracts") and each contract may have dozens of related rows ("contract winners", etc.). Is there a good, cross-platform flat-file format with good tooling? If so, I'd love to hear about it.
friism
Flat files are basically the "lowest common denominator". All databases have a way to export / import them, but it's specific to the db. SQL Server has bcp.exe or sqlcmd.exe for exporting, MySQL has mysqlimport for importing - and the table to hold the data has to be created first.
Ron Savage
+1  A: 

How complex is the data? I'm not overly familiar with XStreamingElement, but I wonder if you might have more joy using XmlWriter directly? Especially for writing similar records in a loop, it can be used pretty easily.
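
For example, in rough sketch form - "contractsQuery", "id" and so on are placeholders for whatever your actual query and schema look like:

using (XmlWriter xw = XmlWriter.Create("contracts.xml"))
{
    xw.WriteStartElement("contracts");
    foreach (var d in contractsQuery)
    {
        xw.WriteStartElement("contract");
        xw.WriteAttributeString("id", d.Id.ToString()); // placeholder column
        // ... write the remaining attributes/elements for this row ...
        xw.WriteEndElement();
    }
    xw.WriteEndElement();
}

Each row is written and forgotten, so memory stays flat as long as the query itself streams.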

I would, however, have concerns over XML as the choice for this data. Is this a requirement? Or simply a convenient, available format? In particular, it can be hard to parse XML of that size conveniently, as you'd have to use XmlReader (which is harder to get right than XmlWriter).
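
For completeness, the reading side with XmlReader might look something like this (element and attribute names assumed from your sample):

using (XmlReader xr = XmlReader.Create("contracts.xml"))
{
    while (xr.ReadToFollowing("contract"))
    {
        string id = xr.GetAttribute("id"); // assumed attribute name
        // ... process one contract at a time; the document is never fully in memory ...
    }
}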

If you can use other formats, I'd advise it... a few leap to mind, but I won't babble on unless you mention that you'd be interested.

Marc Gravell
A: 

Right, "solved" the problem by chunking my data into sets of 10,000 items and writing them to separate XML files. Will ponder other data exchange format and buy a larger server.

It would still be mighty interesting if someone figured out how to properly take advantage of XStreamingElement.

friism