I have several fairly large XML files that represent data exported from a system that is to be used by a 3rd party vendor. I was chopping the results at 2,500 records for each XML file because the files become huge and unmanageable otherwise. However, the 3rd party vendor has asked me to combine all of these XML files into a single file. There are 78 of these XML files and they total over 700MB in size! Crazy, I know... so how would you go about combining these files to accommodate the vendor using C#? Hopefully there is a really efficient way to do this without reading in all of the files at once using LINQ :-)
views: 241
answers: 2
+5
Q:
What is the most efficient way in C# to merge more than 2 xml files with the same schema together?
+4
A:
I'm going to go out on a limb here and assume that your XML looks something like:
<records>
  <record>
    <dataPoint1/>
    <dataPoint2/>
  </record>
</records>
If that's the case, I would open a file stream and write the <records> part, then sequentially open each XML file and write all lines (except the first and last) to disk. That way you don't have huge strings in memory and it should all be very, very quick to code and run.
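// Requires: using System.Collections.Generic; using System.IO;
public void ConsolidateFiles(List<string> files, string outputFile)
{
    // 'using' flushes and closes the writer, so the tail of the output is not lost.
    using (var output = new StreamWriter(outputFile))
    {
        output.WriteLine("<records>");
        foreach (var file in files)
        {
            using (var input = new StreamReader(file))
            {
                string line;
                while ((line = input.ReadLine()) != null)
                {
                    // Skip the XML declaration (if present) and the root tags; copy the rest.
                    if (!line.Contains("<?xml") &&
                        !line.Contains("<records>") &&
                        !line.Contains("</records>"))
                    {
                        // WriteLine keeps the original line breaks in the merged file.
                        output.WriteLine(line);
                    }
                }
            }
        }
        output.WriteLine("</records>");
    }
}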
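If matching on text lines feels too fragile, the same one-file-at-a-time idea can be sketched with XmlReader and XmlWriter. This is only a sketch under assumptions: the class name XmlMerger, the method MergeRecords, and its rootName/recordName parameters are made up for illustration, the files are assumed to have a <records> root with <record> children and no namespaces or DOCTYPE, and any other child of the root (an inline schema, for example) is simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XmlMerger
{
    // Hypothetical helper: streams every <recordName> element that is a direct
    // child of each input file's root into a single output document.
    public static void MergeRecords(IEnumerable<string> files, string outputFile,
                                    string rootName = "records",
                                    string recordName = "record")
    {
        var settings = new XmlWriterSettings { Indent = true };
        using (var writer = XmlWriter.Create(outputFile, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement(rootName);

            foreach (var file in files)
            {
                using (var reader = XmlReader.Create(file))
                {
                    reader.MoveToContent(); // now on the root element (depth 0)
                    reader.Read();          // step to the first node inside the root

                    while (!reader.EOF)
                    {
                        bool isChildElement =
                            reader.NodeType == XmlNodeType.Element && reader.Depth == 1;

                        if (isChildElement && reader.LocalName == recordName)
                        {
                            // Copies the whole element and leaves the reader
                            // on the node that follows its end tag.
                            writer.WriteNode(reader, false);
                        }
                        else if (isChildElement)
                        {
                            // Anything else under the root is dropped.
                            reader.Skip();
                        }
                        else
                        {
                            reader.Read();
                        }
                    }
                }
            }

            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

Usage would be along the lines of XmlMerger.MergeRecords(files, "merged.xml"). Only one record subtree is buffered at a time, so memory use should stay roughly flat no matter how many files are merged.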
JustLoren
2009-09-10 14:41:16
This will be the fastest way, but it is a little 'hacky'.
csharptest.net
2009-09-10 14:48:29
Agreed, 'hacky' at best :p The DataSet.Merge seems far more elegant, but I have no idea how memory-efficient that would be.
JustLoren
2009-09-10 15:05:50
I have the schema baked into each XML file, so this would get even more 'hacky'
Rob Packwood
2009-09-10 15:18:20
@Rob Packwood: My example used a match on a keyword (<records>), but you could simply read in the first X # of lines, assuming that your data always begins at a certain line #. ...Definitely hacky.
JustLoren
2009-09-10 15:28:13
+2
A:
Use DataSet.ReadXml(), DataSet.Merge(), and DataSet.WriteXml(). Let the framework do the work for you.
Something like this:
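// Requires: using System.Collections.Generic; using System.Data; using System.Xml;
public void Merge(List<string> xmlFiles, string outputFileName)
{
    DataSet complete = new DataSet();
    foreach (string xmlFile in xmlFiles)
    {
        // Dispose each reader so its file handle is released before the next file opens.
        using (XmlTextReader reader = new XmlTextReader(xmlFile))
        {
            DataSet current = new DataSet();
            current.ReadXml(reader);
            complete.Merge(current);
        }
    }
    // The merged DataSet holds every record in memory until it is written out here.
    complete.WriteXml(outputFileName);
}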
For further description and examples, take a look at this article from Microsoft.
Donut
2009-09-10 14:41:18
This was the original route I took. The problem was that the process ended up using over a gig of RAM!
Rob Packwood
2009-09-10 15:17:30