I have a DataSet consisting of XML data, and I can easily write it to a file:

DataSet ds = new DataSet();
DataTable dt = new DataTable();
ds.Tables.Add(dt);
ds.Load(reader, LoadOption.PreserveChanges, ds.Tables[0]);
ds.WriteXml("C:\\test.xml");

However, what I want to do is compress the XML into a ZIP (or some other compressed format) and save that to disk, split into 1 MB chunks. I'd rather not save the uncompressed file first, then zip it, then split it.

What I'm looking for specifically is:

  1. a suitable compression library that I can stream the XML to and have the zip file(s) saved to disk
  2. some sample C# code that can show me how to do this.
+7  A: 

I've managed to compress a DataSet's XML stream using .NET 2.0's gzip compression.

Here's the blog post I made a few years ago about it:

Saving DataSets Locally With Compression

... and here's the code I added to my DataSet's partial class to write the compressed file (the blog post has the reading code too):

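// Requires: using System.IO; using System.IO.Compression;
public void WriteFile(string fileName)
{
    string ext = Path.GetExtension(fileName);
    using (FileStream fs = new FileStream(fileName, FileMode.Create))
    {
        Stream s;
        if (ext == ".cmx")
        {
            s = new GZipStream(fs, CompressionMode.Compress);    // gzip format
        }
        else if (ext == ".cmz")
        {
            s = new DeflateStream(fs, CompressionMode.Compress); // raw deflate
        }
        else
        {
            s = fs;                                              // no compression
        }
        // Dispose the wrapper even if WriteXml throws, so the
        // compressed data is flushed and the file is closed.
        using (s)
        {
            WriteXml(s);
        }
    }
}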

Note that this code uses different compression schemes based on the file's extension. That was purely so I could test one scheme against the other with my DataSet.
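
For completeness, the read side can be sketched the same way. This is a minimal sketch rather than the code from the blog post: the class and method names are hypothetical, and it assumes the same .cmx (gzip) / .cmz (deflate) extension convention as WriteFile above.

```csharp
using System.Data;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not taken from the original blog post.
public static class CompressedDataSetFile
{
    // Chooses the decompressor from the file extension, mirroring the
    // .cmx (gzip) / .cmz (deflate) convention used when writing.
    public static void ReadFile(DataSet ds, string fileName)
    {
        string ext = Path.GetExtension(fileName);
        using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        {
            Stream s;
            if (ext == ".cmx")
            {
                s = new GZipStream(fs, CompressionMode.Decompress);
            }
            else if (ext == ".cmz")
            {
                s = new DeflateStream(fs, CompressionMode.Decompress);
            }
            else
            {
                s = fs; // plain, uncompressed XML
            }
            using (s)
            {
                ds.ReadXml(s);
            }
        }
    }
}
```

If the file was written without a schema, ReadXml infers one from the data, so a typed DataSet (which already has its schema) is the safer target.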

Matt Hamilton
+1  A: 

Hi, the framework includes a few classes for compressing streams. One of them is GZipStream. If you search for it you'll find plenty of hits. Here's one of them. I imagine chunking the output would involve some additional work.
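
One way that additional work might look is a small write-only stream that rolls over to a new file every N bytes, with GZipStream layered on top of it. This is only a sketch: ChunkedFileStream, ChunkedGzipWriter and the ".001, .002, ..." naming are hypothetical, not framework classes. Also note that the pieces are plain slices of a single gzip stream (not a spanned ZIP), so they have to be concatenated in order before decompressing.

```csharp
using System;
using System.Data;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: a write-only stream that starts a new file
// (basePath.001, basePath.002, ...) whenever chunkSize bytes are written.
public sealed class ChunkedFileStream : Stream
{
    private readonly string _basePath;
    private readonly long _chunkSize;
    private FileStream _current;
    private long _writtenInChunk;
    private int _index;

    public ChunkedFileStream(string basePath, long chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");
        _basePath = basePath;
        _chunkSize = chunkSize;
    }

    // Number of chunk files created so far.
    public int ChunkCount { get { return _index; } }

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            // Open the next chunk lazily, so no empty trailing file is created.
            if (_current == null || _writtenInChunk >= _chunkSize)
            {
                if (_current != null) _current.Dispose();
                _index++;
                _current = new FileStream(_basePath + "." + _index.ToString("D3"), FileMode.Create);
                _writtenInChunk = 0;
            }
            int room = (int)Math.Min((long)count, _chunkSize - _writtenInChunk);
            _current.Write(buffer, offset, room);
            _writtenInChunk += room;
            offset += room;
            count -= room;
        }
    }

    public override void Flush() { if (_current != null) _current.Flush(); }

    protected override void Dispose(bool disposing)
    {
        if (disposing && _current != null) { _current.Dispose(); _current = null; }
        base.Dispose(disposing);
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}

public static class ChunkedGzipWriter
{
    // Streams the DataSet's XML through gzip straight into 1 MB pieces;
    // the uncompressed XML never touches the disk.
    public static void Write(DataSet ds, string basePath)
    {
        using (ChunkedFileStream chunks = new ChunkedFileStream(basePath, 1024 * 1024))
        using (GZipStream gzip = new GZipStream(chunks, CompressionMode.Compress))
        {
            ds.WriteXml(gzip);
        }
    }
}
```

GZipStream only needs a writable (not seekable) target, which is why a forward-only stream like this is enough. A real spanned ZIP needs a library that understands the ZIP format.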

Cristian Libardo
+2  A: 

This works with streams or files, has a good license, and comes with source: http://www.codeplex.com/DotNetZip

Here's the code to do exactly what the original poster asked: write a DataSet into a zip that is split into 1 MB chunks:

// get connection to the database
var c1= new System.Data.SqlClient.SqlConnection(connstring1);
var da = new System.Data.SqlClient.SqlDataAdapter()
{
    SelectCommand= new System.Data.SqlClient.SqlCommand(strSelect, c1)
};

DataSet ds1 = new DataSet();

// fill the dataset with the SELECT 
da.Fill(ds1, "Invoices");

// write the XML for that DataSet into a zip file (split into 1 MB chunks)
using(Ionic.Zip.ZipFile zip = new Ionic.Zip.ZipFile())
{
    zip.MaxOutputSegmentSize = 1024*1024;
    zip.AddEntry(zipEntryName, (name,stream) => ds1.WriteXml(stream) );
    zip.Save(zipFileName);
}
dpp
As of September 2009, DotNetZip can read or write "spanned" ZIP files. If you want to create a split or spanned zip, you specify the size limit for each segment before saving it. You then get N files, each no larger than the size you specified.
Cheeso
+1  A: 

You should use Xceed Zip. The code would look like this (not tested):

ZipArchive archive = new ZipArchive( new DiskFile( @"c:\path\file.zip" ) );

archive.SplitSize = 1024*1024;
archive.BeginUpdate();

try
{
  AbstractFile destFile = archive.GetFile( "data.xml" );

  using( Stream stream = destFile.OpenWrite( true ) )
  {
    ds.WriteXml( stream );
  }
}
finally
{
  archive.EndUpdate();
}
Martin Plante
+1  A: 

There is a not-so-well-known packaging API included with the 3.5 framework. It lives in the WindowsBase assembly, which is in the GAC. The System.IO.Packaging namespace contains classes for creating OPC files (e.g. OOXML), which are zip files containing XML and whatever else is desired. You get some extra stuff you don't need, but the ZipPackage class uses a streaming interface for adding content iteratively.
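
As a rough sketch of what that could look like for the DataSet case: the PackageWriter class name and the "data.xml" part name below are just placeholders, and as far as I know this API has no option for splitting the output into fixed-size segments, so it only covers the "compress without an intermediate file" half of the question.

```csharp
using System;
using System.Data;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll on the .NET Framework

// Hypothetical helper showing one way to stream a DataSet into an OPC package.
public static class PackageWriter
{
    public static void WriteDataSet(DataSet ds, string packagePath)
    {
        using (Package package = Package.Open(packagePath, FileMode.Create))
        {
            // Part names are URIs relative to the package root: "/data.xml".
            Uri partUri = PackUriHelper.CreatePartUri(new Uri("data.xml", UriKind.Relative));
            PackagePart part = package.CreatePart(partUri, "text/xml", CompressionOption.Normal);

            // The XML is written directly into the compressed part.
            using (Stream stream = part.GetStream())
            {
                ds.WriteXml(stream);
            }
        }
    }
}
```

The result opens in any zip tool, but it also contains a [Content_Types].xml entry, which is part of the "extra stuff" mentioned above.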

flatline
+1 System.IO.Packaging rocks!
kenny
A: 

DotNetZip does zip compression, via streams, but does not do multi-part zip files. :(

EDIT: as of September 2009, DotNetZip can do multi-part zip files.

Cheeso