Are System.IO.Compression.GZipStream or System.IO.Compression.DeflateStream compatible with zlib compression?

+2  A: 

They just compress the data using the zlib or deflate algorithms, but they do not produce output in any specific file format. This means that if you store the stream as-is on the hard drive, you most probably will not be able to open it with an application such as gzip or WinRAR, because the file headers (magic number, etc.) are not included in the stream and you have to write them yourself.

andreasmk2
A: 

I agree with andreas. You probably won't be able to open the file in an external tool, but if that tool expects a raw stream you might be able to use it. You would also be able to decompress the file again using the same compression class.

configurator
+1  A: 

gzip is deflate + some header/footer data, like a checksum and length, etc. So they're not compatible in the sense that one method can use a stream from the other, but they employ the same compression algorithm.
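
A minimal sketch of that framing difference, assuming the built-in System.IO.Compression classes. The names FramingDemo, GzipBytes, DeflateBytes and HasGzipMagic are made up for illustration. It compresses the same bytes both ways and looks for the gzip magic number (0x1f 0x8b) followed by the method byte 8 (deflate) described in RFC 1952:

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class FramingDemo
{
    // Compresses data with GZipStream (DEFLATE wrapped in gzip framing).
    public static byte[] GzipBytes(byte[] data)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gz = new GZipStream(buffer, CompressionMode.Compress))
            {
                gz.Write(data, 0, data.Length);
            }
            // ToArray still works after the compression stream closed the buffer.
            return buffer.ToArray();
        }
    }

    // Compresses data with DeflateStream (bare DEFLATE, no header or trailer).
    public static byte[] DeflateBytes(byte[] data)
    {
        using (var buffer = new MemoryStream())
        {
            using (var deflate = new DeflateStream(buffer, CompressionMode.Compress))
            {
                deflate.Write(data, 0, data.Length);
            }
            return buffer.ToArray();
        }
    }

    // True if the bytes start with the gzip magic number and the deflate method byte.
    public static bool HasGzipMagic(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0x1f
            && bytes[1] == 0x8b
            && bytes[2] == 0x08;
    }
}
```

Calling HasGzipMagic on the result of GzipBytes and on the result of DeflateBytes shows which of the two streams carries the gzip framing.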

Lasse V. Karlsen
+3  A: 

From MSDN about System.IO.Compression.GZipStream:

This class represents the gzip data format, which uses an industry standard algorithm for lossless file compression and decompression.

From the zlib FAQ:

The gz* functions in zlib on the other hand use the gzip format.

So zlib and GZipStream should be interoperable, but only if you use the zlib functions for handling the gzip-format.

System.IO.Compression.DeflateStream and zlib are reportedly not interoperable.

If you need to handle zip files (you probably don't, but someone else might need this) you need to use SharpZipLib or another third-party library.

Rasmus Faber
Zip files are not the same as zlib-compressed files (the compression algorithms may be the same, but the headers are not).
Ben Collins
You are right. I will edit my response.
Rasmus Faber
re: "reportedly not interoperable" regarding zlib and DeflateStream. They are ACTUALLY not interoperable. There are three IETF RFCs covering this space: 1950 for ZLIB, 1951 for DEFLATE, and 1952 for GZIP. Deflate is the compression algorithm. ZLIB and GZIP are distinct formats, which define metadata, aka "headers", that apply to the compressed stream. The zlib library implements both ZLIB and GZIP. To make it interesting, both ZLIB and GZIP can use DEFLATE as the compression mechanism. The DeflateStream class produces a bare, headerless stream. It's no wonder we are all confused.
Cheeso
+5  A: 

I've used GZipStream to compress the output from the .NET XmlSerializer, and decompressing the result with gunzip (in Cygwin), WinZip and another GZipStream has worked perfectly fine.

For reference, here's what I did in code:

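// Requires: using System.IO; using System.IO.Compression; using System.Xml.Serialization;
FileStream fs = new FileStream(filename, FileMode.Create, FileAccess.Write);
// Disposing the GZipStream flushes the compressed data and also closes fs.
using (GZipStream gzStream = new GZipStream(fs, CompressionMode.Compress))
{
  XmlSerializer serializer = new XmlSerializer(typeof(MyDataType));
  serializer.Serialize(gzStream, myData);
}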

Then, to decompress in C#:

FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read);
using (Stream input = new GZipStream(fs, CompressionMode.Decompress))
{
   XmlSerializer serializer = new XmlSerializer(typeof(MyDataType));
   myData = (MyDataType) serializer.Deserialize(input);
}

Using the 'file' utility in Cygwin reveals that there is indeed a difference between the same file compressed with GZipStream and with GNU gzip (probably header information, as others have stated in this thread). This difference, however, does not seem to matter in practice.
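
For anyone who wants to see where such a difference might live, here is a rough sketch that reads the fixed 10-byte gzip header laid out in RFC 1952 (magic, method, flags, modification time, extra flags, OS). GzipHeaderInfo and Describe are hypothetical names, and which fields actually differ between GZipStream and GNU gzip is an assumption I have not verified; it may vary with the framework version.

```csharp
using System;

public static class GzipHeaderInfo
{
    // Bit 3 of the FLG byte: an original file name follows the fixed header.
    private const int FlagHasName = 0x08;

    // Summarizes the fixed 10-byte gzip header in one line.
    public static string Describe(byte[] gz)
    {
        if (gz.Length < 10 || gz[0] != 0x1f || gz[1] != 0x8b)
        {
            return "not a gzip stream";
        }

        int method = gz[2];   // 8 means deflate
        int flags = gz[3];
        // MTIME is a little-endian 32-bit Unix timestamp (0 = not set).
        uint mtime = (uint)gz[4]
                   | (uint)gz[5] << 8
                   | (uint)gz[6] << 16
                   | (uint)gz[7] << 24;
        int extraFlags = gz[8];
        int os = gz[9];       // for example 0 = FAT, 3 = Unix
        bool hasName = (flags & FlagHasName) != 0;

        return string.Format(
            "method={0} flags=0x{1:x2} hasName={2} mtime={3} xfl={4} os={5}",
            method, flags, hasName, mtime, extraFlags, os);
    }
}
```

Running Describe over the first bytes of each file should show whether the two compressors set the name flag, timestamp or OS byte differently.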

Isak Savo
Works like a charm! The big dataset I'm using for performance testing has been compressed from 55 MB to just 7.5 MB, without noticeable performance loss. P.S. If the "file" is renamed to "file.gz", it becomes a perfectly valid archive file. You can even modify its content using any archive tool, and it will remain deserializable using your method.
Soonts
+1  A: 

DotNetZip includes a DeflateStream, a ZlibStream, and a GZipStream, to handle RFC 1951, 1950, and 1952 respectively. They all use the DEFLATE algorithm, but the framing and header bytes are different for each one.

As an advantage, the streams in DotNetZip do not exhibit the anomaly of expanding data size under compression, reported against the built-in streams. Also, there is no built-in ZlibStream, whereas DotNetZip gives you that, for good interop with zlib.

Cheeso
A: 

Can a gzip stream be stored in an NVarchar Sql Server field without loss of information?

John
+1  A: 

I ran into this issue with Git objects. In that particular case, they store the objects as deflated blobs with a Zlib header, which is documented in RFC 1950. You can make a compatible blob by making a file that contains:

  • Two header bytes (CMF and FLG from RFC 1950) with the values 0x78 0x01
    • CM = 8 = deflate
    • CINFO = 7 = 32 KB window
    • FCHECK = 1 = check bits that make the 16-bit value CMF*256 + FLG (here 0x7801) a multiple of 31
  • The output of the C# DeflateStream
  • An Adler32 checksum of the input data to the DeflateStream, big-endian format (MSB first)

I made my own Adler-32 implementation:

// Incremental Adler-32 checksum as defined in RFC 1950.
public class Adler32Computer
{
    // Largest prime below 65536.
    private const uint Modulus = 65521;

    private uint a = 1;
    private uint b = 0;

    // Unsigned, so the value neither overflows nor turns negative when b >= 32768.
    public uint Checksum
    {
        get
        {
            return (b << 16) | a;
        }
    }

    public void Update(byte[] data, int offset, int length)
    {
        for (int counter = 0; counter < length; ++counter)
        {
            a = (a + data[offset + counter]) % Modulus;
            b = (b + a) % Modulus;
        }
    }
}

And that was pretty much it.
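
The recipe above can be sketched end to end roughly like this. It assumes the built-in DeflateStream and the same 0x78 0x01 header bytes; ZlibFraming, Adler32 and Wrap are hypothetical names, and the checksum is a compact one-shot variant rather than the incremental class.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZlibFraming
{
    // One-shot Adler-32 over the uncompressed input (RFC 1950).
    public static uint Adler32(byte[] data)
    {
        const uint Modulus = 65521;
        uint a = 1;
        uint b = 0;
        foreach (byte value in data)
        {
            a = (a + value) % Modulus;
            b = (b + a) % Modulus;
        }
        return (b << 16) | a;
    }

    // Wraps raw DEFLATE output in a zlib header and Adler-32 trailer.
    public static byte[] Wrap(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // CMF = 0x78 (deflate, 32 KB window), FLG = 0x01 (check bits only).
            output.WriteByte(0x78);
            output.WriteByte(0x01);

            // leaveOpen: true keeps the MemoryStream usable for the trailer.
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }

            // Trailer: checksum of the uncompressed data, most significant byte first.
            uint checksum = Adler32(data);
            output.WriteByte((byte)(checksum >> 24));
            output.WriteByte((byte)(checksum >> 16));
            output.WriteByte((byte)(checksum >> 8));
            output.WriteByte((byte)checksum);

            return output.ToArray();
        }
    }
}
```

To read such a blob back with the built-in classes, skip the first two bytes, drop the last four, and feed the rest to a DeflateStream in CompressionMode.Decompress.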

Blake Ramsdell