views:

1395

answers:

4

When requesting a page with Gzip compression I am getting a lot of the following errors:

System.IO.InvalidDataException: The CRC in GZip footer does not match the CRC calculated from the decompressed data

I am using native GZipStream to decompress and am looking at addressing this. With that in mind is there a work around for addressing this or another GZip library (free?) which will handle this issue properly?

I am verifying the webResponse ContentEncoding is GZIP

Update 5/11 A simplified snippit

//Caller
public void SOSampleGet(string url) 
{
    // Initialize the WebRequest.
    webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.Method = WebRequestMethods.Http.Get;
    webRequest.KeepAlive = true;
    webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    webRequest.Headers.Add("Accept-Encoding", "gzip,deflate");
    webRequest.Referer = WebUtil.GetDomain(url);

    HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();    

    using (Stream stream = GetStreamForResponse(webResponse, READTIMEOUT_CONST))
    {
        //use stream
    }
}

//Method
private static Stream GetStreamForResponse(HttpWebResponse webResponse, int readTimeOut)
{
    Stream stream;
    switch (webResponse.ContentEncoding.ToUpperInvariant())
    {
        case "GZIP":
            stream = new GZipStream(webResponse.GetResponseStream(), CompressionMode.Decompress);
            break;
        case "DEFLATE":
            stream = new DeflateStream(webResponse.GetResponseStream(), CompressionMode.Decompress);
            break;

        default:
            stream = webResponse.GetResponseStream();
            stream.ReadTimeout = readTimeOut;
            break;
        }    
    return stream;
}
A: 

The native GZipStream can read a compressed GZIP (RFC 1952) stream, but it can't handle the ZIP file format.

From http://www.geekpedia.com/tutorial190_Zipping-files-using-GZipStream.html:

The disadvantage of using the GZipStream class over a 3rd party product is that it has limited capabilities. One of the limitations is that you cannot give a name to the file that you place in the archive. When GZipStream compresses the file into a ZIP archive, it takes the sequence of bytes from that file and uses compression algorithms that create a smaller sequence of bytes. The new sequence of bytes is put into the new ZIP file. When you open the ZIP file you will open the archived file itself; most popular ZIP extractors (WinZip, WinRar, etc.) will show you the content of the ZIP as a file that has the same as the archive itself.


EDIT: The above note is incorrect. GZipStream does not produce a ZIP file. It is not a "Single file ZIP stream". It is a GZIP Stream. They are different things. There's no guarantee that tools that handle ZIP archives will handle a .gz file.


For an implementation that can read ZIP archives, as opposed to single-file ZIP streams, try #ziplib (SharpZipLib, formerly NZipLib).

Andomar
I don't believe the original poster is talking about dealing with compressed/archived files. Rather, the use case is requesting a web page while sending an Accept-Encoding: header to the server, indicating that the client supports gzip. That header allows the server to compress the content before sending it to the client, saving bandwidth. Modern web browsers can do this, and many servers are configured to respond accordingly.
Chris W. Rea
cwrea is correct
Pat
Are you checking if the server actually replies with a gzip stream, for example with wireshark? Wireshark can decode and verify the reply, even if it's gzipped.
Andomar
+1  A: 

See my comment above, but this usually is a symptom of a corrupted file. If the site is your own, replace the file you are trying to access.

Mike Curry
Not my site, it seems particular to a few sites I am requesting from however.
Pat
+1  A: 

Are you flushing and closing the stream? Try wrapping your GZipStream with a Using Statement.

Matthew Whited
Its wrapped in a Try/Catch/Finally calling Dispose() of the stream in the finally block.
Pat
Could you also include a code snipet?
Matthew Whited
+1  A: 

I found some sample code that shows the entire request/response for GZip encoded pages. It uses GZipStream.

http://www.know24.net/blog/Decompress+GZip+Deflate+HTTP+Responses.aspx

Mike L