tags:

views:

688

answers:

4

Argh, today is the day of stupid problems and me being an idiot.

I have an application which creates a zip file containing some JPEGs from a certain directory. I use this code in order to:

  • read all files from the directory
  • append each of them to a ZIP file

using (var outStream = new FileStream("Out2.zip", FileMode.Create))
{
    using (var zipStream = new ZipOutputStream(outStream))
    {
        foreach (string pathname in pathnames)
        {
            byte[] buffer = File.ReadAllBytes(pathname);

            ZipEntry entry = new ZipEntry(Path.GetFileName(pathname));
            entry.DateTime = now;

            zipStream.PutNextEntry(entry);
            zipStream.Write(buffer, 0, buffer.Length);
        }
    }
}

All works well under Windows, when I open the file e. g. with WinRAR, the files are extracted. But as soon as I try to unzip my archive on Mac OS X, it only creates a .cpgz file. Pretty useless.

A normal .zip file created manually with the same files on Windows is extracted without any problems on Windows and Mac OS X.

I found the above code on the Internet, so I am not absolutely sure if the whole thing is correct. I wonder if it is needed to use zipStream.Write() in order to write directly to the stream?

+10  A: 

I don't know for sure, because I am not very familiar with either SharpZipLib or OSX , but I still might have some useful insight for you.

I've spent some time wading through the zip spec, and actually I wrote DotNetZip, which is a zip library for .NET, unrelated to SharpZipLib.

Currently on the user forums for DotNetZip, there's a discussion going on about zip files generated by DotNetZip that cannot be read on OSX. One of the people using the library is having a problem that seems similar to what you are seeing. Except I have no idea what a .cpgxz file is.

We tracked it down, a little. At this point the most promising theory is that OSX does not like "bit 3" in the "general purpose bitfield" in the header of each zip entry.

Bit 3 is not new. PKWare added bit 3 to the spec 17 years ago. It was intended to support streaming generation of archives, in the way that SharpZipLib works. DotNetZip also has a way to produce a zipfile as it is streamed out, and it will also set bit-3 in the zip file if used in this way, although normally DotNetZip will produce a zipfile with bit-3 unset in it.

From what we can tell, when bit 3 is set, the OSX zip reader (whatever it is - like I said I'm not familiar with OSX) chokes on the zip file. The same zip contents produced without bit 3, allows the zip file to be opened. Actually it's not as simple as just flipping one bit - the presence of the bit signals the presence of other metadata. So I am using "bit 3" as a shorthand for all that.

So the theory is that bit 3 causes the problem. I haven't tested this myself. There's been some impedance mismatch on the communication with the person who has the OSX machine - so it is unresolved as yet.

But, if this theory holds, it would explain your situation: that WinRar and any Windows machine can open the file, but OSX cannot.

On the DotNetZip forums, we had a discussion about what to do about the problem. As near as I can tell, the OSX zip reader is broken, and cannot handle bit 3, so the workaround is to produce a zip file with bit 3 unset. I don't know if SharpZipLib can be convinced to do that.

I do know that if you use DotNetZip, and use the normal ZipFile class, and save to a seekable stream (like a filesystem file), you will get a zip that does not have bit 3 set. If the theory is correct, it should open with no problem on the Mac, every time. This is the result the DotNetZip user has reported. It's just one result so not generalizable yet, but it looks plausible.

example code for your scenario:

  using (ZipFile zip = new ZipFile()
  {
      zip.AddFiles(pathnames);
      zip.Save("Out2.zip");
  }

Just for the curious, in DotNetZip you will get bit 3 set if you use the ZipFile class and save it to a nonseekable stream (like ASPNET's Response.OutputStream) or if you use the ZipOutputStream class in DotNetZip, which always writes forward only (no seeking back). I think SharpZipLib's ZipOutputStream is also always "forward only."

Cheeso
+1 for your very thorough response. If there is no way to get it to work with SharpZipLib I will probably switch to DotNetZip.
Max
thanks for the helpful response! now, can you tell, whether DotNetZip will support zipping to non-seekable streams without the ominous bit 3 being set in the near future?
I'm not sure if it's possible to do so. The workaround is to zip out to a seekable stream (like a MemoryStream, or a FileStream) and then stream *that* to your non-seekable stream.
Cheeso
+8  A: 

So, I searched for a few more examples on how to use SharpZipLib and I finally got it to work on Windows and os x. Basically I added the "Crc32" of the file to the zip archive. No idea what this is though.

Here is the code that worked for me:

        using (var outStream = new FileStream("Out3.zip", FileMode.Create))
        {
            using (var zipStream = new ZipOutputStream(outStream))
            {
                Crc32 crc = new Crc32();

                foreach (string pathname in pathnames)
                {
                    byte[] buffer = File.ReadAllBytes(pathname);

                    ZipEntry entry = new ZipEntry(Path.GetFileName(pathname));
                    entry.DateTime = now;
                    entry.Size = buffer.Length;

                    crc.Reset();
                    crc.Update(buffer);

                    entry.Crc = crc.Value;

                    zipStream.PutNextEntry(entry);
                    zipStream.Write(buffer, 0, buffer.Length);
                }

                zipStream.Finish();

                // I dont think this is required at all
                zipStream.Flush();
                zipStream.Close();

            }
        }


Explanation from cheeso:

CRC is Cyclic Redundancy Check - it's a checksum on the entry data. Normally the header for each entry in a zip file contains a bunch of metadata, including some things that cannot be known until all the entry data has been streamed - CRC, Uncompressed size, and compressed size. When generating a zipfile through a streamed output, the zip spec allows setting a bit (bit 3) to specify that these three data fields will immediately follow the entry data.

If you use ZipOutputStream, normally as you write the entry data, it is compressed and a CRC is calculated, and the 3 data fields are written immediately after the file data.

What you've done is streamed the data twice - the first time implicitly as you calculate the CRC on the file before writing it. If my theory is correct, the what is happening is this: When you provide the CRC to the zipStream before writing the file data, this allows the CRC to appear in its normal place in the entry header, which keeps OSX happy. I'm not sure what happens to the other two quantities (compressed and uncompressed size).


Max
+1  A: 

What's going on with the .cpgz file is that Archive Utility is being launched by a file with a .zip extension. Archive Utility examines the file and thinks it isn't compressed, so it's compressing it. For some bizarre reason, .cpgz (CPIO archiving + gzip compression) is the default. You can set a different default in Archive Utility's Preferences.

If you do indeed discover this is a problem with OS X's zip decoder, please file a bug. You can also try using the ditto command-line tool to unpack it; you may get a better error message. Of course, OS X also ships unzip, the Info-ZIP utility, but I'd expect that to work.

Nicholas Riley
+2  A: 

Hi there,

got the exact same problem today. I tried implementing the CRC stuff as proposed but it didn't help.

I finaly found the solution on this page: http://community.sharpdevelop.net/forums/p/7957/23476.aspx#23476

As a result, I just had to add this line in my code:

oZIPStream.UseZip64 = UseZip64.Off;

And the file opens up as it should on MacOS X :-)

Cheers fred

Fred Mauroy