I know, I know: who would want to compress or uncompress large files in Java? Completely unreasonable. For the moment, suspend disbelief and assume I have a good reason for uncompressing a large zip file.

Issue 1: ZipFile has a bug (bug # 6280693). Sun has fixed this in Java 1.6 (Mustang), but the fix isn't helpful, as our software needs to support Java 1.4. The bug, as I understand it, works like this: when the following code is run, Java allocates a chunk of memory large enough to hold the entire file.

ZipFile zipFile = new ZipFile("/tmp/myFile.zip");

If /tmp/myFile.zip is 4 GB, Java allocates 4 GB. This causes an out-of-memory error. A heap size of more than 4 GB is unfortunately not an acceptable solution. =(

Solution to issue 1: Use ZipInputStream to deal with the file as a stream, and thus reduce and control the memory footprint.

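// needs java.io.FileInputStream, java.util.zip.ZipEntry, java.util.zip.ZipInputStream
byte[] buf = new byte[1024];
FileInputStream fs = new FileInputStream("/tmp/myFile.zip");
ZipInputStream zipIn = new ZipInputStream(fs);

ZipEntry ze = zipIn.getNextEntry();

while (ze != null) {
  int cr;
  // read() returns -1 at the end of the current entry, not at the end of the file
  while ((cr = zipIn.read(buf, 0, buf.length)) > -1)
    System.out.write(buf, 0, cr);
  ze = zipIn.getNextEntry();
}
zipIn.close();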

Issue 2: I would like to access the ZipEntries randomly. That is, I would like to uncompress only one ZipEntry, without having to search through the entire stream. Currently I am building up a list of ZipEntries, called zes:

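        // ZipInputStream wraps an InputStream; it has no constructor that takes a path
        ZipInputStream zin = new ZipInputStream(new FileInputStream("/tmp/myFile.zip"));

        ZipEntry ze = zin.getNextEntry();
        List<ZipEntry> zes = new ArrayList<ZipEntry>();

        while (ze != null) {
            zes.add(ze);
            ze = zin.getNextEntry();
        }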

Then, when I need to uncompress a particular ZipEntry, I iterate through all the ZipEntries until I find the matching one, which I then uncompress.

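        // zin must be a freshly opened ZipInputStream; the indexing pass consumed the first one
        ZipEntry ze = zin.getNextEntry();
        // also stop at the end of the stream, so a missing entry cannot cause a NullPointerException
        while (ze != null && !ze.getName().equals(queryZe.getName())) {
            ze = zin.getNextEntry();
        }

        int cr;

        while (ze != null && (cr = zin.read(buf)) > -1)
            System.out.write(buf, 0, cr);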

Question: ZipFile has the ability to randomly access ZipEntries:

new BufferedInputStream(zipFile.getInputStream(zipEntry));

How can I get this same ability without using ZipFile?

Note that ZipInputStream has some rather strange behavior.

Especially good documentation on java and ZipFiles can be found here:

http://commons.apache.org/compress/zip.html

Notes on replacing Sun's ZipFile with the Apache Commons ZipFile, as suggested in the answers:

  1. Sun's ZipFile.entries() always returns the ZipEntries in the order in which they occur in the file, whereas the Apache Commons ZipFile.getEntries() returns the entries in random order. This caused an interesting bug, because some code was assuming that the entries would be "in order".
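
One way to make such code independent of whatever order a library returns is to record the file order explicitly. The following is only a sketch using java.util.zip: the class FileOrder and its methods are hypothetical names, and it relies on the fact that ZipInputStream walks the archive sequentially. It is written against a current JDK (generics, lambdas, try-with-resources), so it would need adapting for 1.4. Newer Commons Compress releases also document a getEntriesInPhysicalOrder() method on their ZipFile; whether your version has it is something to check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper, not part of any library. */
class FileOrder {
    /** Maps each entry name to its position in the archive, found by one sequential pass. */
    static Map<String, Integer> positions(InputStream zipData) throws IOException {
        Map<String, Integer> pos = new HashMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipData)) {
            for (ZipEntry ze; (ze = zin.getNextEntry()) != null; ) {
                // keep the first position if a name were ever repeated
                pos.putIfAbsent(ze.getName(), pos.size());
            }
        }
        return pos;
    }

    /** Sorts names (in whatever order they were obtained) back into file order. */
    static List<String> inFileOrder(Collection<String> names, Map<String, Integer> pos) {
        List<String> sorted = new ArrayList<>(names);
        // assumes every name is present in pos; an unknown name would throw a NullPointerException
        sorted.sort((a, b) -> Integer.compare(pos.get(a), pos.get(b)));
        return sorted;
    }
}
```

The sequential pass only reads entry headers and skips over the data, so memory use stays small, but it still has to walk the whole file once.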
+2  A: 

You could look at Apache Commons Compress, which works with 1.4+, but I don't know if it exposes the same bug under the hood or not.

Yishai
It doesn't? http://commons.apache.org/compress/apidocs/org/apache/commons/compress/archivers/zip/ZipFile.html
toluju
Then what is this? http://commons.apache.org/compress/apidocs/org/apache/commons/compress/archivers/zip/ZipFile.html
jsight
Oh word! My mistake! Thanks! =)
e5
+3  A: 

For this task, you may want to look at Apache Commons Compress, Apache Commons VFS, or TrueZip. All of these should be Java 1.4 compatible, and probably support the features you need.
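
For a sense of what random access involves: the ZIP format keeps an index (the central directory) at the end of the archive, listing each entry's name and the offset of its data. The sketch below reads only that index through RandomAccessFile and then seeks straight to one entry. It is an illustration under loud assumptions, not how any of the libraries above is implemented: ZipIndex is a hypothetical class, it ignores ZIP64 (which archives beyond 4 GB need), encryption and multi-disk archives, it assumes UTF-8 names, and it is written against a current JDK rather than tested on 1.4.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper, not a library class. Assumes no ZIP64, no encryption, UTF-8 names. */
class ZipIndex implements Closeable {
    /** name -> { compression method, compressed size, local header offset } */
    final Map<String, long[]> entries = new LinkedHashMap<>();
    private final RandomAccessFile raf;

    ZipIndex(File file) throws IOException {
        raf = new RandomAccessFile(file, "r");
        // The end-of-central-directory record is 22 bytes plus a comment of at most 65535 bytes.
        int tailLen = (int) Math.min(raf.length(), 22 + 65535);
        ByteBuffer tail = read(raf.length() - tailLen, tailLen);
        int eocd = tailLen - 22;
        while (eocd >= 0 && tail.getInt(eocd) != 0x06054b50) eocd--;
        if (eocd < 0) throw new IOException("no end-of-central-directory record");
        int count = tail.getShort(eocd + 10) & 0xFFFF;
        // Only the directory is loaded into memory, never the entry data.
        ByteBuffer cd = read(tail.getInt(eocd + 16) & 0xFFFFFFFFL, tail.getInt(eocd + 12));
        for (int i = 0, p = 0; i < count; i++) {
            if (cd.getInt(p) != 0x02014b50) throw new IOException("bad central directory");
            int nameLen = cd.getShort(p + 28) & 0xFFFF;
            String name = new String(cd.array(), p + 46, nameLen, StandardCharsets.UTF_8);
            entries.put(name, new long[] { cd.getShort(p + 10) & 0xFFFF,
                    cd.getInt(p + 20) & 0xFFFFFFFFL, cd.getInt(p + 42) & 0xFFFFFFFFL });
            p += 46 + nameLen + (cd.getShort(p + 30) & 0xFFFF) + (cd.getShort(p + 32) & 0xFFFF);
        }
    }

    /** Reads len bytes at pos into a little-endian buffer (ZIP fields are little-endian). */
    private ByteBuffer read(long pos, int len) throws IOException {
        byte[] b = new byte[len];
        raf.seek(pos);
        raf.readFully(b);
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Uncompresses a single entry to out without scanning the entries before it. */
    void extract(String name, OutputStream out) throws IOException {
        long[] e = entries.get(name);
        if (e == null) throw new FileNotFoundException(name);
        // The local header has its own name/extra lengths, so read them to find the data.
        ByteBuffer local = read(e[2], 30);
        if (local.getInt(0) != 0x04034b50) throw new IOException("bad local header");
        raf.seek(e[2] + 30 + (local.getShort(26) & 0xFFFF) + (local.getShort(28) & 0xFFFF));
        byte[] buf = new byte[8192];
        if (e[0] == 0) { // stored: copy exactly the recorded number of bytes
            for (long left = e[1]; left > 0; ) {
                int n = raf.read(buf, 0, (int) Math.min(buf.length, left));
                if (n < 0) throw new EOFException(name);
                out.write(buf, 0, n);
                left -= n;
            }
        } else if (e[0] == 8) { // deflated: raw deflate data, hence nowrap = true
            Inflater inflater = new Inflater(true);
            try {
                // The channel shares the file pointer set by seek(); inflation stops by itself
                // at the end of the deflate stream.
                InputStream in = new InflaterInputStream(Channels.newInputStream(raf.getChannel()), inflater);
                for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
            } finally {
                inflater.end(); // release native memory; the file itself stays open
            }
        } else {
            throw new IOException("unsupported compression method " + e[0]);
        }
    }

    public void close() throws IOException {
        raf.close();
    }
}
```

Usage would look like new ZipIndex(new File("/tmp/myFile.zip")).extract("some/entry.txt", System.out), where the entry name is made up. Memory use is bounded by the size of the central directory plus two small buffers, whatever the size of the archive. The class is not thread-safe, since all reads share one file pointer. For anything beyond an experiment, one of the libraries above is the safer choice.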

toluju
Your answer worked, thank you!
e5