I have noticed that the unzip facility in Java is extremely slow compared to using a native tool such as WinZip.

Is there a third-party library for Java that is more efficient? Open source is preferred.

Edit

Here is a speed comparison using the Java built-in solution vs 7zip. I added buffered input/output streams in my original solution (thanks Jim, this did make a big difference).

Zip file size: 800K
Java solution: 2.7 seconds
7-Zip solution: 204 ms

Here is the modified code using the built-in Java decompression:

/** Unpacks the given zip file using the built-in Java facilities for unzip. */
@SuppressWarnings("unchecked")
public final static void unpack(File zipFile, File rootDir) throws IOException
{
  ZipFile zip = new ZipFile(zipFile);
  Enumeration<ZipEntry> entries = (Enumeration<ZipEntry>) zip.entries();
  while(entries.hasMoreElements()) {
    ZipEntry entry = entries.nextElement();
    java.io.File f = new java.io.File(rootDir, entry.getName());
    if (entry.isDirectory()) { // if it's a directory, create it and move on
      f.mkdirs();
      continue;
    }

    if (!f.exists()) {
      f.getParentFile().mkdirs();
      f.createNewFile();
    }

    BufferedInputStream bis = new BufferedInputStream(zip.getInputStream(entry)); // get the input stream
    BufferedOutputStream bos = new BufferedOutputStream(new java.io.FileOutputStream(f));
    while (bis.available() > 0) {  // write contents of 'bis' to 'bos'
      bos.write(bis.read());
    }
    bos.close();
    bis.close();
  }
}
+2  A: 

Make sure you are feeding the unzip method a BufferedInputStream in your Java application. If you have made the mistake of using an unbuffered input stream, your IO performance is guaranteed to suck.
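
As a rough sketch of what that could look like (this is not the asker's code; the class name BufferedUnzip and the 8192-byte buffer size are made up for illustration), the archive can be read through a ZipInputStream wrapped around a BufferedInputStream, copying each entry in chunks rather than byte by byte:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, named for illustration only.
final class BufferedUnzip {
  /** Unpacks zipFile into rootDir, reading the archive through a buffered stream. */
  public static void unpack(File zipFile, File rootDir) throws IOException {
    ZipInputStream zin = new ZipInputStream(
        new BufferedInputStream(new FileInputStream(zipFile)));
    try {
      byte[] buf = new byte[8192]; // buffer size is an arbitrary choice
      ZipEntry entry;
      while ((entry = zin.getNextEntry()) != null) {
        // Note: entry names are trusted here; real code should reject names
        // that escape rootDir (for example via "..").
        File f = new File(rootDir, entry.getName());
        if (entry.isDirectory()) {
          f.mkdirs();
          continue;
        }
        File parent = f.getParentFile();
        if (parent != null) {
          parent.mkdirs();
        }
        OutputStream out = new BufferedOutputStream(new FileOutputStream(f));
        try {
          int n;
          // read() returns -1 at the end of the current entry
          while ((n = zin.read(buf)) != -1) {
            out.write(buf, 0, n);
          }
        } finally {
          out.close();
        }
      }
    } finally {
      zin.close();
    }
  }
}
```

Calling BufferedUnzip.unpack(new File("archive.zip"), new File("out")) would extract into the out directory; both paths are placeholders.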

Jim Tough
A: 

I have found an 'inelegant' solution: 7-Zip (www.7-zip.org), an open source utility that is free to use. You can download the command-line version (http://www.7-zip.org/download.html). 7-Zip itself is only supported on Windows, but it looks like it has been ported to other platforms (p7zip).

Obviously this solution is not ideal since it is platform-specific and relies on an external executable. However, the speed compared to doing the unzip in Java is incredible.

Here is the utility function that I created to interface with 7-Zip. There is room for improvement, as the code below is Windows-specific.

/** Unpacks the zipfile to the output directory.  Note: this code relies on 7-zip 
   (specifically the cmd line version, 7za.exe).  The exeDir specifies the location of the 7za.exe utility. */
public static void unpack(File zipFile, File outputDir, File exeDir) throws IOException, InterruptedException
{
  if (!zipFile.exists()) throw new FileNotFoundException(zipFile.getAbsolutePath());
  if (!exeDir.exists()) throw new FileNotFoundException(exeDir.getAbsolutePath());
  if (!outputDir.exists()) outputDir.mkdirs();

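  // Pass each argument separately so that paths containing spaces survive intact
  // (no cmd.exe wrapper is needed for this).
  ProcessBuilder builder = new ProcessBuilder(
      new File(exeDir, "7za.exe").getAbsolutePath(), "-y", "e", zipFile.getAbsolutePath());
  builder.directory(outputDir);
  builder.inheritIO(); // Java 7+: keeps 7za from blocking on a full output pipe
  Process p = builder.start();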
  int rc = p.waitFor();
  if (rc != 0) {
    log.severe("Util::unpack() 7za process did not complete normally.  rc: " + rc);
  }
}      
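
One possible way to loosen the Windows dependency is to pick the executable name from the os.name system property. This is only a sketch: the class name SevenZipCommand and the buildCommand helper are invented for illustration, and it assumes that a p7zip install provides a binary called 7za in exeDir, which should be checked on the target system.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, named for illustration only.
final class SevenZipCommand {
  /** Builds the 7za argument list for the given OS name (as reported by os.name). */
  public static List<String> buildCommand(File exeDir, File zipFile, String osName) {
    boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
    // Assumption: the p7zip port ships its command-line tool as "7za".
    String exe = windows ? "7za.exe" : "7za";
    List<String> cmd = new ArrayList<String>();
    cmd.add(new File(exeDir, exe).getAbsolutePath());
    cmd.add("-y"); // answer "yes" to all prompts, as in the original command
    cmd.add("e");  // extract, as in the original command
    cmd.add(zipFile.getAbsolutePath());
    return cmd;
  }

  /** Runs 7za in outputDir and returns its exit code. */
  public static int unpack(File zipFile, File outputDir, File exeDir)
      throws IOException, InterruptedException {
    if (!outputDir.exists()) {
      outputDir.mkdirs();
    }
    ProcessBuilder builder = new ProcessBuilder(
        buildCommand(exeDir, zipFile, System.getProperty("os.name")));
    builder.directory(outputDir);
    builder.inheritIO(); // Java 7+: avoids blocking on unread process output
    return builder.start().waitFor();
  }
}
```

Because every argument is a separate list element, no shell quoting is involved and paths with spaces are passed through unchanged.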
Tony
A: 

This 'answer' was in fact a clarification, and has been edited into the question in the meantime.

Tony
You should have included this in your question (using "edit") instead of writing it as an answer (which it isn't). Answers are often sorted by votes, which places this clarification at the bottom of the page, where it will likely be missed.
meriton
+1  A: 

The problem is not the unzipping, it's the inefficient way you write the unzipped data back to disk. My benchmarks show that using

    InputStream is = zip.getInputStream(entry); // get the input stream
    OutputStream os = new java.io.FileOutputStream(f);
    byte[] buf = new byte[4096];
    int r;
    while ((r = is.read(buf)) != -1) {
      os.write(buf, 0, r);
    }
    os.close();
    is.close();

instead reduces the method's execution time by a factor of 5 (from 5 to 1 second for a 6 MB zip file).

The likely culprit is your use of bis.available(). Aside from being incorrect (available() returns an estimate of the number of bytes that can be read before a call to read() might block, not the number of bytes remaining until the end of the stream), this bypasses the buffering provided by BufferedInputStream, requiring a native system call for every byte copied into the output file.

Note that wrapping in a buffered stream is not necessary if you use the bulk read and write methods as I do above, and that the code to close the resources is not exception safe (if reading or writing fails for any reason, neither 'is' nor 'os' would be closed). Finally, if you have IOUtils (from Apache Commons IO) on the class path, I recommend using its well-tested IOUtils.copy instead of rolling your own.
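
For the exception-safety point, a minimal sketch on Java 7 or later could use try-with-resources, which closes the streams even when the copy fails. The class name SafeUnzip is invented for illustration, and the rest mirrors the bulk copy loop from this answer:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, named for illustration only.
final class SafeUnzip {
  /** Unpacks zipFile into rootDir; every resource is closed even on failure. */
  public static void unpack(File zipFile, File rootDir) throws IOException {
    try (ZipFile zip = new ZipFile(zipFile)) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      byte[] buf = new byte[4096];
      while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        // Note: entry names are trusted here; real code should reject names
        // that escape rootDir (for example via "..").
        File f = new File(rootDir, entry.getName());
        if (entry.isDirectory()) {
          f.mkdirs();
          continue;
        }
        File parent = f.getParentFile();
        if (parent != null) {
          parent.mkdirs();
        }
        // Both streams are closed automatically, in reverse order of opening.
        try (InputStream is = zip.getInputStream(entry);
             OutputStream os = new FileOutputStream(f)) {
          int r;
          while ((r = is.read(buf)) != -1) {
            os.write(buf, 0, r);
          }
        }
      }
    }
  }
}
```

On Java 6 and earlier, the same effect needs explicit try/finally blocks around each stream.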

meriton
Thanks, meriton! I tried this and the performance is now comparable to 7-Zip. I have added IOUtils to my toolbox for future use. This is a very good suggestion.
Tony