ansaurus

Question

How to extract a single file from a remote archive file?

Answer 1

+3 A:

Well, at a minimum, you have to download the portion of the archive up to and including the compressed data of the file you want to extract. That suggests the following solution: open a URLConnection to the archive, get its input stream, wrap it in a ZipInputStream, and repeatedly call getNextEntry() and closeEntry() to iterate through all the entries in the file until you reach the one you want. Then you can read its data using ZipInputStream.read(...).

The Java code would look something like this:

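// imports: java.io.*, java.net.URL, java.util.zip.ZipEntry, java.util.zip.ZipInputStream
URL url = new URL("http://example.com/path/to/archive");
try (ZipInputStream zin = new ZipInputStream(url.openStream())) {
    ZipEntry ze = zin.getNextEntry();
    while (ze != null && !ze.getName().equals(pathToFile)) {
        zin.closeEntry(); // optional: getNextEntry() also closes the current entry
        ze = zin.getNextEntry();
    }
    if (ze == null) throw new FileNotFoundException(pathToFile);
    // ze.getSize() can be -1 for streamed entries, so read until the entry ends
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    for (int n; (n = zin.read(buf)) != -1; ) {
        out.write(buf, 0, n);
    }
    byte[] bytes = out.toByteArray();
}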

This is, of course, untested.

David Zaslavsky 2010-06-27 00:00:08

Thank you; this seems to work well (bar minor errors), though unfortunately this cannot handle anything but zip archives.

Oak 2010-06-27 07:21:45

Well yeah, why do you think it's called `ZipInputStream`? ;-) If you look around the internet you might be able to find a `TarInputStream` that you could use in roughly the same way, or if not, you could write your own. It'd be easy because tar files aren't compressed: the format is basically just a header for each file followed by the file data. (Wikipedia has a description of the format.) For gzipped tar archives, Java's standard library has a `GZIPInputStream` you can wrap around the raw stream before handing it to the tar stream.
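To give an idea of what "write your own" might involve, here is a rough sketch of a tar reader that stops as soon as it has the entry it wants. `TarSketch` and its methods are made-up names, not part of any library. It assumes the classic layout of 512-byte header blocks, with the name in the first 100 bytes and the size as octal text at offset 124, and it ignores long-name extensions and very large files.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not a library class: minimal reader for plain tar streams. */
final class TarSketch {
    private static final int BLOCK = 512;

    /** Returns the bytes of the entry called name, or null if the archive ends first. */
    static byte[] extract(InputStream in, String name) throws IOException {
        DataInputStream din = new DataInputStream(in);
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                din.readFully(header);
            } catch (EOFException e) {
                return null; // stream ended without an end-of-archive marker
            }
            if (header[0] == 0) {
                return null; // simplification: an empty name is treated as end of archive
            }
            String entryName = field(header, 0, 100);                     // NUL-padded name
            long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal size
            if (entryName.equals(name)) {
                byte[] data = new byte[(int) size];
                din.readFully(data);
                return data; // stop here; the rest of the stream is never read
            }
            // File data is padded up to a whole number of 512-byte blocks.
            skipFully(din, (size + BLOCK - 1) / BLOCK * BLOCK);
        }
    }

    /** Decodes a NUL-terminated ASCII field of the header. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    /** Reads and discards exactly n bytes. */
    private static void skipFully(DataInputStream din, long n) throws IOException {
        byte[] scratch = new byte[8192];
        while (n > 0) {
            int step = (int) Math.min(n, scratch.length);
            din.readFully(scratch, 0, step);
            n -= step;
        }
    }
}
```

For a gzipped tar you would call something like `TarSketch.extract(new GZIPInputStream(url.openStream()), pathToFile)`; closing the stream afterwards abandons the rest of the download.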

David Zaslavsky 2010-06-27 19:59:56

Indeed, Apache has a [TarInputStream](http://javadoc.haefelinger.it/org.apache.ant/1.7.1/org/apache/tools/tar/TarInputStream.html) class :)

Oak 2010-06-28 07:42:16

Answer 2

A:

I'm not sure if there's a way to pull out a single file from a ZIP without downloading the whole thing first. But, if you're the one hosting the ZIP file, you could create a Java servlet which reads the ZIP file and returns the requested file in the response:

public class GetFileFromZIPServlet extends HttpServlet{
  @Override
  public void doGet(HttpServletRequest request, HttpServletResponse response)
  throws ServletException, IOException{
    String pathToFile = request.getParameter("pathToFile");

    byte fileBytes[];
    //get the bytes of the file from the ZIP

    //set the appropriate content type, maybe based on the file extension
    response.setContentType("...");

    //write file to the response
    response.getOutputStream().write(fileBytes);
  }
}
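For the "get the bytes of the file from the ZIP" step, one option is `java.util.zip.ZipFile`, which can look up an entry by name when the archive sits on the server's own disk. The sketch below is only an illustration: `ZipEntryReader` is a made-up helper name, and where the archive lives on disk is left to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper for the servlet's "get the bytes" step. */
final class ZipEntryReader {
    /** Reads one entry from a ZIP file stored on the local disk. */
    static byte[] read(File archive, String pathToFile) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            // ZipFile reads the archive's directory, so no scan over all entries is needed.
            ZipEntry entry = zip.getEntry(pathToFile);
            if (entry == null || entry.isDirectory()) {
                throw new FileNotFoundException(pathToFile);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        }
    }
}
```

In the servlet, `fileBytes` could then be assigned from `ZipEntryReader.read(archiveFile, pathToFile)`, with a `FileNotFoundException` translated into a 404 response.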

Michael Angstadt 2010-06-27 22:37:36

Unfortunately, I am not the one hosting the files... but it is a good point.

Oak 2010-06-28 07:34:46

Answer 3

+2 A:

Contrary to the other answers here, I'd like to point out that ZIP entries are compressed individually, so (in theory) you don't need to download anything more than the directory and the entry itself. The server would need to support the Range HTTP header for this to work.
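To make that a bit more concrete, here is a rough sketch of the first step only: fetching a byte range over HTTP and finding the end-of-central-directory record in the tail of the archive, which tells you where the directory is. `RemoteZipSketch` and its method names are made up for illustration. The sketch assumes the server answers Range requests with 206 Partial Content and that the archive is not ZIP64.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: partial reads of a remote ZIP via HTTP Range requests. */
final class RemoteZipSketch {
    /** Fetches bytes first..last (inclusive); fails unless the server honours Range. */
    static byte[] fetchRange(URL url, long first, long last) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", "bytes=" + first + "-" + last);
        try {
            if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("server did not return a partial response");
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Scans the tail of an archive backwards for the end-of-central-directory
     * record and returns {offset, size} of the central directory, or null.
     */
    static long[] locateCentralDirectory(byte[] tail) {
        ByteBuffer bb = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = tail.length - 22; i >= 0; i--) {   // the record is at least 22 bytes
            if (bb.getInt(i) == 0x06054b50) {           // "PK\005\006" signature
                long size = bb.getInt(i + 12) & 0xFFFFFFFFL;
                long offset = bb.getInt(i + 16) & 0xFFFFFFFFL;
                return new long[] {offset, size};
            }
        }
        return null;
    }
}
```

With the archive length known (for example from the Content-Length of a HEAD request), you would fetch the last few kilobytes, locate the directory, fetch the directory itself with a second range request, and use the entry's recorded offset for a third.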

The standard Java API only supports reading ZIP files from local files and input streams. As far as I know there's no provision for reading from random access remote files.

Since you're using TrueZip, I recommend implementing de.schlichtherle.io.rof.ReadOnlyFile using Apache HTTP Client and creating a de.schlichtherle.util.zip.ZipFile with that.

This won't provide any advantage for compressed TAR archives since the entire archive is compressed together (beyond just using an InputStream and killing it when you have your entry).

Adam Crume 2010-06-27 23:34:06
