I am currently extracting the contents of a war file, adding some new files to the directory structure, and then creating a new war file.

This is all done programmatically from Java - but I am wondering whether it would be more efficient to copy the war file and then just append the files. Then I wouldn't have to wait so long while the war is expanded and then compressed again.

I can't seem to find a way to do this in the documentation or in any online examples, though.

Can anyone give some tips or pointers?

UPDATE:

TrueZip, as mentioned in one of the answers, seems to be a very good Java library for appending to a zip file (despite other answers that say it is not possible to do this).

Does anyone have experience with or feedback on TrueZip, or can anyone recommend other similar libraries?

+1  A: 

See this bug report.

Using append mode on any kind of structured data like zip files or tar files is not something you can really expect to work. These file formats have an intrinsic "end of file" indication built into the data format.

If you really want to skip the intermediate step of un-waring/re-waring, you could read the war file, get all the zip entries, then write to a new war file, "appending" the new entries you wanted to add. Not perfect, but at least a more automated solution.

Michael Krauklis
I am not sure how your proposed solution differs from what I am doing already - how is this more automated?
Grouchal
I am still keen to understand your solution - you say that instead of un-war then re-war I should read the file and then write to a new war - is this not the same thing? Please can you explain?
Grouchal
+1  A: 

I don't know of a Java library that does what you describe. But what you described is practical. You can do it in .NET, using DotNetZip.

Michael Krauklis is correct that you cannot simply "append" data to a war file or zip file, but it is not because there is an "end of file" indication, strictly speaking, in a war file. It is because the war (zip) format includes a directory, which is normally present at the end of the file, that contains metadata for the various entries in the war file. Naively appending to a war file results in no update to the directory, and so you just have a war file with junk appended to it.

What's necessary is an intelligent class that understands the format, and can read+update a war file or zip file, including the directory as appropriate. DotNetZip does this, without uncompressing/recompressing the unchanged entries, just as you described or desired.
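
To make the point about the trailing directory concrete, here is a minimal sketch (not taken from DotNetZip or any library mentioned in this thread) that builds a small zip in memory with `java.util.zip` and reads the "end of central directory" record from its last 22 bytes. The class and method names are hypothetical, and the sketch assumes an archive without a comment and without zip64 extensions, which is what `ZipOutputStream` produces for a small archive like this one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class EndOfCentralDirectory {

    // size of the "end of central directory" record when there is no archive comment
    public static final int EOCD_SIZE = 22;

    /** Hypothetical helper: builds a small zip in memory, one entry per name. */
    public static byte[] buildZip(String... names) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ZipOutputStream zip = new ZipOutputStream(bytes);
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(name.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            zip.close(); // close() is what writes the directory at the end
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /**
     * Reads the last 22 bytes of the archive.
     * Returns {record signature, total entry count, offset of the central directory}.
     * Assumes no archive comment and no zip64 records.
     */
    public static int[] readTail(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        int start = zip.length - EOCD_SIZE;
        int signature = buf.getInt(start);                  // record signature
        int entryCount = buf.getShort(start + 10) & 0xFFFF; // total number of entries
        int directoryOffset = buf.getInt(start + 16);       // where the directory starts
        return new int[] { signature, entryCount, directoryOffset };
    }

    public static void main(String[] args) {
        int[] tail = readTail(buildZip("a.txt", "b.txt"));
        System.out.println("signature: 0x" + Integer.toHexString(tail[0]));
        System.out.println("entries:   " + tail[1]);
        System.out.println("directory starts at byte " + tail[2]);
    }
}
```

Bytes written after this record are not described by the directory, which is why a plain append does not add an entry.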

Cheeso
+1  A: 

As Cheeso says, there's no way of doing it. AFAIK the zip front-ends are doing exactly the same as you internally.

Anyway if you're worried about the speed of extracting/compressing everything, you may want to try the SevenZipJBindings library.

I covered this library in my blog some months ago (sorry for the self-promotion). Just as an example, extracting a 104MB zip file using java.util.zip took me 12 seconds, while using this library took 4 seconds.

In both links you can find examples about how to use it.

Hope it helps.

Carlos Tasada
@carlos regarding your blog post: which Java version did you use? I just tested getting the size of a 148M ZIP archive with the standard API (`new ZipFile(file).size()`) and the latest 7Zip bindings with Java 1.6.0_17 on an amd64 Linux system (4 cores). The standard API outperformed 7Zip by far (at least for the task you present on your blog: getting the number of entries). Java took an avg of 1.5ms while 7Zip needed an avg of 350ms for 100 runs (excluding warmup). So from my perspective, there is no need to throw native libraries at this kind of problem.
sfussenegger
Didn't realise that this was going to use a native library, thanks for pointing that out - will not investigate further.
Grouchal
@Carlos: If you have some free time, can you compare extraction to Apache Commons Compress (http://commons.apache.org/compress/)?
dma_k
@dma_k: I could do the test but the documentation says 'gzip support is provided by the java.util.zip package of the Java class library.' So I don't expect any difference
Carlos Tasada
@Carlos: I can confirm that (after checking the `commons-compress` sources): it uses the available algorithms where possible. They have created their own `ZipFile` implementation, but it is based on `java.util.zip.Inflater` et al. I don't expect any tremendous speed boost either, but a comparison of extraction from a .zip file might be interesting for you, just for completeness.
dma_k
+2  A: 

I had a similar requirement some time back - but it was for reading and writing zip archives (.war format should be similar). I tried doing it with the existing Java Zip streams but found the writing part cumbersome - especially when directories were involved.

I'd recommend trying out the TrueZip library (open source, Apache-style license), which exposes any archive as a virtual file system that you can read from and write to like a normal filesystem. It worked like a charm for me and greatly simplified my development.
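
For readers who want to see the "archive as a file system" idea without an extra dependency, here is a minimal sketch using the zip file system provider that ships with the JDK since Java 7. This is a swapped-in technique and not the TrueZip API. The class name, the `demo` method and the entry names are hypothetical, and how the provider rewrites the archive when it is closed is an implementation detail not verified here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipFsAppend {

    /** Adds (or replaces) one entry in an existing zip/war via the JDK zip file system. */
    public static void addEntry(Path archive, String entryName, byte[] content) throws IOException {
        // the ClassLoader cast selects the overload that has existed since Java 7
        try (FileSystem zipfs = FileSystems.newFileSystem(archive, (ClassLoader) null)) {
            Path target = zipfs.getPath("/", entryName);
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            Files.write(target, content);
        } // closing the file system persists the change to the archive
    }

    /**
     * Hypothetical demo: creates a temporary one-entry archive, adds a second entry,
     * and reports whether both entries can be found afterwards.
     */
    public static boolean demo() {
        try {
            Path archive = Files.createTempFile("demo", ".war");
            try (OutputStream out = Files.newOutputStream(archive);
                 ZipOutputStream zip = new ZipOutputStream(out)) {
                zip.putNextEntry(new ZipEntry("index.html"));
                zip.write("<html></html>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            addEntry(archive, "WEB-INF/answer.txt", "42\n".getBytes(StandardCharsets.UTF_8));
            try (ZipFile result = new ZipFile(archive.toFile())) {
                return result.getEntry("index.html") != null
                        && result.getEntry("WEB-INF/answer.txt") != null;
            } finally {
                Files.delete(archive);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("both entries present: " + demo());
    }
}
```

Note that this modifies the archive it is given, so copy the original war first if it has to stay untouched.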

gnlogic
This looks very good - would like to know if there are any performance issues to know about?
Grouchal
So far I've been able to use it effectively with moderately sized files (3 MB etc). Haven't run into any performance problems.
gnlogic
+4  A: 

As others mentioned, it's not possible to append content to an existing zip (or war). However, it's possible to create a new zip on the fly without temporarily writing extracted content to disk. It's hard to guess how much faster this will be, but it's the fastest you can get (at least as far as I know) with standard Java. As mentioned by Carlos Tasada, SevenZipJBindings might squeeze out some extra seconds for you, but porting this approach to SevenZipJBindings will still be faster than using temporary files with the same library.

Here's some code that writes the contents of an existing zip (war.zip) and appends an extra file (answer.txt) to a new zip (append.zip). All it takes is Java 5 or later, no extra libraries needed.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class Main {

    // 4MB buffer
    private static final byte[] BUFFER = new byte[4096 * 1024];

    /**
     * copy input to output stream - available in several StreamUtils or Streams classes 
     */    
    public static void copy(InputStream input, OutputStream output) throws IOException {
        int bytesRead;
        while ((bytesRead = input.read(BUFFER))!= -1) {
            output.write(BUFFER, 0, bytesRead);
        }
    }

    public static void main(String[] args) throws Exception {
        // read war.zip and write to append.zip
        ZipFile war = new ZipFile("war.zip");
        ZipOutputStream append = new ZipOutputStream(new FileOutputStream("append.zip"));

        // first, copy contents from existing war
        Enumeration<? extends ZipEntry> entries = war.entries();
        while (entries.hasMoreElements()) {
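            ZipEntry e = entries.nextElement();
            System.out.println("copy: " + e.getName());
            // write a fresh entry: reusing e would carry over its compressed size,
            // which can fail with "ZipException: invalid entry compressed size"
            ZipEntry fresh = new ZipEntry(e.getName());
            fresh.setTime(e.getTime());
            append.putNextEntry(fresh);
            if (!e.isDirectory()) {
                copy(war.getInputStream(e), append);
            }
            append.closeEntry();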
        }

        // now append some extra content
        ZipEntry e = new ZipEntry("answer.txt");
        System.out.println("append: " + e.getName());
        append.putNextEntry(e);
        append.write("42\n".getBytes());
        append.closeEntry();

        // close
        war.close();
        append.close();
    }
}
sfussenegger
My war file is 30Mb compressed - not sure this approach will be the best way as it will require a lot of memory - I am already caching a lot of database queries in memory and this might make the memory footprint too big.
Grouchal
@Grouchal Actually you won't ever need more memory than `BUFFER` (I've chosen 4MB, but you're free to tailor it to your needs - it shouldn't hurt to reduce it to a few KB only). The file is never stored entirely in memory.
sfussenegger
the idea is to decompress contents of the existing war into `BUFFER` and compress it into a new archive - entry after entry. After that, you end up with the same archive that's ready to take some more entries. I've chosen to write "42" into answer.txt. That's where you should place your code to append more entries.
sfussenegger
How would this approach compare to using TrueZip - mentioned by gnlogic? TrueZip seems to really append to the file
Grouchal
Sorry, I didn't know this library. After digging through the code, I still don't know what it's doing, but it's doing it pretty fast :) So yes, it seems to really append content to a file.
sfussenegger
However, as you said you want "to copy the war file and then just append the files", I assume you don't want to modify the source. Using TrueZip, you'll have to copy the file first, which isn't necessary with the code above. Therefore, both approaches should end up quite similar in performance.
sfussenegger
TrueZip uses the concept of treating the zip file like a virtual file system. If you wanna copy and append - I bet that should be pretty easy too.
gnlogic
If you get a _ZipException - invalid entry compressed size_ with this approach, see http://www.coderanch.com/t/275390/Streams/java/ZipException-invalid-entry-compressed-size
Adam Schmideg