
Hello, I use this code to create a .zip from a list of files:

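// Needs: java.io.File, FileInputStream, FileOutputStream, InputStream,
//        java.util.zip.ZipEntry, ZipOutputStream
// zipFile (File) and srcFiles (File[]) are defined elsewhere
try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipFile))) {
    byte[] buffer = new byte[1024]; // the copy buffer whose size is in question
    for (File srcFile : srcFiles) {
        // One ZIP entry per source file, named after the file
        zos.putNextEntry(new ZipEntry(srcFile.getName()));
        try (InputStream fis = new FileInputStream(srcFile)) {
            int read;
            // read() returns -1 at end of file
            while ((read = fis.read(buffer)) != -1) {
                zos.write(buffer, 0, read);
            }
        }
        zos.closeEntry();
    }
}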

I don't know how the ZIP algorithm and ZipOutputStream work. If the stream writes compressed output before I have read all of the data and sent it to 'zos', the resulting file could differ in size depending on the buffer size I choose.

In other words, I don't know whether the algorithm works like this:

READ DATA-->PROCESS DATA-->CREATE .ZIP

or

READ CHUNK OF DATA-->PROCESS CHUNK OF DATA-->WRITE CHUNK IN .ZIP-->(back to READ CHUNK OF DATA, repeated until all data has been read)

If it is the latter, which buffer size is best?

Update:

I have tested this code by changing the buffer size from 1024 to 64 and zipping the same files: with a 1024-byte buffer, the resulting 80 KB file was 3 bytes smaller than with a 64-byte buffer. Which buffer size produces the smallest .zip in the fastest time?

A: 

It depends on your hardware (disk speed and seek time). Unless you want to squeeze out the last drop of performance, I would pick any size between 4k and 64k. Since the buffer is a short-lived object, it will be garbage-collected quickly anyway.
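If you would rather not pick a size at all, one option (assuming Java 7+) is java.nio.file.Files.copy(Path, OutputStream), which copies a file into a stream using an internal buffer chosen by the JDK and does not close the target stream. The class name ZipWithFilesCopy below is hypothetical; this is a sketch, not a tuned implementation.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipWithFilesCopy {
    /** Zips each source file as a top-level entry named after the file. */
    static void zip(File[] srcFiles, File zipFile) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipFile))) {
            for (File src : srcFiles) {
                zos.putNextEntry(new ZipEntry(src.getName()));
                // Files.copy uses its own internal buffer and leaves zos open
                Files.copy(src.toPath(), zos);
                zos.closeEntry();
            }
        }
    }
}
```

Usage would be ZipWithFilesCopy.zip(srcFiles, zipFile) with the same arguments as the question's loop.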

ddimitrov
A: 

Short answer: I would pick something like 16k.


Long answer:

ZIP (and ZipOutputStream by default) uses the DEFLATE algorithm for compression (http://en.wikipedia.org/wiki/DEFLATE). DEFLATE belongs to the Lempel-Ziv family, but it is not a flavor of Lempel-Ziv-Welch (LZW, which is LZ78-based); it combines LZ77 with Huffman coding.

This is dictionary-based compression, and as far as I know, from the algorithm's standpoint the buffer size used when feeding data into the deflater should have almost no impact. What matters most for LZ77 is the dictionary size, i.e. the sliding window (at most 32 KB in DEFLATE), and that is not controlled by the buffer size in your example.
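To make the window point concrete, here is a toy greedy LZ77 parser. It is not zlib's DEFLATE (no Huffman stage, no hash chains, brute-force match search), and the input is hypothetical: 2000 random bytes followed by the same 2000 bytes again. With a 1024-byte window the repeat is out of reach, so nearly every byte becomes a literal; with a 4096-byte window the second copy collapses into a few back-references. The only knob that changes the result is windowSize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class ToyLz77 {
    static final int MIN_MATCH = 3;   // same minimum match length as DEFLATE
    static final int MAX_MATCH = 258; // same maximum match length as DEFLATE

    /**
     * Greedy parse: at each position, take the longest earlier match that starts
     * inside the window. Token {0, b} is literal byte b; {dist, len} is a back-reference.
     */
    static List<int[]> compress(byte[] data, int windowSize) {
        List<int[]> tokens = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int bestLen = 0;
            int bestDist = 0;
            for (int j = Math.max(0, i - windowSize); j < i; j++) {
                int len = 0;
                while (len < MAX_MATCH && i + len < data.length
                        && data[j + len] == data[i + len]) {
                    len++;
                }
                if (len > bestLen) {
                    bestLen = len;
                    bestDist = i - j;
                }
            }
            if (bestLen >= MIN_MATCH) {
                tokens.add(new int[] {bestDist, bestLen});
                i += bestLen;
            } else {
                tokens.add(new int[] {0, data[i] & 0xFF});
                i++;
            }
        }
        return tokens;
    }

    /** Rebuilds the input; copies byte by byte so overlapping references work. */
    static byte[] decompress(List<int[]> tokens) {
        int total = 0;
        for (int[] t : tokens) {
            total += (t[0] == 0) ? 1 : t[1];
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (int[] t : tokens) {
            if (t[0] == 0) {
                out[pos++] = (byte) t[1];
            } else {
                for (int k = 0; k < t[1]; k++, pos++) {
                    out[pos] = out[pos - t[0]];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: the only long repeat sits 2000 bytes back
        byte[] block = new byte[2000];
        new Random(42).nextBytes(block);
        byte[] data = new byte[4000];
        System.arraycopy(block, 0, data, 0, 2000);
        System.arraycopy(block, 0, data, 2000, 2000);

        for (int window : new int[] {1024, 4096}) {
            List<int[]> tokens = compress(data, window);
            boolean roundTrip = Arrays.equals(data, decompress(tokens));
            System.out.println("window=" + window + " tokens=" + tokens.size()
                    + " roundTrip=" + roundTrip);
        }
    }
}
```

Note that the whole input is handed to compress at once here; how the bytes arrive has no role in the parse, only how far back the matcher may look.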

You can experiment with different buffer sizes and plot a graph if you like, but I am sure you will not see any significant change in compression ratio (3/80000 = 0.00375%).
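A minimal harness for that experiment, assuming about 80 KB of hypothetical generated text in place of the real files and an in-memory ByteArrayOutputStream instead of a file on disk. zipWithBuffer (a hypothetical helper) reuses the question's copy loop with a configurable buffer size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class BufferSizeExperiment {
    /** Zips 'content' as one entry, copying it through a buffer of the given size. */
    static byte[] zipWithBuffer(byte[] content, int bufferSize) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes);
             InputStream in = new ByteArrayInputStream(content)) {
            zos.putNextEntry(new ZipEntry("data.txt"));
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = in.read(buffer)) != -1) {
                zos.write(buffer, 0, read);
            }
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical ~80 KB of compressible text standing in for the real files
        StringBuilder sb = new StringBuilder();
        for (int i = 0; sb.length() < 80_000; i++) {
            sb.append("line ").append(i).append(": some repetitive sample text\n");
        }
        byte[] content = sb.toString().getBytes(StandardCharsets.UTF_8);

        for (int size : new int[] {64, 1024, 16 * 1024, 64 * 1024}) {
            long start = System.nanoTime();
            int zipped = zipWithBuffer(content, size).length;
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("buffer=%6d  zip=%d bytes  time=%d us%n", size, zipped, micros);
        }
    }
}
```

Single-run timings like these are dominated by JIT warm-up and noise, so repeat the loop several times (or use a benchmark harness such as JMH) before drawing conclusions about speed.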

The buffer size has its biggest impact on speed, because of the overhead code executed on every call to FileInputStream.read and zos.write. From this point of view, you should weigh what you gain against what you spend.

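When increasing from 1 byte to 1024 bytes, you spend 1023 more bytes of memory (in theory) and cut the number of .read and .write calls, and hence their overhead, by a factor of about 1024. However, when increasing from 1k to 64k, you spend 63k more to cut the overhead by a further factor of only 64. For an N-byte file, the first step removes roughly N calls, while the second removes only roughly N/1024.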
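Counting read() calls shows the diminishing returns directly. This sketch assumes an 81,920-byte (80 KB) file and that every read() fills the buffer, which is typical for regular files but not guaranteed by the InputStream contract; the +1 is the final call that returns -1.

```java
public class CallCount {
    /** read() calls needed to copy fileSize bytes, including the final call returning -1. */
    static long readCalls(long fileSize, int bufferSize) {
        return (fileSize + bufferSize - 1) / bufferSize + 1;
    }

    public static void main(String[] args) {
        long fileSize = 80 * 1024; // roughly the 80 KB file from the question
        for (int size : new int[] {1, 64, 1024, 16 * 1024, 64 * 1024}) {
            // 1 -> 81921, 64 -> 1281, 1024 -> 81, 16384 -> 6, 65536 -> 3
            System.out.println(size + " bytes -> " + readCalls(fileSize, size) + " read() calls");
        }
    }
}
```

Going from 1 byte to 1k removes about 81,840 calls; going from 1k to 64k removes only 78 more.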

So the returns diminish, which is why I would choose something in the middle (let's say 16k) and stick with it.

Dan Cristoloveanu
I accept this answer because it shows that the buffer size does not significantly affect the resulting file size; the dictionary size and sliding window do.
Telcontar
A: 

I am trying to download a ZIP archive. The original size is 88k, but the downloaded size is 4k. Can anyone help?

Dr.Vet. Cumpanasu Florin