Hi all, I have a requirement to create a zip file from a list of available files. The files are of different types, such as txt, pdf and xml. I am using the java.util classes to do it.

The requirement is to keep the zip file within a maximum size of 5 MB. I should select files from the list based on timestamp and add them to the zip until the zip file size reaches 5 MB, then skip the remaining files.

Please let me know if there is a way in Java to estimate the zip file size in advance, without creating the actual file.

Or is there any other approach to handle this?

A: 

I don't think there is any way to estimate the size of the zip that will be created, because zips are processed as streams. It is also not technically possible to predict the size of compressed output unless you actually compress the data.

Gopi
+4  A: 

Wrap your ZipOutputStream (zos1) in a custom OutputStream, called YourOutputStream here.

  • The constructor of YourOutputStream creates a second ZipOutputStream (zos2), which wraps a new ByteArrayOutputStream (baos)
    public YourOutputStream(ZipOutputStream zos1, int maxSizeInBytes)
  • When you write a file with YourOutputStream, it first writes it to zos2
    public void writeFile(File file) throws ZipFileFullException
    public void writeFile(String path) throws ZipFileFullException
    etc...
  • if baos.size() is still under maxSizeInBytes
    • write the file to zos1
  • else
    • close zos1, baos and zos2, and throw an exception. I can't think of an existing exception that fits; if there is one, use it, otherwise create your own IOException subclass, ZipFileFullException.

You need two ZipOutputStreams: one that is written to your drive, and one to check whether your contents are over 5 MB.

EDIT : In fact I checked, you can't remove a ZipEntry easily.

http://download.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html#size()
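A minimal sketch of the two-stream idea above, not a definitive implementation. The class name SizeLimitedZipWriter and the byte[]-based writeEntry method are hypothetical simplifications of YourOutputStream and writeFile. It also assumes entries carry no extra fields or comments, so that each central directory record costs 46 bytes plus the name and the end record costs 22 bytes; baos.size() alone does not include that directory, which is only written when the archive is closed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Exception name taken from the answer; it is not a JDK class. */
class ZipFileFullException extends IOException {
    ZipFileFullException(String message) { super(message); }
}

/** Hypothetical stand-in for "YourOutputStream". */
final class SizeLimitedZipWriter implements AutoCloseable {
    // Assumption: no extra fields or comments, so these fixed sizes apply.
    private static final int CENTRAL_HEADER_BYTES = 46;
    private static final int END_RECORD_BYTES = 22;

    private final ZipOutputStream target;  // "zos1": the real archive
    private final ByteArrayOutputStream shadowBytes = new ByteArrayOutputStream(); // "baos"
    private final ZipOutputStream shadow = new ZipOutputStream(shadowBytes);       // "zos2"
    private final long maxSizeInBytes;
    private long directoryBytes = END_RECORD_BYTES;

    SizeLimitedZipWriter(OutputStream out, long maxSizeInBytes) {
        this.target = new ZipOutputStream(out);
        this.maxSizeInBytes = maxSizeInBytes;
    }

    void writeEntry(String name, byte[] data) throws IOException {
        // Trial run: compress into the in-memory archive first.
        shadow.putNextEntry(new ZipEntry(name));
        shadow.write(data);
        shadow.closeEntry();

        long projectedDirectory = directoryBytes + CENTRAL_HEADER_BYTES
                + name.getBytes(StandardCharsets.UTF_8).length;
        if (shadowBytes.size() + projectedDirectory > maxSizeInBytes) {
            close(); // the entry cannot be removed again, so stop here
            throw new ZipFileFullException(
                    name + " would push the archive past " + maxSizeInBytes + " bytes");
        }

        // The entry fits: now write it to the real archive as well.
        directoryBytes = projectedDirectory;
        target.putNextEntry(new ZipEntry(name));
        target.write(data);
        target.closeEntry();
    }

    @Override
    public void close() throws IOException {
        try {
            target.close(); // writes the central directory of the real archive
        } finally {
            shadow.close();
        }
    }
}
```

The caller would feed files in timestamp order and stop at the first ZipFileFullException. The cost of this design is that every file is compressed twice and the shadow archive is held in memory, which seems acceptable for a 5 MB limit.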

Colin Hebert
Thanks all for your help. Since I only need a rough size and can find out the compression ratio for most of the file types we use, I went with the approach suggested by Nate. Thanks all once again.
Vignesh
A: 

I did this once on a project with known input types. We knew that, generally speaking, our data compressed around 5:1 (it was all text). So I'd check the file size and divide by 5...

In this case, the purpose for doing so was to check that files would likely be below a certain size. We only needed a rough estimate.
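As a rough illustration of that divide-by-ratio estimate, here is a small sketch. The class RatioEstimator, the per-extension ratio table and the default ratio are all hypothetical; only the 5:1 figure for text comes from the experience described above, and real ratios would have to be measured on your own data. The input map is assumed to iterate in timestamp order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: picks files by estimated, not measured, compressed size. */
final class RatioEstimator {

    /** Returns the lower-cased extension, or "" if the name has none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** A ratio of 5.0 means "compresses about 5:1". */
    static long estimateCompressedSize(long rawBytes, double ratio) {
        return (long) Math.ceil(rawBytes / ratio);
    }

    /**
     * Walks the files in the map's iteration order (assumed to be timestamp
     * order) and stops at the first file whose estimate would exceed the budget.
     */
    static List<String> selectFiles(Map<String, Long> rawSizes,
                                    Map<String, Double> ratioByExtension,
                                    double defaultRatio,
                                    long maxZipBytes) {
        List<String> selected = new ArrayList<>();
        long estimatedTotal = 0;
        for (Map.Entry<String, Long> file : rawSizes.entrySet()) {
            double ratio = ratioByExtension.getOrDefault(
                    extensionOf(file.getKey()), defaultRatio);
            long estimate = estimateCompressedSize(file.getValue(), ratio);
            if (estimatedTotal + estimate > maxZipBytes) {
                break; // skip this file and everything after it
            }
            estimatedTotal += estimate;
            selected.add(file.getKey());
        }
        return selected;
    }
}
```

Since this never compresses anything, the real archive can still come out over the limit, so it only suits cases where a rough estimate is enough.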

All that said, I have noticed zip applications like 7zip will create a zip file of a certain size (like a CD) and then split the zip off to a new file once it reaches the limit. You could look at that source code. I have actually used the command line version of that app in code before. They have a library you can use as well. Not sure how well that will integrate with Java though.

For what it is worth, I've also used a library called SharpZipLib. It was very good. I wonder if there is a Java port to it.

Nate
+1  A: 

+1 for Colin Hebert: add files one by one, and either back up to the previous step or remove the last file if the archive is too big. I just want to add some details:

Prediction is way too unreliable. E.g. a PDF can contain uncompressed text and compress down to 30% of the original, or it can contain already-compressed text and images and compress only to 80%. You would need to inspect the entire PDF for compressibility, which basically means compressing it.

You could try a statistical prediction. That could reduce the number of failed attempts, but you would still have to implement the above recommendation. Go with the simpler implementation first, and see if it's enough.

Alternatively, compress files individually, then pick the files that won't exceed 5 MB if bound together. If unpacking is automated, too, you could bind the zip files into a single uncompressed zip file.
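Measuring each file's compressed size on its own could look roughly like the sketch below. DeflatedSizeProbe and CountingSink are hypothetical names. The probe runs the data through a raw Deflater and counts the output bytes without storing them, so it reports the size of the compressed data only and ignores the zip headers that each entry adds.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical helper: measures deflated size without keeping the output. */
final class DeflatedSizeProbe {

    /** Discards everything written to it and only counts the bytes. */
    private static final class CountingSink extends OutputStream {
        long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }
    }

    /** Returns the number of bytes the stream's contents deflate to. */
    static long deflatedSize(InputStream in) throws IOException {
        // "true" selects raw deflate output, the format used inside zip entries.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        CountingSink sink = new CountingSink();
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink, deflater)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } finally {
            deflater.end(); // a caller-supplied Deflater is not released by close()
        }
        return sink.count;
    }
}
```

With the measured sizes in hand, the files can be picked in timestamp order until the running total plus some allowance for headers reaches 5 MB. Highly repetitive input shows why raw size is a poor guide here: it can deflate to a tiny fraction of its original size.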

peterchen
In fact this won't really work: you could have a file over 5MB containing only "aaaa...", and it would be compressed enough to fit in the zip.
Colin Hebert
d'oh. May I claim early-morning-stupidity?
peterchen
(fixed, of course)
peterchen
+1  A: 

Maybe you could add a file each time, until you reach the 5MB limit, and then discard the last file. Like @Gopi, I don't think there is any way to estimate it without actually compressing the file.

Of course, file size will not increase (or maybe a little, because of the zip header?), so at least you have a "worst case" estimation.

AJPerez
See "Maximum expansion factor" at http://zlib.net/zlib_tech.html
snemarch