
When using the deflate method of java.util.zip.Deflater, a byte[] has to be supplied as the argument. How big should that byte[] be? I've read there's no guarantee that the compressed data will even be smaller than the uncompressed data. Is there a certain percentage of the input size I should go with? Currently I make it twice as big as the input.

A:

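You don't need an output buffer big enough for everything. Give the Deflater its input and call finish(), then keep calling deflate() until finished() reports that all output has been produced. For example: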

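byte[] buffer = new byte[BUFFER_SIZE];
deflater.setInput(input);   // input: the uncompressed byte[]
deflater.finish();          // no more input will follow; finished() can now become true
while (!deflater.finished()) {
  int n = deflater.deflate(buffer);
  // deal with the n bytes in buffer here
}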

If you just want to collect all of the bytes in memory, you can use a ByteArrayOutputStream. For example:

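byte[] buffer = new byte[BUFFER_SIZE];
ByteArrayOutputStream baos = new ByteArrayOutputStream();
deflater.setInput(input);
deflater.finish();
while (!deflater.finished()) {
  int n = deflater.deflate(buffer);
  baos.write(buffer, 0, n);   // append only the n bytes actually produced
}
deflater.end();               // release the native zlib memory
return baos.toByteArray();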
Laurence Gonsalves
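Putting the pieces of this answer together, here is a minimal self-contained sketch. The class name DeflateLoop, the sample input and the decompress() helper (the mirror-image loop using java.util.zip.Inflater, added only to show the output is valid) are illustrative additions, not part of the original answer. Because the ByteArrayOutputStream grows as needed, the loop never has to guess the output size, so it also copes with incompressible input that comes out slightly larger than it went in.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateLoop {
    // Tuning parameter only: any positive size gives a correct result.
    static final int BUFFER_SIZE = 4096;

    // Compresses all of `input` into a zlib-format byte array.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish(); // without this, finished() never becomes true
            byte[] buffer = new byte[BUFFER_SIZE];
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                baos.write(buffer, 0, n);
            }
            return baos.toByteArray();
        } finally {
            deflater.end(); // free native zlib memory
        }
    }

    // Hypothetical helper: reverses compress() with the same fixed-buffer loop.
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] buffer = new byte[BUFFER_SIZE];
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                baos.write(buffer, 0, n);
            }
            return baos.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] input = "hello hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = compress(input);
        byte[] restored = decompress(compressed);
        System.out.println(input.length + " -> " + compressed.length
                + " bytes, round trip ok: " + Arrays.equals(input, restored));
    }
}
```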
And if you want to end up with one giant byte array, create a `ByteArrayOutputStream` outside the loop, then write to it on each iteration with `bos.write(buffer, 0, n)`.
Adam Batkin
Thanks for the answer. I don't quite get it, though... Do I have to keep calling deflate() until the whole input has been compressed? And what should I set BUFFER_SIZE to? Is there a tutorial or something like that which explains this? Thanks
Clox
I'm guessing there was some sort of race condition, because that's exactly what the second example snippet I posted does. :-)
Laurence Gonsalves
Yes: you keep calling deflate() until the whole input has been compressed, and the code above does that. BUFFER_SIZE is really a "tuning parameter": as long as it's a positive integer the code will work, but performance will vary depending on what you set it to. I'd probably just set it to something like 4096 (4 KB) and only tweak it if performance seems to be suffering.
Laurence Gonsalves
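To illustrate the "tuning parameter" point, this sketch runs the same loop with several buffer sizes on roughly 20 KB of made-up repetitive text and checks that each result decompresses back to the input. The class name, the compressInto() signature and the sample data are assumptions made for the demo; only the Deflater/Inflater calls come from the JDK. Smaller buffers just mean more deflate() calls, not a different outcome.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BufferSizeDemo {
    // The answer's loop with BUFFER_SIZE as a parameter. Appends the output
    // to `sink` and returns how many deflate() calls were needed.
    static int compressInto(byte[] input, int bufferSize, ByteArrayOutputStream sink) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[bufferSize];
            int calls = 0;
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                sink.write(buffer, 0, n);
                calls++;
            }
            return calls;
        } finally {
            deflater.end();
        }
    }

    // Inflates into an array of the known original length (fine for a demo).
    static byte[] decompress(byte[] compressed, int originalLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < out.length) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("stream ended early");
                }
                off += n;
            }
            return out;
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        // Roughly 20 KB of repetitive text, similar in size to the data in the question.
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 20_000) {
            sb.append("static game data, row ").append(sb.length() % 97).append('\n');
        }
        byte[] input = sb.toString().getBytes(StandardCharsets.UTF_8);

        for (int size : new int[] {1, 64, 512, 4096, 65536}) {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            int calls = compressInto(input, size, sink);
            boolean ok = Arrays.equals(input, decompress(sink.toByteArray(), input.length));
            System.out.printf("buffer %5d: %5d deflate() calls, %d bytes out, round trip ok: %b%n",
                    size, calls, sink.size(), ok);
        }
    }
}
```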
I don't know of a tutorial on this, but you might find the source code for GZIPOutputStream instructive. It uses Deflater internally (most of the buffering loop lives in its superclass, DeflaterOutputStream). It happens to use a default buffer size of 512 bytes, but you can actually choose a buffer size when you create a GZIPOutputStream. If you have the JDK sources you can look at GZIPOutputStream there; if not, you can see them on this page: http://kickjava.com/src/java/util/zip/GZIPOutputStream.java.htm
Laurence Gonsalves
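For reference, choosing the buffer size looks like this with the two-argument GZIPOutputStream(OutputStream, int) constructor. This is a minimal sketch; the class name, the 4096 value and the placeholder input are arbitrary choices. Note that the output uses gzip framing (it starts with the bytes 0x1f 0x8b), not the bare zlib format a plain Deflater produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBufferDemo {
    // Compresses `input` to gzip format; bufferSize is the size of the
    // internal buffer GZIPOutputStream hands to its Deflater (default 512).
    static byte[] gzip(byte[] input, int bufferSize) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos, bufferSize)) {
            gz.write(input);
        } // close() calls finish(), which drains the Deflater and writes the gzip trailer
        return baos.toByteArray();
    }

    // Reads the whole gzip stream back into memory.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                baos.write(buffer, 0, n);
            }
            return baos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = new byte[25_000]; // placeholder data in the 10-30 KB range
        Arrays.fill(input, (byte) 'x');
        byte[] compressed = gzip(input, 4096);
        System.out.println(input.length + " -> " + compressed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(input, gunzip(compressed)));
    }
}
```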
Ah, I get it now. Thanks a lot =) Is there a rule of thumb for roughly how big BUFFER_SIZE should be? I suppose it depends on how large the input data is? My input is around 10-30 KB.
Clox
What are you doing with the compressed output? You may want to just use GZIPOutputStream or ZipOutputStream and not worry about all this.
Mike
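Along the lines of this suggestion, the JDK's DeflaterOutputStream/InflaterInputStream pair hides the setInput/deflate/finished loop entirely. Unlike GZIPOutputStream and ZipOutputStream, which add gzip or zip framing, DeflaterOutputStream with its default Deflater writes the same zlib format as a plain `new Deflater()`, which matters if the receiving side expects zlib. A minimal sketch; the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class ZlibStreams {
    // Produces a zlib-format stream; the deflate loop runs inside the stream class.
    static byte[] zlibCompress(byte[] input) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(input);
        } // close() finishes the stream, writes the Adler-32 trailer and ends the Deflater
        return baos.toByteArray();
    }

    // Reads a zlib-format stream back into memory.
    static byte[] zlibDecompress(byte[] compressed) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                baos.write(buffer, 0, n);
            }
            return baos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = new byte[20_000]; // 20 KB of zeros, purely as sample data
        byte[] compressed = zlibCompress(input);
        System.out.printf("%d -> %d bytes, first byte 0x%02x%n",
                input.length, compressed.length, compressed[0] & 0xff);
        System.out.println("round trip ok: " + Arrays.equals(input, zlibDecompress(compressed)));
    }
}
```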
The buffer size you want also depends on what you're going to do with the output. If you're streaming to a file, a smallish (but not too small) fixed buffer is optimal. If you want the whole thing in an array and you're okay with some extra padding at the end, you could try sizing the buffer in the hope of needing only one deflate call. That's complicated and bug-prone, though. I'd just stick with a fixed-size buffer and only worry about performance if I see there's an issue.
Laurence Gonsalves
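For anyone curious what the "one deflate call" idea from the comment above would look like, here is a hedged sketch. The size estimate is an ASSUMPTION borrowed from zlib's compressBound() formula; java.util.zip does not expose such a bound, so the sketch cannot rely on it and falls back to the ordinary fixed-buffer loop when the first call does not finish. That extra branch is exactly the added complexity the comment warns about. Class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;

public class OneShotDeflate {
    // ASSUMPTION: worst-case output size taken from zlib's compressBound();
    // treat it as an estimate, not a guarantee from the Java API.
    static int assumedBound(int n) {
        return n + (n >> 12) + (n >> 14) + (n >> 25) + 13;
    }

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] out = new byte[assumedBound(input.length)];
            int n = deflater.deflate(out);
            if (deflater.finished()) {
                // Single call was enough: trim the unused padding at the end.
                return Arrays.copyOf(out, n);
            }
            // Estimate was too small: keep what we have and continue with the normal loop.
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            baos.write(out, 0, n);
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int m = deflater.deflate(buffer);
                baos.write(buffer, 0, m);
            }
            return baos.toByteArray();
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] input = new byte[30_000]; // sample data: zeros compress very well
        byte[] compressed = compress(input);
        System.out.println(input.length + " -> " + compressed.length + " bytes");
    }
}
```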
“The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson (See http://en.wikipedia.org/wiki/Optimization_%28computer_science%29 for quotes in a similar vein)
Laurence Gonsalves
Heh. Well, basically I was just wondering about a rule of thumb, in case I use this more in the future. In my case there are no performance issues. I'm developing an online game; the client is written in AS3 and the server in Java. There's a lot of static data that is pre-compressed (so no optimization is needed at all) and sent to the clients. Decompressing on the client side is really fast, especially because Flash has built-in zlib support written in a lower-level language, and hence faster than an implementation written in AS3, so zlib is definitely the format of choice.
Clox