When using the deflate method of java.util.zip.Deflater, a byte[] has to be supplied as the argument. How big should that byte[] be? I've read there's no guarantee the compressed data will even be smaller than the uncompressed data. Is there a certain percentage of the input I should go with? Currently I make it twice as big as the input.
A:
After calling deflate, call finished to see if it still has more to output, e.g.:
byte[] buffer = new byte[BUFFER_SIZE];
while (!deflater.finished()) {
    int n = deflater.deflate(buffer);
    // deal with the n bytes in buffer here
}
If you just want to collect all of the bytes in memory, you can use a ByteArrayOutputStream, e.g.:
byte[] buffer = new byte[BUFFER_SIZE];
ByteArrayOutputStream baos = new ByteArrayOutputStream();
while (!deflater.finished()) {
    int n = deflater.deflate(buffer);
    // copy only the n bytes that deflate actually produced
    baos.write(buffer, 0, n);
}
return baos.toByteArray();
Laurence Gonsalves
2009-07-30 16:58:09
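For reference, here is a self-contained sketch of the second snippet in the answer above. The class name `DeflateToArray` and the value 4096 are assumptions for illustration. The `setInput`, `finish` and `end` calls are not shown in the answer, but the loop needs them around it to terminate and to release native resources.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical wrapper class, not part of the original answer.
class DeflateToArray {
    // Assumed value; any positive size works, it only affects how many loop passes run.
    private static final int BUFFER_SIZE = 4096;

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish(); // signal that no more input will follow
            byte[] buffer = new byte[BUFFER_SIZE];
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                baos.write(buffer, 0, n);
            }
            return baos.toByteArray();
        } finally {
            deflater.end(); // free the native zlib state
        }
    }

    public static void main(String[] args) {
        byte[] input = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = compress(input);
        System.out.println(input.length + " -> " + compressed.length + " bytes");
    }
}
```

Because the output is collected chunk by chunk, the buffer never has to be sized against the input, which sidesteps the "twice as big" guess from the question.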
And if you want to end up with one giant byte array, create a `ByteArrayOutputStream` outside the loop, then append to it on each iteration with `bos.write(buffer, 0, n)`.
Adam Batkin
2009-07-30 17:03:17
Thanks for the answer. I don't quite get it though... Do I have to keep calling deflate() multiple times until the whole input has been compressed? And what should I set the BUFFER_SIZE to? Is there a tutorial or something like that somewhere that explains this? Thanks.
Clox
2009-07-30 17:06:20
I'm guessing there was some sort of race condition, cause that's exactly what the second example snippet I posted does. :-)
Laurence Gonsalves
2009-07-30 17:07:36
Yes: you keep calling deflate() until the whole input has been compressed. The code above does that. BUFFER_SIZE is really a "tuning parameter". As long as it's a positive integer, the code will work, but the performance will vary depending on what you set it to. I'd probably just set it to something like 4096 (4k) and then only tweak it if performance seems to be suffering.
Laurence Gonsalves
2009-07-30 17:11:52
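To illustrate the point that any positive buffer size works, here is a small sketch that compresses and decompresses the same data with several buffer sizes and checks that the round trip is lossless each time. The class and method names are hypothetical, and the sizes tried are arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; the names are illustrative only.
class BufferSizeDemo {
    static byte[] compress(byte[] input, int bufferSize) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[bufferSize];
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                baos.write(buffer, 0, n);
            }
            return baos.toByteArray();
        } finally {
            deflater.end();
        }
    }

    static byte[] decompress(byte[] compressed, int bufferSize) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] buffer = new byte[bufferSize];
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                baos.write(buffer, 0, n);
                // No progress and not finished means the input was truncated or needs a dictionary.
                if (n == 0 && !inflater.finished()) {
                    throw new DataFormatException("incomplete or unsupported input");
                }
            }
            return baos.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = new byte[20_000]; // roughly the input size mentioned in the thread
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 13);
        for (int size : new int[] {1, 512, 4096}) {
            byte[] back = decompress(compress(data, size), size);
            System.out.println("buffer " + size + ": round trip ok = " + Arrays.equals(data, back));
        }
    }
}
```

A tiny buffer only means more passes through the loop, so it is slower but still correct.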
I don't know of a tutorial on this, but you might find the source code for GZIPOutputStream instructive. It uses Deflater internally. It happens to use a default buffer size of 512, but you can actually choose a buffer size when you create a GZIPOutputStream. If you have the JDK sources you can look at GZIPOutputStream there. If not, you can see them on this page: http://kickjava.com/src/java/util/zip/GZIPOutputStream.java.htm
Laurence Gonsalves
2009-07-30 17:17:41
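As a rough sketch of choosing the buffer size when creating a GZIPOutputStream, the two-argument constructor takes the size in bytes. The class name and method below are hypothetical. Note that this writes the gzip format, which has a different header and trailer from the raw output of a default Deflater.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration.
class GzipBufferExample {
    static byte[] gzip(byte[] input, int bufferSize) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // The second constructor argument is the internal output buffer size.
        try (GZIPOutputStream gz = new GZIPOutputStream(baos, bufferSize)) {
            gz.write(input);
        } // closing the stream finishes compression and writes the gzip trailer
        return baos.toByteArray();
    }
}
```

Calling `GzipBufferExample.gzip(data, 4096)` would use a 4k internal buffer instead of the default mentioned above.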
Ah, I get it now. Thanks a lot =) Is there a rule of thumb for roughly how big the buffer size should be? I suppose it depends on how large the input data is? My input is around 10-30KB.
Clox
2009-07-30 17:19:06
What are you doing with the compressed output? You may want to just use GZIPOutputStream or ZipOutputStream and not worry about all this.
Mike
2009-07-30 17:27:49
The buffer size you want also depends on what you're going to do with it. If you're streaming to a file, a small-ish (but not too small) fixed buffer is optimal. If you want the whole thing in an array, and you're okay with there being some extra padding at the end, then you could try varying the buffer size in the hopes of having to do only one deflate call. That's complicated and bug-prone, though. I'd just stick with a fixed-size buffer and only worry about performance if I see there's an issue.
Laurence Gonsalves
2009-07-30 17:42:29
“The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson (See http://en.wikipedia.org/wiki/Optimization_%28computer_science%29 for quotes in a similar vein)
Laurence Gonsalves
2009-07-30 17:43:17
Heh. Well, basically I was just wondering about a rule of thumb, in case I'm going to use this more in the future or something, because in my case there are no performance issues. I'm developing an online game; the client is written in AS3 and the server in Java. There's a lot of static data that is pre-compressed (so no optimization is needed at all) and sent to the clients. Decompressing on the client side is really fast, especially because Flash has built-in zlib support written in a lower-level language, which is faster than an implementation written in AS3, so zlib is definitely the format of choice.
Clox
2009-07-30 18:03:02
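Since the thread ends on zlib as the target format, here is a sketch of the stream-based route suggested earlier, using DeflaterOutputStream rather than GZIPOutputStream. With its default Deflater it writes zlib-wrapped data, and it manages the internal buffer for you. The class and method names are hypothetical, and whether a given client accepts this output is an assumption to verify on the client side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class for illustration.
class ZlibStreams {
    static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // The one-argument constructor uses a default Deflater (zlib header and checksum).
        try (DeflaterOutputStream out = new DeflaterOutputStream(baos)) {
            out.write(input);
        }
        return baos.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096]; // arbitrary read buffer size
            int n;
            while ((n = in.read(buffer)) != -1) {
                baos.write(buffer, 0, n);
            }
        }
        return baos.toByteArray();
    }
}
```

The `decompress` method is only there to check the round trip on the Java side; in the setup described above, decompression would happen in the client.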