I am using DeflaterOutputStream to compress data as a part of a proprietary archive file format. I'm then using jcraft zlib code to decompress that data on the other end. The other end is a J2ME application, hence my reliance on third party zip decompression code and not the standard Java libraries.

My problem is that some files zip and unzip just fine, and others do not.

For the ones that do not, the compression method in the first byte of the data seems to be '5'.

From my reading up on zlib, I understand that a value of '8' indicates the deflate compression method. Any other value appears to be unacceptable to the decompressor.

What I'd like to know is:

  • What does '5' indicate?
  • Why does DeflaterOutputStream use different compression methods some of the time?
  • Can I stop it from doing that somehow?
  • Is there another way to generate deflated data that uses only the default compression method?
+2  A: 

It might help to hone down exactly what you're looking at.

Before the whole of your data, there's usually a two-byte ZLIB header. As far as I'm aware, the lower 4 bits of the first of these bytes should ALWAYS be 8 (this is the compression method field, and 8 means deflate). If you initialise your Deflater in nowrap mode, then you won't get these two bytes at all (though your other library must then be expecting not to get them).
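To see the difference between the two modes, here is a minimal sketch that compresses the same bytes with and without nowrap and inspects the first byte. The class and method names (ZlibHeaderDemo, compress, looksLikeZlibHeader) are made up for illustration; only the java.util.zip calls are the standard API. The header test follows the ZLIB format rule that the low nibble of the first byte is 8 and that the first two bytes, read as a 16-bit number, are a multiple of 31.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

class ZlibHeaderDemo {
    // Compress input through DeflaterOutputStream. With nowrap = true the
    // two-byte ZLIB header and the trailing Adler-32 checksum are omitted.
    static byte[] compress(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out, deflater)) {
            dos.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // a Deflater we supplied is not released by the stream
        }
        return out.toByteArray();
    }

    // True if the first two bytes form a plausible ZLIB header:
    // compression method 8 (deflate) and (CMF * 256 + FLG) divisible by 31.
    static boolean looksLikeZlibHeader(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        int cmf = data[0] & 0xFF;
        int flg = data[1] & 0xFF;
        return (cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    public static void main(String[] args) {
        byte[] input = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] wrapped = compress(input, false);
        byte[] raw = compress(input, true);
        System.out.println("wrapped has ZLIB header: " + looksLikeZlibHeader(wrapped));
        System.out.println("wrapped low nibble: " + (wrapped[0] & 0x0F));
        System.out.println("raw low nibble:     " + (raw[0] & 0x0F));
    }
}
```

With nowrap set, the first byte you see is already the start of the first deflate block, so its low bits are block header bits rather than a compression method.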

Then, before each individual block of data, there's a 3-bit block header (notice, defined as a number of bits, not a whole number of bytes). The lowest bit says whether this is the final block, and the next two bits give the block type. Conceivably, you could have a block starting with byte 5 (binary 101), which would indicate a compressed block (dynamic Huffman codes) that is the final block, or with byte 8 (low three bits 000), which would be a non-compressed, non-final block.
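As a rough sketch of how those three bits are read, the snippet below decodes the final-block flag and the block type from the first byte of a raw deflate stream. The class name DeflateBlockHeader and its methods are hypothetical helpers, not part of any library. It only applies to the first block of a stream, since later blocks need not start on a byte boundary.

```java
class DeflateBlockHeader {
    // Lowest bit of the first byte: 1 means this is the last block.
    static boolean isFinal(int firstByte) {
        return (firstByte & 0x01) != 0;
    }

    // Next two bits: 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman, 3 = reserved.
    static int blockType(int firstByte) {
        return (firstByte >> 1) & 0x03;
    }

    static String describe(int firstByte) {
        String[] types = {
            "stored (uncompressed)", "fixed Huffman", "dynamic Huffman", "reserved (invalid)"
        };
        return (isFinal(firstByte) ? "final" : "non-final") + ", " + types[blockType(firstByte)];
    }

    public static void main(String[] args) {
        System.out.println("5 -> " + describe(5)); // final, dynamic Huffman
        System.out.println("8 -> " + describe(8)); // non-final, stored (uncompressed)
    }
}
```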

When you create your DeflaterOutputStream, you can pass in a Deflater of your choosing to the constructor, and on that Deflater, there are some options you can set. The level is essentially the amount of look-ahead that the compression uses when looking for repeated patterns in the data; on the off chance, you might try setting this to a non-default value and see if it makes any difference to whether your decompressor can cope.
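For what it's worth, passing your own Deflater looks roughly like this. It is only a sketch: DeflaterLevelDemo and compress are names invented here, and the sample data is arbitrary. Note the one-argument Deflater constructor leaves nowrap off, so the ZLIB header is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

class DeflaterLevelDemo {
    // Compress input at the given level (0 to 9, or Deflater.DEFAULT_COMPRESSION).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level); // nowrap is false: ZLIB header is kept
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out, deflater)) {
            dos.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = "abcabcabcabcabcabcabcabc".getBytes(StandardCharsets.UTF_8);
        int[] levels = {
            Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION
        };
        for (int level : levels) {
            byte[] out = compress(input, level);
            System.out.println(String.format("level %d: %d bytes, header %02X %02X",
                    level, out.length, out[0] & 0xFF, out[1] & 0xFF));
        }
    }
}
```

Whatever level you pick, the low 4 bits of the first byte should stay at 8; only the second header byte and the compressed payload are expected to vary.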

The strategy setting (see the setStrategy() method) can be used in some special circumstances to tell the deflater to only apply Huffman compression. This can occasionally be useful in cases where you have already transformed your data so that frequencies of values are near negative powers of 2 (i.e. the distribution that Huffman coding works best on). I wouldn't expect this setting to affect whether a library can read your data, but just on the off chance, you might try changing it.
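Here's a small sketch of switching the strategy and then checking that the result can still be read back. HuffmanOnlyDemo, deflate and inflate are illustrative names, and the read-back uses the standard java.util.zip InflaterInputStream as a stand-in, on the assumption that a correct third-party inflater would behave the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class HuffmanOnlyDemo {
    // Compress with the given strategy, e.g. Deflater.HUFFMAN_ONLY.
    static byte[] deflate(byte[] input, int strategy) {
        Deflater deflater = new Deflater();
        deflater.setStrategy(strategy);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out, deflater)) {
            dos.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end();
        }
        return out.toByteArray();
    }

    // Decompress a ZLIB-wrapped stream with the standard library inflater.
    static byte[] inflate(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream iis =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = iis.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = "aaaabbbbccccddddaaaabbbb".getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(input, Deflater.HUFFMAN_ONLY);
        System.out.println("round trip ok: " + Arrays.equals(input, inflate(packed)));
    }
}
```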

In case it's helpful, I've written a little bit about configuring Deflater, including the use of Huffman-only compression on transformed data. I must admit, whatever options you choose, I'd really expect your library to be able to read the data. If you're really sure your compressed data is correct (i.e. ZLIB/Inflater can re-read your file), then you might consider just using another library...!

Oh, and stating the bleeding obvious but I'll mention it anyway, if your data is fixed you can of course just stick it in the jar and it'll effectively be deflated/inflated "for free". Ironically, your J2ME device MUST be able to decode zlib-compressed data, because that's essentially the format the jar is in...

Neil Coffey
Neil, how do you know so much about Huffman encoding and frequencies etc?
Cheeso
Aha.. I was inadvertently setting 'nowrap'. Thanks.
izb
Glad it's solved! Cheeso, the thing about freqs isn't such special knowledge; it really just falls out of standard information theory. If you take some arbitrary distribution of frequencies, the ideal encoding would generally allocate a fractional number of bits to the codewords, which is of course impossible. But the ideal code lengths are whole numbers if the probabilities of the characters are negative powers of 2 (i.e. 1/2, 1/4, 1/8 etc).
Neil Coffey
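To put numbers on the point about negative powers of 2, here is a small sketch that computes the ideal code length, -log2(p), for each symbol probability. The class IdealCodeLengths and both example distributions are made up for illustration. For the distribution 1/2, 1/4, 1/8, 1/8 the ideal lengths come out as whole bits (1, 2, 3, 3), so a Huffman code can match them exactly; for 0.4, 0.3, 0.2, 0.1 they are fractional, so any whole-bit code has to round.

```java
class IdealCodeLengths {
    // Ideal (information-theoretic) code length in bits for probability p.
    static double idealBits(double p) {
        return -Math.log(p) / Math.log(2);
    }

    // True if every ideal length is a whole number of bits, within a small
    // tolerance for floating-point rounding.
    static boolean allWhole(double[] probs) {
        for (double p : probs) {
            double bits = idealBits(p);
            if (Math.abs(bits - Math.rint(bits)) > 1e-9) {
                return false;
            }
        }
        return true;
    }

    // Entropy in bits per symbol: the average of the ideal lengths.
    static double entropy(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += p * idealBits(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] dyadic = {0.5, 0.25, 0.125, 0.125};
        double[] other = {0.4, 0.3, 0.2, 0.1};
        System.out.println("dyadic whole-bit lengths: " + allWhole(dyadic));
        System.out.println("other whole-bit lengths:  " + allWhole(other));
        System.out.println("dyadic entropy (bits/symbol): " + entropy(dyadic));
    }
}
```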