I am using MiniLZO on a project for some really simple compression tasks. I compress with one program and decompress with another. I'd like to know how much space to allocate for the decompression buffer. I am fine with over-allocating if it saves me from having to annotate my output file with an integer declaring how much space the decompressed data will take. How would I figure out how much space it could possibly take?

After some consideration, I think this question boils down to the following: What is the maximum compression ratio of lzo1x compression?

A:

The maximum size of the decompressed data is simply the maximum size of the data you compressed in the first place.

If there is an upper bound on your input size, you can use it, but the usual approach is to add a header to the compressed buffer that specifies the uncompressed size.
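A minimal sketch of the size-header approach, in C since miniLZO is a C library. It assumes a 4-byte little-endian header, which is a format choice rather than something the answer specifies; `frame_with_size`, `alloc_output_for`, and the other helper names are hypothetical. The reader allocates exactly the recorded size before handing the payload to the decompressor.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header: one 32-bit little-endian uncompressed length. */
#define SIZE_HEADER_LEN 4

/* Store v least significant byte first so the file format does not
   depend on the host's endianness. */
static void put_u32_le(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v & 0xFFu);
    buf[1] = (unsigned char)((v >> 8) & 0xFFu);
    buf[2] = (unsigned char)((v >> 16) & 0xFFu);
    buf[3] = (unsigned char)((v >> 24) & 0xFFu);
}

static uint32_t get_u32_le(const unsigned char *buf)
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Writer side: prepend the uncompressed length to an already-compressed
   payload. Returns a malloc'd buffer (or NULL); *out_len gets its size. */
static unsigned char *frame_with_size(uint32_t uncompressed_len,
                                      const unsigned char *payload,
                                      size_t payload_len, size_t *out_len)
{
    unsigned char *out = malloc(SIZE_HEADER_LEN + payload_len);
    if (out == NULL)
        return NULL;
    put_u32_le(out, uncompressed_len);
    memcpy(out + SIZE_HEADER_LEN, payload, payload_len);
    *out_len = SIZE_HEADER_LEN + payload_len;
    return out;
}

/* Reader side: parse the header and allocate exactly enough room for the
   decompressed data. The payload at framed + SIZE_HEADER_LEN would then be
   passed to the decompressor (e.g. miniLZO's lzo1x_decompress_safe). */
static unsigned char *alloc_output_for(const unsigned char *framed,
                                       size_t framed_len, uint32_t *needed)
{
    if (framed_len < SIZE_HEADER_LEN)
        return NULL;                           /* truncated input */
    *needed = get_u32_le(framed);
    return malloc(*needed > 0 ? *needed : 1);  /* sidestep malloc(0) */
}
```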

snowcrash09
Thanks, I'll keep that in mind. Sadly, I don't have control over the max size either. It looks like I'll probably have to add a header if I want to do this safely.
Benson
A:

Since you control both the compressor and the decompressor, I suggest you compress the input in fixed-size blocks. In my application I compress up to 64KB in each block, then emit the size of the compressed block followed by the compressed data itself, so the compressed stream looks like a series of compressed blocks:

length_of_block_1
block_1
length_of_block_2
block_2
...

The decompressor just reads each compressed block and decompresses it into a 64KB buffer, since I know each block was produced by compressing at most 64KB of input.
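A minimal C sketch of the block framing described above, assuming a 4-byte little-endian length prefix per block (the answer does not say how the length is encoded). `codec_fn`, `sink_fn`, `write_blocks`, and `read_blocks` are hypothetical names; the codec hook stands in for small wrappers you would write around miniLZO's compress and decompress calls, whose real signatures differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE (64u * 1024u)  /* max uncompressed bytes per block */

/* Hypothetical codec hook: transforms src into dst (capacity dst_cap) and
   returns the bytes written, or 0 on failure. Wrap miniLZO here. */
typedef size_t (*codec_fn)(const unsigned char *src, size_t src_len,
                           unsigned char *dst, size_t dst_cap);
/* Receives each decompressed block, e.g. to fwrite() it to a file. */
typedef void (*sink_fn)(const unsigned char *data, size_t len, void *ctx);

static void put_len(unsigned char *b, uint32_t v)
{
    b[0] = (unsigned char)v;         b[1] = (unsigned char)(v >> 8);
    b[2] = (unsigned char)(v >> 16); b[3] = (unsigned char)(v >> 24);
}

static uint32_t get_len(const unsigned char *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Emit length_of_block_N, block_N, ... for every <=64KB chunk of in.
   Returns total bytes written, or 0 on failure (or empty input). */
static size_t write_blocks(const unsigned char *in, size_t in_len,
                           unsigned char *out, size_t out_cap,
                           codec_fn compress)
{
    size_t in_pos = 0, out_pos = 0;
    while (in_pos < in_len) {
        size_t chunk = in_len - in_pos;
        if (chunk > BLOCK_SIZE)
            chunk = BLOCK_SIZE;
        if (out_cap - out_pos < 4)
            return 0;                 /* no room for the length field */
        size_t clen = compress(in + in_pos, chunk,
                               out + out_pos + 4, out_cap - out_pos - 4);
        if (clen == 0)
            return 0;                 /* codec failed or out of space */
        put_len(out + out_pos, (uint32_t)clen);
        out_pos += 4 + clen;
        in_pos += chunk;
    }
    return out_pos;
}

/* Decompress each block into one fixed 64KB buffer and hand it to sink;
   the total decompressed size is never needed. Returns 1 on success,
   0 on malformed input. The static buffer makes this non-reentrant. */
static int read_blocks(const unsigned char *in, size_t in_len,
                       codec_fn decompress, sink_fn sink, void *ctx)
{
    static unsigned char block[BLOCK_SIZE];
    size_t pos = 0;
    while (pos < in_len) {
        if (in_len - pos < 4)
            return 0;                 /* truncated length field */
        size_t clen = get_len(in + pos);
        pos += 4;
        if (clen > in_len - pos)
            return 0;                 /* truncated block */
        size_t dlen = decompress(in + pos, clen, block, BLOCK_SIZE);
        if (dlen == 0)
            return 0;                 /* corrupt block or > 64KB output */
        sink(block, dlen, ctx);
        pos += clen;
    }
    return 1;
}
```

Because each block holds at most `BLOCK_SIZE` uncompressed bytes, the reader only ever needs a fixed 64KB scratch buffer, and the sink can stream each block out as soon as it is decoded.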

Hope that helps,

Eric Melski
This is also a good suggestion, but it adds annotations, which is exactly what I was hoping to avoid. Given that, I may as well compress my data in one block (that's how it's stored already) and annotate it with the block size.
Benson
I thought you just wanted to avoid storing the *decompressed* size in the output. I do not think you can avoid storing some kind of end-of-block marker unless you ensure the input is always smaller than N bytes; then each compressed stream contains exactly one block, so you need no delimiter. Alternatively, you could extend the decompressor to return a partial result and a "more to do" code when it fills the output buffer, so you could call it repeatedly to decompress the entire input.
Eric Melski
All good suggestions, but I think that storing the size of the decompressed buffer would be simpler than that. So, I guess the answer to my question is "get over it and annotate with the decompressed size".
Benson