Hello all, I'm writing a compression library as a little side project. It can already extract any standard gzip file and produce compliant (but certainly not yet optimal) gzip output, so it's time to figure out a meaningful block termination strategy. Currently, I just cut the blocks off after every 32k of input (the LZ77 window size) because it was convenient and quick to implement. Now I am going back and trying to actually improve compression efficiency.
The Deflate spec has only this to say about it: "The compressor terminates a block when it determines that starting a new block with fresh trees would be useful, or when the block size fills up the compressor's block buffer", which isn't all that helpful.
I looked through the SharpZipLib code (as I figured it would be the most easily readable open source implementation), and found that it terminates a block every 16k literals of output, ignoring the input. This is easy enough to implement, but it seems like there must be some more targeted approach, especially given the language in the spec: "determines that starting a new block with fresh trees would be useful".
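To make the two fixed-size strategies above concrete, here is a rough Python sketch of what I mean. It is not my actual code and not SharpZipLib's: the function names and the token representation (an int for a literal, a (length, distance) tuple for a match) are made up for illustration, and counting every literal or match as one output symbol is my assumption.

```python
from typing import Iterable, Iterator, List, Tuple, Union

# Hypothetical token type: a literal byte (int) or a (length, distance) match.
Token = Union[int, Tuple[int, int]]


def split_by_input_bytes(tokens: Iterable[Token],
                         limit: int = 32 * 1024) -> Iterator[List[Token]]:
    """Close a block once its tokens cover at least `limit` bytes of input."""
    block: List[Token] = []
    covered = 0
    for tok in tokens:
        block.append(tok)
        # A literal covers one input byte; a match covers `length` bytes.
        covered += 1 if isinstance(tok, int) else tok[0]
        if covered >= limit:
            yield block
            block, covered = [], 0
    if block:
        yield block  # final, possibly short, block


def split_by_symbol_count(tokens: Iterable[Token],
                          limit: int = 16 * 1024) -> Iterator[List[Token]]:
    """Close a block after `limit` output symbols, ignoring input size."""
    block: List[Token] = []
    for tok in tokens:
        block.append(tok)
        if len(block) >= limit:
            yield block
            block = []
    if block:
        yield block  # final, possibly short, block
```

Neither version ever looks at the symbol statistics, which is exactly the part I'd like to improve on.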
So does anyone have any ideas for new strategies, or examples of existing ones?
Thanks in advance!